[docs] Pandas to Polars #978

stevhliu · 2023-03-22T23:53:58Z

Sorry for the wait! This PR updates the current code examples in the Parquet docs to use Polars instead of Pandas. It also replaces the alexandriainst/danish-wit dataset with amazon_polarity, because the former returned an error saying conversion is limited to datasets under 5GB.

I'll follow this up with another PR for the new Parquet guide (querying/use in web apps with duckdb) 🙂

HuggingFaceDocBuilderDev · 2023-03-22T23:57:08Z

The documentation is not available anymore as the PR was closed or merged.

lhoestq

Oh nice! Do you think we should keep an example with pandas somewhere? It could also be useful.

mariosasko

Good job!

Yes, let's also have an example with {polars/pandas}.read_parquet to show how to get a standard DataFrame from the Parquet version of a dataset. For instance, we could explain that scan_parquet reads a Parquet file lazily, without loading all of its contents into RAM, so it can inspect large Parquet files while keeping memory usage as low as possible. read_parquet, on the other hand, should give better performance for multiple (uncorrelated) queries if RAM is not an issue.

stevhliu · 2023-03-23T17:00:15Z

> Yes, let's also have an example with {polars/pandas}.read_parquet to show how to get a standard DataFrame from the parquet version of a dataset

Cool! Should we have that info here or in the new Parquet guide? It might be better in the new guide since this one is about listing the files, and I think it's better not to stray too much into explaining the different ways and pros/cons of reading Parquet files. I can add a <Tip> here with a link to the new guide so users can still easily find this info.

mariosasko · 2023-03-23T19:46:17Z

Feel free to split the guide into two guides, but I think scan_parquet and read_parquet should be in the same guide (they both process Parquet files).

mariosasko

LGTM, thanks!

lhoestq

Cool, and splitting sounds good as well :)

severo · 2023-03-28T21:37:34Z

Thanks!

A detail: by merging this PR before #987, we have a broken link at https://github.com/huggingface/datasets-server/pull/978/files#diff-92a1916282fa4dd583217985f2e4bfae937a001fba6549f47bd9396b74dc8be3R160.

stevhliu · 2023-03-28T21:43:13Z

Oops sorry! Maybe we can remove or hide the link until #987 is merged?

severo · 2023-03-29T08:00:01Z

No, no worries. I don't think the docs get enough traffic yet for that to be necessary. Let's just wait until #987 is merged.

pandas to polars

87d4ec2

stevhliu requested review from lhoestq and mariosasko March 22, 2023 23:53

lhoestq reviewed Mar 23, 2023

mariosasko reviewed Mar 23, 2023

remove code examples for reading files

11aabbd

mariosasko approved these changes Mar 28, 2023

lhoestq approved these changes Mar 28, 2023

light edits

466b042

stevhliu merged commit 3a98ba6 into huggingface:main Mar 28, 2023

stevhliu deleted the update-parquet-example branch March 28, 2023 17:07
