[docs] Process Parquet files #987
The documentation is not available anymore as the PR was closed or merged.
Nice! This is super useful :)
Just two comments on the section title:
- Do you think we can make it clearer in the title that you can query any dataset? IMO the current title "Process Parquet files" may suggest that it is only useful if you already have Parquet files. Maybe mention that the Parquet files are "published", or use a verb that suggests the data is remote, e.g. "access" in addition to "process".
- (nit) Both the polars and duckdb examples run "queries" on Parquet files. Maybe say "query" instead of "process"? I think "process" can be reserved for data manipulation that involves user-defined functions (UDFs), with Spark for example.
Nice!
IMO it makes sense to remove the read_polars/read_pandas examples from "List Parquet files" and instead provide a link to this guide now that we have it.
And maybe we can mention at the end of the guide that Parquet files can also be queried in the browser with DuckDB WASM.
PS: doc-builder generates an (empty) cURL version of the examples; it would be nice if this could be removed.
Oh! I didn't think the doc-builder would generate |
Approving but let's see if we can fix the issue with the cURL version of the examples being displayed before merging!
@mariosasko, empty cURL example fixed in huggingface/doc-builder#365!
This PR adds a guide for how to process Parquet files:
- pd/pl.read_parquet
- pl.scan_parquet