Add polars compatibility #6531
…ts into add-polars-compatibility
Hi! Thanks for adding Polars support :) You added from_polars in arrow_dataset.py but not to_polars; is this on purpose? Also, there's no need to touch table.py imo, since it's for Arrow-only logic (the tables there are just wrappers of pyarrow.Table with the exact same methods, plus optimizations of existing methods and a separation between in-memory and memory-mapped tables).
Hi @lhoestq, thanks for pointing out the missing to_polars. I see your point about table.py. I also added tests in …
Thanks! Can you add polars to the test dependencies in setup.py? That way your tests will run in the CI.
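The request above relies on the usual setuptools pattern of a test extra that the CI installs. A hypothetical excerpt in that style (the list names and the absence of version pins are illustrative assumptions, not the project's actual setup.py):

```python
# Hypothetical setup.py excerpt; names and contents are illustrative only.
TESTS_REQUIRE = [
    "pytest",
    # Optional backend: listing it in the test extra lets the CI run the Polars tests
    "polars",
]

EXTRAS_REQUIRE = {
    "tests": TESTS_REQUIRE,
}

# These would be passed to setuptools.setup(..., extras_require=EXTRAS_REQUIRE),
# so that `pip install -e ".[tests]"` pulls in polars with the other test deps.
```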
I also added a few more comments:
This should fix the CI :)
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
Ah, our beloved Windows doesn't seem to be handled properly. I added suggestions to try to fix the Windows CI:
The duckdb index files were deleted yesterday in dataset_with_script@ref/convert/parquet, so I changed the hash to reflect the new SHA.
Great! Merging now, congrats! 🚀
Benchmark results (PyArrow==8.0.0): benchmark_array_xd.json, benchmark_getitem_100B.json, benchmark_indices_mapping.json, benchmark_iterating.json, benchmark_map_filter.json
I'm so excited that I tweeted about it: https://x.com/qlhoest/status/1766135995513082086?s=20 I hope that's fine!
Thanks @lhoestq for the support, and totally fine with the share! Happy to see people excited about this 😃
Hey there,
I've just finished adding support to convert and format to polars.DataFrame. This was in response to the open issue about integrating Polars (#3334). Datasets can be switched to Polars format via Dataset.set_format("polars"). I've also included to_polars and from_polars. All Polars functions are checked via config.POLARS_AVAILABLE.
A few notes:
This only supports DataFrames, not LazyFrames. LazyFrame support could probably be integrated fairly easily via an is_lazy argument in set_format and to_polars.
Let me know your feedback.
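One common way to compute an availability flag like config.POLARS_AVAILABLE and gate Polars-only code behind it is sketched below. This is an assumption about the general technique, not necessarily how the datasets config module does it; require_polars and to_polars_sketch are hypothetical names.

```python
import importlib.util

# True when the polars package can be found in this environment.
# find_spec only locates the package; it does not import it.
POLARS_AVAILABLE = importlib.util.find_spec("polars") is not None


def require_polars(feature: str) -> None:
    """Hypothetical helper: fail with a clear message if polars is missing."""
    if not POLARS_AVAILABLE:
        raise ImportError(f"{feature} requires polars; install it with `pip install polars`.")


def to_polars_sketch(columns: dict):
    """Hypothetical converter that imports polars only after the availability check."""
    require_polars("to_polars_sketch")
    import polars as pl  # deferred import: only reached when polars is installed

    return pl.DataFrame(columns)
```

Because find_spec only locates the package, importing the module that defines the flag stays cheap and never fails when polars is missing; the real import is deferred until a Polars-specific function is called.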