Performance: Calculation 200x+ slower without rechunking or sorting first #17637
Closed
2 tasks done
Labels: accepted, bug, needs triage, python
Checks
Reproducible example
Log output
No response
Issue description
I created a simple sales_df with polars that I saved as parquet.
When I read the data back in and do a simple window calculation, the performance is super slow (~70 sec).
However, if I call rechunk() or .sort("created_on", "customer_id") first, the performance is 200x+ faster (~0.3 sec).
possibly related: #17562
Expected behavior
Great performance as usual 😄
When I read a df that I just saved using polars, I expect good performance without needing to rechunk explicitly myself.
Installed versions