memory-usage/performance regression on join_where in 1.19
#21145
Labels
accepted: Ready for implementation
bug: Something isn't working
P-medium: Priority: medium
performance: Performance issues or improvements
python: Related to Python Polars
Checks
Reproducible example
Log output
Issue description
The performance and memory usage of `join_where` regressed dramatically in version 1.19 and have stayed at that level since; 1.22 is the latest version I tried. The above reproducer is ~30x slower and uses ~150x more memory in 1.19 than in 1.18.
The changelog for 1.19 mentions this change: https://github.com/pola-rs/polars/pull/20525/files. Assuming this is the change that introduced the regression: I appreciate the convenience of being able to use arbitrary expressions, but is there any chance it can be made compatible with the previous performance?
Expected behavior
Performance of inequality joins is 1 or 2 orders of magnitude better than that of a naive cartesian product + filter.
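To illustrate why an inequality join can be expected to beat a cartesian product + filter, here is a pure-Python sketch for the single predicate `left < right`. It is an assumption-laden toy, not a description of Polars' actual algorithm: the function names `naive_join` and `sorted_join` are made up for this example. The naive version compares every pair, while the sorted version sorts one side once and uses binary search to skip all non-matching pairs.

```python
from bisect import bisect_right


def naive_join(left, right):
    """Cartesian product + filter: compares every pair of values."""
    return [(a, b) for a in left for b in right if a < b]


def sorted_join(left, right):
    """Sort the right side once, then binary-search the first match per left value."""
    right_sorted = sorted(right)
    out = []
    for a in left:
        # Index of the first right value strictly greater than a.
        first = bisect_right(right_sorted, a)
        # Everything from that index onward satisfies a < b.
        out.extend((a, b) for b in right_sorted[first:])
    return out


left = [0, 5, 10]
right = [3, 7, 12]
pairs = sorted_join(left, right)
```

The sorted version still has to emit every matching pair, so its advantage is largest when the predicate is selective and most of the cartesian product would be filtered out.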
Installed versions