Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Fix performance regression for eager join_where #21308

Merged
merged 6 commits into from
Feb 18, 2025

Conversation

nameexhaustion
Copy link
Collaborator

@nameexhaustion nameexhaustion commented Feb 18, 2025

@github-actions github-actions bot added fix Bug fix python Related to Python Polars rust Related to Rust Polars labels Feb 18, 2025
@@ -181,7 +193,7 @@ More information on the new streaming engine: https://github.com/pola-rs/polars/
}

// Make sure it is after predicate pushdown
if opt_flags.collapse_joins() && members.has_filter_with_join_input {
Copy link
Collaborator Author

@nameexhaustion nameexhaustion Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During IR conversion a join_where is converted to a cross-join with subsequent filters, and we rely on collapse_joins to convert to more performant joins. However the issue is that during the optimization step, members: MemberCollector was never initialized if opt_flags contained EAGER, causing collapse_joins to be skipped despite it being enabled. This caused the performance regression as we end up materializing an entire cross-join.

The fix for the linked issue is just this 1 line - the other changes are to make it so that members is initialized as-needed rather than based on combinations of flags to help avoid making similar mistakes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch.. that's a painful one.

Copy link

codecov bot commented Feb 18, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.91%. Comparing base (a438cc3) to head (3cec05d).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #21308      +/-   ##
==========================================
+ Coverage   79.90%   79.91%   +0.01%     
==========================================
  Files        1596     1596              
  Lines      228580   228593      +13     
  Branches     2608     2608              
==========================================
+ Hits       182644   182690      +46     
+ Misses      45340    45307      -33     
  Partials      596      596              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ritchie46 ritchie46 merged commit e0a3bb5 into pola-rs:main Feb 18, 2025
27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix Bug fix python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

memory-usage/performance regression on join_where in 1.19
2 participants