perf: filter null join key optimization rule #3583
/// Inserts a filter before each side of the join to remove rows where a join key is null when it is valid to do so.
/// This will reduce the cardinality of the tables before a join, which may improve join performance.
Add an example of a query that would benefit from this optimization.
    JoinType::Semi => (true, true),
};

let left_null_pred = if can_filter_left {
How are we actually determining whether the join key is nullable? AFAIK, we don't have a concept of nullability in our fields/dtypes.
Like, this should only push it down if the expr or column is null, but we don't have a way to determine that. Maybe I'm missing something?
This rule creates a filter that removes rows whose join key is null. The join keys themselves do not have to be a null literal or a null type. If the join keys are not nullable or contain no null values, the filter is essentially a no-op; if they contain, say, a row where the key is null, that row is removed prior to the join.
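To make the explanation above concrete, here is a minimal standalone sketch in plain Rust. It is not the code from this PR and does not use Daft's API: `JoinType`, `can_filter_nulls`, `Row`, `drop_null_keys`, and `inner_join` are hypothetical names made up for illustration. Only the `Semi => (true, true)` arm is visible in the diff; the other arms are assumptions based on standard SQL join semantics (a side can be filtered only if its unmatched rows never appear in the output, and `Anti` is assumed to mean a left anti join).

```rust
// Hypothetical stand-ins for illustration only; not Daft's real types.
#[derive(Clone, Copy, Debug, PartialEq)]
enum JoinType {
    Inner,
    Left,
    Right,
    Outer,
    Semi,
    Anti,
}

/// Returns (can_filter_left, can_filter_right).
/// Only the `Semi` arm comes from the diff; the rest are assumptions.
fn can_filter_nulls(join_type: JoinType) -> (bool, bool) {
    match join_type {
        JoinType::Inner => (true, true),
        // Unmatched left rows are kept, so null-key left rows must survive.
        JoinType::Left => (false, true),
        JoinType::Right => (true, false),
        JoinType::Outer => (false, false),
        // Assumed left anti join: left rows without a match are the output.
        JoinType::Anti => (false, true),
        JoinType::Semi => (true, true),
    }
}

/// A row is (nullable join key, payload).
type Row = (Option<i64>, &'static str);

/// The "filter null join key" step: keep only rows with a non-null key.
fn drop_null_keys(rows: &[Row]) -> Vec<Row> {
    rows.iter().copied().filter(|(k, _)| k.is_some()).collect()
}

/// Naive nested-loop inner join with SQL equality: NULL never matches.
fn inner_join(left: &[Row], right: &[Row]) -> Vec<(i64, &'static str, &'static str)> {
    let mut out = Vec::new();
    for (lk, lv) in left {
        for (rk, rv) in right {
            if let (Some(a), Some(b)) = (lk, rk) {
                if a == b {
                    out.push((*a, *lv, *rv));
                }
            }
        }
    }
    out
}

fn main() {
    let left: Vec<Row> = vec![(Some(1), "a"), (None, "b"), (Some(2), "c")];
    let right: Vec<Row> = vec![(Some(1), "x"), (None, "y"), (None, "z")];

    let (filter_left, filter_right) = can_filter_nulls(JoinType::Inner);
    let l = if filter_left { drop_null_keys(&left) } else { left.clone() };
    let r = if filter_right { drop_null_keys(&right) } else { right.clone() };

    // Pre-filtering must not change the join result, only the input sizes.
    assert_eq!(inner_join(&left, &right), inner_join(&l, &r));
    println!(
        "joined {} x {} rows instead of {} x {}",
        l.len(),
        r.len(),
        left.len(),
        right.len()
    );
}
```

The sketch never checks whether a key column is nullable. It filters unconditionally wherever the join type allows it, which matches the point above that the filter is a no-op when no nulls are present.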
CodSpeed Performance Report
Merging #3583 will degrade performances by 20.4%
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3583      +/-   ##
==========================================
+ Coverage   77.81%   77.86%   +0.05%
==========================================
  Files         718      719       +1
  Lines       88305    88457     +152
==========================================
+ Hits        68716    68880     +164
+ Misses      19589    19577      -12
No description provided.