Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: filter null join key optimization rule #3583

Merged
merged 7 commits into from
Dec 17, 2024
Merged

Conversation

kevinzwang
Copy link
Member

No description provided.

@github-actions github-actions bot added the perf label Dec 16, 2024
@kevinzwang kevinzwang requested review from universalmind303 and removed request for desmondcheongzx December 16, 2024 21:07
Comment on lines 16 to 17
/// Inserts a filter before each side of the join to remove rows where a join key is null when it is valid to do so.
/// This will reduce the cardinality of the tables before a join, which may improve join performance.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add an example of a query that would benefit from this optimization

JoinType::Semi => (true, true),
};

let left_null_pred = if can_filter_left {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how are we actually determining if the join key is nullable? AFAIK, we don't have a concept of nullable in our fields/dtypes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like this should only push it down if the expr or colum is null, but we don't have a way to determine that. Maybe I'm missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rule creates a filter that removes null rows. The join keys themselves do not have to be a null literal or type. So if the join keys are not nullable or do not have null values, this would essentially be a no-op, but if they had say a row where the value was null, it would be removed prior to the join.

Copy link

codspeed-hq bot commented Dec 16, 2024

CodSpeed Performance Report

Merging #3583 will degrade performances by 20.4%

Comparing kevin/null-join-keys (3a0b1a9) with main (5165e5e)

Summary

⚡ 1 improvements
❌ 4 regressions
✅ 22 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main kevin/null-join-keys Change
test_iter_rows_first_row[100 Small Files] 299.6 ms 193.9 ms +54.52%
test_tpch[1-in-memory-native-2] 98.9 ms 110.4 ms -10.48%
test_tpch[1-in-memory-native-7] 133.4 ms 150 ms -11.02%
test_tpch[1-in-memory-native-8] 158.5 ms 182.8 ms -13.28%
test_tpch_sql[1-in-memory-native-8] 155.1 ms 194.8 ms -20.4%

Copy link

codecov bot commented Dec 16, 2024

Codecov Report

Attention: Patch coverage is 98.68421% with 2 lines in your changes missing coverage. Please review.

Project coverage is 77.86%. Comparing base (5165e5e) to head (3a0b1a9).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...lan/src/optimization/rules/filter_null_join_key.rs 98.67% 2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3583      +/-   ##
==========================================
+ Coverage   77.81%   77.86%   +0.05%     
==========================================
  Files         718      719       +1     
  Lines       88305    88457     +152     
==========================================
+ Hits        68716    68880     +164     
+ Misses      19589    19577      -12     
Files with missing lines Coverage Δ
...rc/daft-logical-plan/src/optimization/optimizer.rs 98.83% <100.00%> (+<0.01%) ⬆️
...lan/src/optimization/rules/filter_null_join_key.rs 98.67% <98.67%> (ø)

... and 9 files with indirect coverage changes

@kevinzwang kevinzwang enabled auto-merge (squash) December 17, 2024 21:36
@kevinzwang kevinzwang merged commit 8620635 into main Dec 17, 2024
40 of 41 checks passed
@kevinzwang kevinzwang deleted the kevin/null-join-keys branch December 17, 2024 22:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants