Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NestedLoopJoin Projection Pushdown #14120

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

jayzhan-synnada
Copy link
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

Similar to HashJoin, we also need projection pushdown for nested loop join

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Signed-off-by: Jay Zhan <jay.zhan@synnada.ai>
Signed-off-by: Jay Zhan <jay.zhan@synnada.ai>
@github-actions github-actions bot added physical-expr Physical Expressions core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) proto Related to proto crate labels Jan 14, 2025
Signed-off-by: Jay Zhan <jay.zhan@synnada.ai>
Signed-off-by: Jay Zhan <jay.zhan@synnada.ai>
Signed-off-by: Jay Zhan <jay.zhan@synnada.ai>
Signed-off-by: Jay Zhan <jay.zhan@synnada.ai>
Signed-off-by: Jay Zhan <jay.zhan@synnada.ai>
@jayzhan-synnada jayzhan-synnada marked this pull request as draft January 14, 2025 08:12
Signed-off-by: Jay Zhan <jay.zhan@synnada.ai>
Signed-off-by: Jay Zhan <jay.zhan@synnada.ai>
@jayzhan-synnada jayzhan-synnada marked this pull request as ready for review January 14, 2025 10:06
@github-actions github-actions bot added the optimizer Optimizer rules label Jan 15, 2025
Copy link
Contributor

@berkaysynnada berkaysynnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jayzhan-synnada. This PR clearly brings an improvement, but I have a suggestion to further optimize projections. It will be even better if we handle mentioned cases as well.

@@ -144,7 +144,10 @@ pub fn remove_unnecessary_projections(
} else if let Some(cross_join) = input.downcast_ref::<CrossJoinExec>() {
try_swapping_with_cross_join(projection, cross_join)?
} else if let Some(nl_join) = input.downcast_ref::<NestedLoopJoinExec>() {
try_swapping_with_nested_loop_join(projection, nl_join)?
try_pushdown_through_nested_loop_join(projection, nl_join)?.map_or_else(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the same pattern exists in HashJoin part too, but does it take much effort if we somehow unify this "first try pushdown, if not possible, then embed" approach? Like "embed and pushdown", whichever possible.

@@ -626,6 +635,63 @@ fn collect_column_indices(exprs: &[(Arc<dyn PhysicalExpr>, String)]) -> Vec<usiz
indices
}

fn try_pushdown_through_nested_loop_join(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is, can we do the possible pushdown and embed operation at the same time in this function

let expected = [
"NestedLoopJoinExec: join_type=Inner, filter=a@0 < b@1, projection=[c@2]",
" CsvExec: file_groups={1 group: [[x]]}, projection=[a, b, c, d, e], has_header=false",
" CsvExec: file_groups={1 group: [[x]]}, projection=[a, b, c, d, e], has_header=false",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A solid example of what I am looking for is, these plans will project a&c (first CsvExec), and b only (second CsvExec). Embedding the projection into CsvExec is already done, we just need to pushdown the projection below NestedLoopJoinExec

@berkaysynnada
Copy link
Contributor

I'll merge this once the conflicts are resolved, and @jayzhan-synnada will create a ticket for the item that drives projection optimizations further

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate optimizer Optimizer rules physical-expr Physical Expressions proto Related to proto crate sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants