
Simplify ProjectionPushdown and make it more general #8109

Merged
alamb merged 5 commits into apache:main from alamb/more_general
Nov 10, 2023

Conversation

alamb (Contributor) commented Nov 9, 2023

Which issue does this PR close?

Follow on to #8073

Rationale for this change

While reviewing #8073, I noticed some ways the code could be simplified and made more general.

What changes are included in this PR?

  1. Use TreeNode::transform_down instead of a manual tree walk to rewrite expressions
  2. Pull a common check (whether the projection narrows the schema) out of each try_swapping_* branch
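To illustrate item 1, here is a minimal sketch of the pattern on a toy expression type. It assumes hypothetical `Expr`, `map_children`, and `transform_down` definitions rather than DataFusion's real `TreeNode` trait: the per-variant knowledge lives in a single `map_children` function, so a rewrite rule only has to match the nodes it actually changes.

```rust
// Toy expression tree for illustration only; `Expr`, `map_children`, and
// `transform_down` are hypothetical stand-ins, not DataFusion's `TreeNode` API.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
    Neg(Box<Expr>),
}

impl Expr {
    /// Rebuild this node with `f` applied to each direct child. This is the
    /// only code that has to know the shape of every variant.
    fn map_children<F: FnMut(Expr) -> Expr>(self, mut f: F) -> Expr {
        match self {
            Expr::Add(l, r) => Expr::Add(Box::new(f(*l)), Box::new(f(*r))),
            Expr::Neg(e) => Expr::Neg(Box::new(f(*e))),
            leaf @ (Expr::Column(_) | Expr::Literal(_)) => leaf,
        }
    }

    /// Preorder rewrite: try `op` on this node first, then recurse into the
    /// children of the (possibly replaced) node. `None` leaves a node as-is.
    fn transform_down<F: Fn(&Expr) -> Option<Expr>>(self, op: &F) -> Expr {
        let node = op(&self).unwrap_or(self);
        node.map_children(|child| child.transform_down(op))
    }
}

fn main() {
    // (a + 1) + -a
    let expr = Expr::Add(
        Box::new(Expr::Add(
            Box::new(Expr::Column("a".into())),
            Box::new(Expr::Literal(1)),
        )),
        Box::new(Expr::Neg(Box::new(Expr::Column("a".into())))),
    );

    // The rule matches only the node kind it cares about; the generic
    // traversal still reaches the `Column` nested under `Neg`.
    let rename_a = |e: &Expr| match e {
        Expr::Column(name) if name == "a" => Some(Expr::Column("b".into())),
        _ => None,
    };
    println!("{:?}", expr.transform_down(&rename_a));
}
```

Because this is a preorder (top-down) traversal, a replacement node's children are visited afterwards; that is harmless here since `Column` has no children.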

Are these changes tested?

Yes, by existing tests

Are there any user-facing changes?

github-actions bot added the core (Core DataFusion crate) label Nov 9, 2023

// If the projection does not narrow the schema, we should not try
// to push it down
if projection.expr().len() >= projection.input().schema().fields().len() {
    return Ok(None);
}

alamb (Contributor Author):

I pulled this check out of the individual try_swapping_* calls.
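A minimal sketch of what hoisting the check looks like, assuming hypothetical `Projection`, `Child`, `try_push_down`, and `try_swapping_with_*` definitions (illustrative names, not DataFusion's signatures). The guard runs once before dispatch instead of being repeated in every helper.

```rust
// Hypothetical sketch of hoisting a shared guard; the names below are
// illustrative and do not match DataFusion's optimizer code.
#[derive(Debug, Clone, Copy)]
enum Child {
    Filter,
    Sort,
}

struct Projection {
    /// Number of expressions the projection outputs.
    num_exprs: usize,
    /// Number of fields in the projection's input schema.
    input_fields: usize,
    child: Child,
}

fn try_swapping_with_filter(_p: &Projection) -> Result<Option<String>, String> {
    Ok(Some("projection pushed below filter".to_string()))
}

fn try_swapping_with_sort(_p: &Projection) -> Result<Option<String>, String> {
    Ok(Some("projection pushed below sort".to_string()))
}

/// Attempt to push `p` below its child. Returns `Ok(None)` when no rewrite applies.
fn try_push_down(p: &Projection) -> Result<Option<String>, String> {
    // The guard runs once here instead of being repeated in every helper:
    // a projection that does not narrow the schema gains nothing by moving.
    if p.num_exprs >= p.input_fields {
        return Ok(None);
    }
    match p.child {
        Child::Filter => try_swapping_with_filter(p),
        Child::Sort => try_swapping_with_sort(p),
    }
}

fn main() {
    let narrowing = Projection { num_exprs: 2, input_fields: 5, child: Child::Filter };
    let full_width = Projection { num_exprs: 5, input_fields: 5, child: Child::Sort };
    println!("{:?}", try_push_down(&narrowing)); // Ok(Some("projection pushed below filter"))
    println!("{:?}", try_push_down(&full_width)); // Ok(None)
}
```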
let new_expr = expr

alamb (Contributor Author):

I rewrote this logic to use TreeNode recursion / rewriting rather than manual recursion, which I think is both less code and more general (it handles all PhysicalExprs, not just the ones explicitly checked for here).
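The generality point can be sketched with a toy type: a hand-written walk supports only the variants it lists, while a generic traversal visits every node and lets the rule match just `Column`. All names here (`Expr`, `transform_up`, `substitute_manual`, `substitute_generic`) are hypothetical, and the rule simply replaces a column reference with the projection expression at that index.

```rust
// Toy sketch contrasting a hand-written walk with a generic traversal.
// `Expr` stands in for DataFusion's `PhysicalExpr`; `transform_up`,
// `substitute_manual`, and `substitute_generic` are hypothetical helpers.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// Column reference by index.
    Column(usize),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
    Neg(Box<Expr>),
}

impl Expr {
    /// Generic postorder rewrite: rewrite the children, then apply `op` to this node.
    fn transform_up<F: Fn(Expr) -> Expr>(self, op: &F) -> Expr {
        let node = match self {
            Expr::Add(l, r) => Expr::Add(
                Box::new((*l).transform_up(op)),
                Box::new((*r).transform_up(op)),
            ),
            Expr::Neg(e) => Expr::Neg(Box::new((*e).transform_up(op))),
            leaf @ (Expr::Column(_) | Expr::Literal(_)) => leaf,
        };
        op(node)
    }
}

/// Manual recursion: only the variants listed here are handled, so any other
/// expression kind makes the whole rewrite give up and return `None`.
fn substitute_manual(expr: &Expr, proj: &[Expr]) -> Option<Expr> {
    match expr {
        Expr::Column(i) => proj.get(*i).cloned(),
        Expr::Literal(_) => Some(expr.clone()),
        Expr::Add(l, r) => Some(Expr::Add(
            Box::new(substitute_manual(l, proj)?),
            Box::new(substitute_manual(r, proj)?),
        )),
        _ => None, // `Neg` was never added to this match
    }
}

/// Generic version: the closure handles only `Column`, and `transform_up`
/// visits every other node kind. Because the traversal is postorder, a
/// substituted expression is not visited again. Assumes indices are in range.
fn substitute_generic(expr: Expr, proj: &[Expr]) -> Expr {
    expr.transform_up(&|e: Expr| match e {
        Expr::Column(i) => proj[i].clone(),
        other => other,
    })
}

fn main() {
    // Projection outputs: [col0 + 10, -col1]
    let proj = vec![
        Expr::Add(Box::new(Expr::Column(0)), Box::new(Expr::Literal(10))),
        Expr::Neg(Box::new(Expr::Column(1))),
    ];
    // Parent expression over the projection outputs: -(out0) + out1
    let parent = Expr::Add(
        Box::new(Expr::Neg(Box::new(Expr::Column(0)))),
        Box::new(Expr::Column(1)),
    );
    println!("manual:  {:?}", substitute_manual(&parent, &proj));
    println!("generic: {:?}", substitute_generic(parent, &proj));
}
```

The manual version returns `None` for this input because it has no `Neg` arm; the generic version rewrites through it. The sketch uses a postorder traversal deliberately: with a preorder traversal, the columns inside a substituted expression would be visited and substituted a second time.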

alamb marked this pull request as ready for review November 9, 2023 14:27
alamb marked this pull request as draft November 9, 2023 14:27
alamb (Contributor Author) commented Nov 9, 2023

A test is failing; I will debug it later.

/// Convenience utility for writing optimizer rules: recursively apply the given `op` first to all of its
/// children and then to itself (postorder traversal) using a mutable function, `F`.
/// When the `op` does not apply to a given node, it is left unchanged.
fn transform_up_mut<F>(self, op: &mut F) -> Result<Self>

alamb (Contributor Author):

The `mut` variant is needed to support changing a variable in the closure (updating state).
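A minimal sketch of why the closure is taken as `&mut F`: the rule below increments a counter captured from its environment, which a plain `Fn` closure cannot do. `Expr` and `substitute_and_count` are hypothetical, and `Result<_, String>` stands in for DataFusion's own `Result`; only the overall shape mirrors the quoted signature.

```rust
// Toy sketch of a postorder traversal that takes `&mut F` so the closure can
// update captured state. It mirrors the shape of the quoted `transform_up_mut`
// signature but is NOT DataFusion's implementation; `Expr` and
// `substitute_and_count` are hypothetical.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(usize),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
}

impl Expr {
    /// Apply `op` to all children first, then to this node. `op` returns
    /// `Ok(None)` to leave a node unchanged, or an error to abort.
    fn transform_up_mut<F>(self, op: &mut F) -> Result<Expr, String>
    where
        F: FnMut(&Expr) -> Result<Option<Expr>, String>,
    {
        let node = match self {
            Expr::Add(l, r) => Expr::Add(
                Box::new((*l).transform_up_mut(op)?),
                Box::new((*r).transform_up_mut(op)?),
            ),
            leaf @ (Expr::Column(_) | Expr::Literal(_)) => leaf,
        };
        Ok(op(&node)?.unwrap_or(node))
    }
}

/// Replace each `Column(i)` with `proj[i]` and count the replacements.
/// Incrementing `rewrites` inside the closure is what requires `FnMut`.
fn substitute_and_count(expr: Expr, proj: &[Expr]) -> Result<(Expr, usize), String> {
    let mut rewrites = 0;
    let rewritten = expr.transform_up_mut(&mut |e: &Expr| -> Result<Option<Expr>, String> {
        match e {
            Expr::Column(i) => {
                let replacement = proj
                    .get(*i)
                    .cloned()
                    .ok_or_else(|| format!("column index {i} out of range"))?;
                rewrites += 1;
                Ok(Some(replacement))
            }
            _ => Ok(None),
        }
    })?;
    Ok((rewritten, rewrites))
}

fn main() {
    let proj = vec![Expr::Literal(5), Expr::Column(3)];
    // out0 + (out1 + out0)
    let parent = Expr::Add(
        Box::new(Expr::Column(0)),
        Box::new(Expr::Add(
            Box::new(Expr::Column(1)),
            Box::new(Expr::Column(0)),
        )),
    );
    match substitute_and_count(parent, &proj) {
        Ok((expr, n)) => println!("{n} rewrites: {expr:?}"),
        Err(err) => println!("error: {err}"),
    }
    // An out-of-range index is reported as an error instead of panicking.
    println!("{:?}", substitute_and_count(Expr::Column(9), &proj));
}
```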

alamb marked this pull request as ready for review November 9, 2023 17:11

alamb (Contributor Author) commented Nov 9, 2023

cc @berkaysynnada

ozankabak (Contributor) left a comment

LGTM apart from one idiom suggestion. I'll let @berkaysynnada take a look and make sure all is right; then this will be good to go from our perspective. Thanks, Andrew

berkaysynnada (Contributor) commented

Thanks @alamb, that function will possibly be used in future iterations as well. It's better now.

Co-authored-by: Berkay Şahin <124376117+berkaysynnada@users.noreply.github.com>
Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>
ozankabak (Contributor) left a comment

LGTM, thank you

alamb merged commit e305bcf into apache:main Nov 10, 2023
22 checks passed
alamb deleted the alamb/more_general branch November 10, 2023 21:04
Labels
core Core DataFusion crate

3 participants