Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Add rewrite_expr convenience method for rewriting Exprs #5092

Merged
merged 6 commits into from
Jan 30, 2023

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Jan 27, 2023

Note: this may look like a large change, but it is mostly comments.

Which issue does this PR close?

related to #4854

Rationale for this change

There are some rewrites (like constant propagation) that require the full generality of ExprRewriter that saves state on a previst on down and then rewrite while walking back up. However, most rewrites simply call mutate.

Now you have to define a Visitor struct to pass arguments down the tree. Letting rust do the work for you via a closure makes for less code

What changes are included in this PR?

Changes:

  1. Add a rewrite_expr convenience function (like inspect_expr_pre) for the common case-fold-search
  2. Update docs
  3. Refactor a few uses of ExprRewriter to demonstrate how to use it

If this PR is approved, I'll spend some time porting over other uses of ExprRewriters

Are these changes tested?

Yes, both a new doc tests and existing coverage

Are there any user-facing changes?

Developers can hopefully write lest code

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Jan 27, 2023
/// comments on `Expr::rewrite` for details on its use
/// invoked recursively on all nodes of an expression tree.
///
/// Performs a depth first walk of an expression and its children
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved so it is more visible in rustdocs

@@ -429,27 +433,14 @@ pub fn normalize_col_with_schemas(
schemas: &[&Arc<DFSchema>],
using_columns: &[HashSet<Column>],
) -> Result<Expr> {
struct ColumnNormalizer<'a> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a pretty good example the boilerplate that can be avoided by using rewrite_expr rather than ExprRewriter (rust generates a closure that captures schemas and using_columns) rather than having to do it explicitly

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really like this example.👍

@alamb alamb marked this pull request as ready for review January 27, 2023 16:08
@alamb alamb requested review from HaoYang670 and jackwener January 27, 2023 16:08
@alamb
Copy link
Contributor Author

alamb commented Jan 27, 2023

I wonder if @jackwener / @HaoYang670 have any thoughts on this approach?

Copy link
Member

@jackwener jackwener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really very like this PR. After merge it, we can more convenient to use rewrite_expr.

Thanks @alamb .

cc @mingmwang , how do you think it?

///
/// assert_eq!(rewritten, col("a") + lit(42));
/// ```
pub fn rewrite_expr<F>(expr: Expr, f: F) -> Result<Expr>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The real change in this PR is to add this function.

@HaoYang670
Copy link
Contributor

Really sorry for the late. I just came back from the Spring Festival. I will review this tomorrow, thanks.

Copy link
Contributor

@HaoYang670 HaoYang670 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @alamb. I learned much from your PR.

where
F: FnMut(Expr) -> Result<Expr>,
{
fn mutate(&mut self, expr: Expr) -> Result<Expr> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we let ExprRewriter::mutate return Result<Option<E>> in the future, just like what we've done for LogicalPlan::try_optimize?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is an interesting idea. What usecase do you have in mind?

Since mutate takes the expr by ownership (rather than by reference), I think returning expr would be the same as not changing the original expr (perhaps the same as returning None?)

However, I do wonder what actually happens when rewriting Boxed exprs 🤔 maybe they are still copied

@@ -519,6 +495,49 @@ pub fn unnormalize_cols(exprs: impl IntoIterator<Item = Expr>) -> Vec<Expr> {
exprs.into_iter().map(unnormalize_col).collect()
}

struct RewriterAdapter<F> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to add some comments for RewriterAdapter, but this is not a blocker.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, added in 34a725c

@alamb alamb merged commit 51d5840 into apache:master Jan 30, 2023
@alamb alamb deleted the alamb/rewrite_adapter branch January 30, 2023 21:24
@alamb
Copy link
Contributor Author

alamb commented Jan 30, 2023

Thank you for the reviews @HaoYang670 and @jackwener

@ursabot
Copy link

ursabot commented Jan 30, 2023

Benchmark runs are scheduled for baseline = 67b1da8 and contender = 51d5840. 51d5840 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants