Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop copying LogicalPlan and Exprs in CommonSubexprEliminate (2-3% planning speed improvement) #10835

Merged
merged 13 commits into from
Jun 19, 2024
Merged
5 changes: 5 additions & 0 deletions datafusion/common/src/tree_node.rs
Original file line number Diff line number Diff line change
Expand Up @@ -615,6 +615,11 @@ impl<T> Transformed<T> {
}
}

/// Create a `Transformed` with `transformed and [`TreeNodeRecursion::Continue`].
pub fn new_transformed(data: T, transformed: bool) -> Self {
Self::new(data, transformed, TreeNodeRecursion::Continue)
}

/// Wrapper for transformed data with [`TreeNodeRecursion::Continue`] statement.
pub fn yes(data: T) -> Self {
Self::new(data, true, TreeNodeRecursion::Continue)
Expand Down
66 changes: 35 additions & 31 deletions datafusion/expr/src/logical_plan/plan.rs
Original file line number Diff line number Diff line change
Expand Up @@ -870,37 +870,7 @@ impl LogicalPlan {
LogicalPlan::Filter { .. } => {
assert_eq!(1, expr.len());
let predicate = expr.pop().unwrap();

// filter predicates should not contain aliased expressions so we remove any aliases
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is extract into Filter::remove_aliases as it is needed for CSE.

// before this logic was added we would have aliases within filters such as for
// benchmark q6:
//
// lineitem.l_shipdate >= Date32(\"8766\")
// AND lineitem.l_shipdate < Date32(\"9131\")
// AND CAST(lineitem.l_discount AS Decimal128(30, 15)) AS lineitem.l_discount >=
// Decimal128(Some(49999999999999),30,15)
// AND CAST(lineitem.l_discount AS Decimal128(30, 15)) AS lineitem.l_discount <=
// Decimal128(Some(69999999999999),30,15)
// AND lineitem.l_quantity < Decimal128(Some(2400),15,2)

let predicate = predicate
.transform_down(|expr| {
match expr {
Expr::Exists { .. }
| Expr::ScalarSubquery(_)
| Expr::InSubquery(_) => {
// subqueries could contain aliases so we don't recurse into those
Ok(Transformed::new(expr, false, TreeNodeRecursion::Jump))
}
Expr::Alias(_) => Ok(Transformed::new(
expr.unalias(),
true,
TreeNodeRecursion::Jump,
)),
_ => Ok(Transformed::no(expr)),
}
})
.data()?;
let predicate = Filter::remove_aliases(predicate)?.data;

Filter::try_new(predicate, Arc::new(inputs.swap_remove(0)))
.map(LogicalPlan::Filter)
Expand Down Expand Up @@ -2230,6 +2200,40 @@ impl Filter {
}
false
}

/// Remove aliases from a predicate for use in a `Filter`
///
/// filter predicates should not contain aliased expressions so we remove
/// any aliases.
///
/// before this logic was added we would have aliases within filters such as
/// for benchmark q6:
///
/// ```sql
/// lineitem.l_shipdate >= Date32(\"8766\")
/// AND lineitem.l_shipdate < Date32(\"9131\")
/// AND CAST(lineitem.l_discount AS Decimal128(30, 15)) AS lineitem.l_discount >=
/// Decimal128(Some(49999999999999),30,15)
/// AND CAST(lineitem.l_discount AS Decimal128(30, 15)) AS lineitem.l_discount <=
/// Decimal128(Some(69999999999999),30,15)
/// AND lineitem.l_quantity < Decimal128(Some(2400),15,2)
/// ```
pub fn remove_aliases(predicate: Expr) -> Result<Transformed<Expr>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it make sense to move this method into Expr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it does -- I will do so as a follow on PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

predicate.transform_down(|expr| {
match expr {
Expr::Exists { .. } | Expr::ScalarSubquery(_) | Expr::InSubquery(_) => {
// subqueries could contain aliases so we don't recurse into those
Ok(Transformed::new(expr, false, TreeNodeRecursion::Jump))
}
Expr::Alias(_) => Ok(Transformed::new(
expr.unalias(),
Copy link
Contributor

@peter-toth peter-toth Jun 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we know that expr is an Expr::Alias maybe we could just use Expr::Alias(alias) => *alias.expr instead of calling .unalias(), that matches the expr again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Done in 550e445

true,
TreeNodeRecursion::Jump,
)),
_ => Ok(Transformed::no(expr)),
}
})
}
}

/// Window its input based on a set of window spec and window function (e.g. SUM or RANK)
Expand Down
Loading
Loading