Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Projections during Logical Plan #8340

Merged
merged 1 commit into from
Nov 29, 2023
Merged

Optimize Projections during Logical Plan #8340

merged 1 commit into from
Nov 29, 2023

Conversation

mustafasrepo
Copy link
Contributor

Which issue does this PR close?

Closes 8296.

Rationale for this change

What changes are included in this PR?

This PR removes unnecessary Columns(Fields) from the LogicalPlan, when it is beneficial, by adding a projection that removes unnecessary columns from the plan.

This feature is accomplished by optimize_projections LogicalPlan rule.
New OptimizeProjections rule covers the functionality of MergeProjections , PushDownProjection and EliminateProjection. Hence as part of this PR, these rules are removed also.

With this PR

  • we will produce LogicalPlans that contain only absolutely necessary fields by subsequent operators (when doing so is desirable, or beneficial e.g most of the case)
  • Consecutive projections are not always merged. If consecutive projection is used to cache complex intermediate result by subsequent projection, we say that consecutive projection is beneficial.

Consider

Projection: intermediate, intermediate+1
  Projection: t.a/2 as intermediate
    TableScan: t projection=[a]

will not be merged to

Projection: t.a/2, t.a/2+1
  TableScan: t projection=[a]

The reason is that, in the first plan, computation t.a/2 is calculated once, and used by subsequent projection twice.
However, in the second plan computation t.a/2 calculated twice. (This feature solves the problem in the issue.)

Are these changes tested?

Yes

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) substrait labels Nov 28, 2023
----------Projection: aggregate_test_100.c1
------------Filter: aggregate_test_100.c13 != Utf8("C2GT5KVyOPZpgKVl110TyZO0NcJ434")
--------------TableScan: aggregate_test_100 projection=[c1, c13], partial_filters=[aggregate_test_100.c13 != Utf8("C2GT5KVyOPZpgKVl110TyZO0NcJ434")]
------Projection:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plans seem to be all slightly better :)

})
.collect::<HashMap<_, _>>()
}

#[cfg(test)]
mod tests {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we move these tests into optimize_projection.rs 🤔, so that we can delete the merge_projection.rs file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I plan to do so in next PR. I didn't want to increase diff because of test movement as this is PR already big.

@@ -198,8 +199,9 @@ fn between_date64_plus_interval() -> Result<()> {
let plan = test_sql(sql)?;
let expected =
"Aggregate: groupBy=[[]], aggr=[[COUNT(Int64(1))]]\
\n Filter: test.col_date64 >= Date64(\"890179200000\") AND test.col_date64 <= Date64(\"897955200000\")\
\n TableScan: test projection=[col_date64]";
\n Projection: \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems there is an additional empty projection here compared to before.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but this (empty) projection is fine, as the aggregate doesn't need any columns.

@mustafasrepo mustafasrepo merged commit 19bdcdc into apache:main Nov 29, 2023
22 checks passed
kavirajk added a commit to kavirajk/datafusion that referenced this pull request Apr 13, 2024
…ze_projections.rs`

Fixes: apache#9978

The PR apache#8340 removed `push_down_projections.rs` in favour of `optimize_projections.rs`. This PR moves the tests as well.

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
kavirajk added a commit to kavirajk/datafusion that referenced this pull request Apr 14, 2024
…ze_projections.rs`

Fixes: apache#9978

The PR apache#8340 removed `push_down_projections.rs` in favour of `optimize_projections.rs`. This PR moves the tests as well.

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
alamb pushed a commit that referenced this pull request Apr 15, 2024
…ze_projections.rs` (#10071)

* cleanup(tests): Move tests from `push_down_projections.rs` to `optimize_projections.rs`

Fixes: #9978

The PR #8340 removed `push_down_projections.rs` in favour of `optimize_projections.rs`. This PR moves the tests as well.

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

* remove the file `push_down_projection.rs`

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

* Fix method signatures that are broken by other PRs

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

* remove `push_down_projections.rs` from `lib.rs`

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

---------

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) substrait
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Conflicting optimization rules: common_sub_expression_eliminate and push_down_projection
3 participants