[SPARK-42548][SQL] Add ReferenceAllColumns to skip rewriting attributes #40154

ulysses-you · 2023-02-24T03:28:15Z

What changes were proposed in this pull request?

Add a new trait ReferenceAllColumns that overrides references to return the output of the node's children. Plans that mix in this trait can then be skipped when rewriting attributes in transformUpWithNewOutput.

Why are the changes needed?

There are two reasons for this new trait:

1. It is dangerous to call references on an unresolved plan whose references all come from its children.
2. It is unnecessary to rewrite the attributes of a plan whose references all come from its children.
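
To make the two reasons concrete, here is a minimal, self-contained sketch of the idea. It is a toy model, not Spark's Catalyst code: `Attr`, `Plan`, `Relation`, `Unresolved`, `Project`, `ConsumeRows`, and `rewriteUp` are all hypothetical names invented for illustration, and the toy `ReferenceAllColumns` only mimics the trait proposed here.

```scala
// Toy model only: every type and method here is a hypothetical stand-in,
// not one of Spark's Catalyst classes.
object ReferenceAllColumnsSketch {
  final case class Attr(name: String, id: Int)

  sealed trait Plan {
    def children: Seq[Plan]
    def output: Seq[Attr]
    def references: Set[Attr] // attributes this node reads from its children
  }

  // Marker trait: by definition the node references all of its children's output.
  sealed trait ReferenceAllColumns extends Plan {
    final def references: Set[Attr] = children.flatMap(_.output).toSet
  }

  final case class Relation(output: Seq[Attr]) extends Plan {
    def children: Seq[Plan] = Nil
    def references: Set[Attr] = Set.empty
  }

  // Stands in for a plan that is not resolved yet: asking for its output fails.
  case object Unresolved extends Plan {
    def children: Seq[Plan] = Nil
    def output: Seq[Attr] = throw new IllegalStateException("unresolved plan has no output")
    def references: Set[Attr] = Set.empty
  }

  // Stores its own attribute list, so it must be patched when a child's attributes change.
  final case class Project(projectList: Seq[Attr], child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
    def output: Seq[Attr] = projectList
    def references: Set[Attr] = projectList.toSet
  }

  // Stores no attributes of its own; everything it reads is the child's output.
  final case class ConsumeRows(child: Plan) extends ReferenceAllColumns {
    def children: Seq[Plan] = Seq(child)
    def output: Seq[Attr] = child.output
  }

  // Bottom-up rewrite that replaces attributes according to `swap`.
  def rewriteUp(plan: Plan, swap: Map[Attr, Attr]): Plan = plan match {
    case Relation(out) => Relation(out.map(a => swap.getOrElse(a, a)))
    case Unresolved => Unresolved
    case Project(list, child) =>
      Project(list.map(a => swap.getOrElse(a, a)), rewriteUp(child, swap))
    // Skipped for attribute rewriting: only the child is replaced, and
    // `references` is never evaluated on this node.
    case ConsumeRows(child) => ConsumeRows(rewriteUp(child, swap))
  }

  def main(args: Array[String]): Unit = {
    val c = Attr("c", 1)
    val widened = Attr("c", 2) // e.g. the same column after a type-widening cast
    val swap = Map(c -> widened)

    // The Project is patched; the marker node in between needs no patching.
    println(rewriteUp(Project(Seq(c), ConsumeRows(Relation(Seq(c)))), swap))

    // Rewriting above an unresolved child succeeds because the marker node is skipped,
    println(rewriteUp(ConsumeRows(Unresolved), swap))
    // whereas evaluating `references` on that node would fail.
    println(scala.util.Try(ConsumeRows(Unresolved).references).isFailure)
  }
}
```

In this sketch the marker node carries no attribute list of its own, so skipping it loses nothing, and skipping it also avoids touching the output of a child that cannot provide one yet.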

Does this PR introduce any user-facing change?

It prevents a potential bug.

How was this patch tested?

Added a test; CI passes.

ulysses-you · 2023-02-24T03:33:28Z

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala

+    val t2 = LocalRelation(AttributeReference("c", DecimalType(2, 0))())
+    val unresolved = t1.union(t2).select(UnresolvedStar(None))
+    val plainReferences = FakePlainReferences(unresolved)
+    val wp1 = widenSetOperationTypes(plainReferences.select(t1.output.head))


Before this change, it would throw:
org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to toAttribute on unresolved object

cloud-fan · 2023-02-24T04:21:17Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/PlainReferences.scala

+ *
+ * Note, the only used place is at [[QueryPlan.transformUpWithNewOutput]].
+ */
+trait PlainReferences[PlanType <: QueryPlan[PlanType]] { self: QueryPlan[PlanType] =>


How about ReferenceAllColumns?

ulysses-you · 2023-02-24

Yeah, renamed.

ulysses-you · 2023-02-28T07:46:04Z

@cloud-fan can we have this in branch-3.4? It prevents a potential bug and is friendly to developers.

cloud-fan · 2023-02-28T07:52:51Z

thanks, merging to master/3.4!

### What changes were proposed in this pull request?

Add a new trait `ReferenceAllColumns` that overrides `references` using children output. Then we can skip it during rewriting attributes in transformUpWithNewOutput.

### Why are the changes needed?

There are two reasons with this new trait:

1. it's dangerous to call `references` on an unresolved plan that all of references come from children
2. it's unnecessary to rewrite its attributes that all of references come from children

### Does this PR introduce _any_ user-facing change?

prevent potential bug

### How was this patch tested?

add test and pass CI

Closes #40154 from ulysses-you/references.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit db0e822)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

github-actions bot added the SQL label Feb 24, 2023

ulysses-you mentioned this pull request Feb 24, 2023

[SPARK-32638][SQL][FOLLOWUP] Move the plan rewriting methods to QueryPlan #29643

Closed

ulysses-you commented Feb 24, 2023

cloud-fan reviewed Feb 24, 2023

cloud-fan approved these changes Feb 24, 2023

ulysses-you changed the title ~~[SPARK-42548][SQL] Add PlainReferences to skip rewriting attributes~~ [SPARK-42548][SQL] Add ReferenceAllColumns to skip rewriting attributes Feb 24, 2023

ulysses-you force-pushed the references branch from afd9c26 to 0871156 Compare February 27, 2023 01:31

Add ReferenceAllColumns to skip rewriting attributes

80dd679

ulysses-you force-pushed the references branch from 0871156 to 80dd679 Compare February 27, 2023 03:19

cloud-fan closed this in db0e822 Feb 28, 2023

ulysses-you deleted the references branch February 28, 2023 07:54
