-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-42548][SQL] Add ReferenceAllColumns to skip rewriting attributes #40154
Conversation
val t2 = LocalRelation(AttributeReference("c", DecimalType(2, 0))()) | ||
val unresolved = t1.union(t2).select(UnresolvedStar(None)) | ||
val plainReferences = FakePlainReferences(unresolved) | ||
val wp1 = widenSetOperationTypes(plainReferences.select(t1.output.head)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
before, it would throw
org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to toAttribute on unresolved object
* | ||
* Note, the only used place is at [[QueryPlan.transformUpWithNewOutput]]. | ||
*/ | ||
trait PlainReferences[PlanType <: QueryPlan[PlanType]] { self: QueryPlan[PlanType] => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about ReferenceAllColumns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, renamed
afd9c26
to
0871156
Compare
0871156
to
80dd679
Compare
@cloud-fan can we have this in branch-3.4 ? since it prevent potential bug and it's friendly for developers. |
thanks, merging to master/3.4! |
### What changes were proposed in this pull request? Add a new trait `ReferenceAllColumns ` that overrides `references` using children output. Then we can skip it during rewriting attributes in transformUpWithNewOutput. ### Why are the changes needed? There are two reasons with this new trait: 1. it's dangerous to call `references` on an unresolved plan that all of references come from children 2. it's unnecessary to rewrite its attributes that all of references come from children ### Does this PR introduce _any_ user-facing change? prevent potential bug ### How was this patch tested? add test and pass CI Closes #40154 from ulysses-you/references. Authored-by: ulysses-you <ulyssesyou18@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit db0e822) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request? Add a new trait `ReferenceAllColumns ` that overrides `references` using children output. Then we can skip it during rewriting attributes in transformUpWithNewOutput. ### Why are the changes needed? There are two reasons with this new trait: 1. it's dangerous to call `references` on an unresolved plan that all of references come from children 2. it's unnecessary to rewrite its attributes that all of references come from children ### Does this PR introduce _any_ user-facing change? prevent potential bug ### How was this patch tested? add test and pass CI Closes apache#40154 from ulysses-you/references. Authored-by: ulysses-you <ulyssesyou18@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit db0e822) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
Add a new trait
ReferenceAllColumns
that overridesreferences
using children output. Then we can skip it during rewriting attributes in transformUpWithNewOutput.Why are the changes needed?
There are two reasons with this new trait:
references
on an unresolved plan that all of references come from childrenDoes this PR introduce any user-facing change?
prevent potential bug
How was this patch tested?
add test and pass CI