Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-42548][SQL] Add ReferenceAllColumns to skip rewriting attributes #40154

Closed
wants to merge 1 commit into from

Conversation

ulysses-you
Copy link
Contributor

@ulysses-you ulysses-you commented Feb 24, 2023

What changes were proposed in this pull request?

Add a new trait ReferenceAllColumns that overrides references using children output. Then we can skip it during rewriting attributes in transformUpWithNewOutput.

Why are the changes needed?

There are two reasons with this new trait:

  1. it's dangerous to call references on an unresolved plan that all of references come from children
  2. it's unnecessary to rewrite its attributes that all of references come from children

Does this PR introduce any user-facing change?

prevent potential bug

How was this patch tested?

add test and pass CI

val t2 = LocalRelation(AttributeReference("c", DecimalType(2, 0))())
val unresolved = t1.union(t2).select(UnresolvedStar(None))
val plainReferences = FakePlainReferences(unresolved)
val wp1 = widenSetOperationTypes(plainReferences.select(t1.output.head))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

before, it would throw
org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to toAttribute on unresolved object

*
* Note, the only used place is at [[QueryPlan.transformUpWithNewOutput]].
*/
trait PlainReferences[PlanType <: QueryPlan[PlanType]] { self: QueryPlan[PlanType] =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about ReferenceAllColumns

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, renamed

@ulysses-you ulysses-you changed the title [SPARK-42548][SQL] Add PlainReferences to skip rewriting attributes [SPARK-42548][SQL] Add ReferenceAllColumns to skip rewriting attributes Feb 24, 2023
@ulysses-you
Copy link
Contributor Author

@cloud-fan can we have this in branch-3.4 ? since it prevent potential bug and it's friendly for developers.

@cloud-fan
Copy link
Contributor

thanks, merging to master/3.4!

@cloud-fan cloud-fan closed this in db0e822 Feb 28, 2023
cloud-fan pushed a commit that referenced this pull request Feb 28, 2023
### What changes were proposed in this pull request?

Add a new trait `ReferenceAllColumns ` that overrides `references` using children output. Then we can skip it during rewriting attributes in transformUpWithNewOutput.

### Why are the changes needed?

There are two reasons with this new trait:

1. it's dangerous to call `references` on an unresolved plan that all of references come from children
2. it's unnecessary to rewrite its attributes that all of references come from children

### Does this PR introduce _any_ user-facing change?

prevent potential bug

### How was this patch tested?

add test and pass CI

Closes #40154 from ulysses-you/references.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit db0e822)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@ulysses-you ulysses-you deleted the references branch February 28, 2023 07:54
snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023
### What changes were proposed in this pull request?

Add a new trait `ReferenceAllColumns ` that overrides `references` using children output. Then we can skip it during rewriting attributes in transformUpWithNewOutput.

### Why are the changes needed?

There are two reasons with this new trait:

1. it's dangerous to call `references` on an unresolved plan that all of references come from children
2. it's unnecessary to rewrite its attributes that all of references come from children

### Does this PR introduce _any_ user-facing change?

prevent potential bug

### How was this patch tested?

add test and pass CI

Closes apache#40154 from ulysses-you/references.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit db0e822)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants