-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-32136][SQL] NormalizeFloatingNumbers should work on null struct #28962
Conversation
Thank you so much, @viirya . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch and thanks for the quick fix!
@@ -123,7 +123,8 @@ object NormalizeFloatingNumbers extends Rule[LogicalPlan] { | |||
val fields = expr.dataType.asInstanceOf[StructType].fields.indices.map { i => | |||
normalize(GetStructField(expr, i)) | |||
} | |||
CreateStruct(fields) | |||
val struct = CreateStruct(fields) | |||
If(IsNull(expr), Literal(null, struct.dataType), struct) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just wondering if we need a new Literal here. Maybe, can we simple just put expr
?
- val struct = CreateStruct(fields)
- If(IsNull(expr), Literal(null, struct.dataType), struct)
+ If(IsNull(expr), expr, CreateStruct(fields))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The AS-IS also looks good to me, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
literal is better here to simplify the expression tree, and probably has better perf.
Test build #124746 has finished for PR 28962 at commit
|
retest this please |
The test failure is legitimate, can you take a look? Seems we need to skip the |
Test build #124799 has finished for PR 28962 at commit
|
retest this please |
@@ -123,7 +123,8 @@ object NormalizeFloatingNumbers extends Rule[LogicalPlan] { | |||
val fields = expr.dataType.asInstanceOf[StructType].fields.indices.map { i => | |||
normalize(GetStructField(expr, i)) | |||
} | |||
CreateStruct(fields) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know why this works before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CreateStruct
re-creates the struct when NormalizeFloatingNumbers
is run. The original expr
is not used. But now we wrap expr
in IsNull
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, got it!
This reminds me that we can optimize the code a bit: Instead of only matching CreateNamedStruct
, we can have a whitelist to apply the normalization to the children. For example, If(..., CreateNamedStruct(...))
, we should apply normalization to the inner CreateNamedStruct
.
We can do this later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For If(..., CreateNamedStruct(...))
, it is struct type so already covered by the StructType
case. So the normalization is already applied to the inner CreateNamedStruct
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea I know, but I mean we can do better. We can directly apply the normalization to the children
in If(..., CreateNamedStruct(children))
, instead of treating the outer If
as a black box and apply normalization to it.
Test build #124771 has finished for PR 28962 at commit
|
It seems that we hit another relevant failure.
|
No, it was triggered before the last commit. |
Oh, my bad. Got it~ |
Test build #124803 has finished for PR 28962 at commit
|
PySpark failure looks irrelevant.
|
retest this please... |
Test build #124829 has finished for PR 28962 at commit
|
Merged to master and branch-3.0. |
### What changes were proposed in this pull request? This patch fixes wrong groupBy result if the grouping key is a null-value struct. ### Why are the changes needed? `NormalizeFloatingNumbers` reconstructs a struct if input expression is StructType. If the input struct is null, it will reconstruct a struct with null-value fields, instead of null. ### Does this PR introduce _any_ user-facing change? Yes, fixing incorrect groupBy result. ### How was this patch tested? Unit test. Closes #28962 from viirya/SPARK-32136. Authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org> (cherry picked from commit 3f7780d) Signed-off-by: HyukjinKwon <gurwls223@apache.org>
Thanks all! |
…seWhen/Coalesce child expressions ### What changes were proposed in this pull request? This patch proposes to let `NormalizeFloatingNumbers` rule directly normalizes on certain children expressions. It could simplify expression tree. ### Why are the changes needed? Currently NormalizeFloatingNumbers rule treats some expressions as black box but we can optimize it a bit by normalizing directly the inner children expressions. Also see #28962 (comment). ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Unit tests. Closes #29061 from viirya/SPARK-32258. Authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
What changes were proposed in this pull request?
This patch fixes wrong groupBy result if the grouping key is a null-value struct.
Why are the changes needed?
NormalizeFloatingNumbers
reconstructs a struct if input expression is StructType. If the input struct is null, it will reconstruct a struct with null-value fields, instead of null.Does this PR introduce any user-facing change?
Yes, fixing incorrect groupBy result.
How was this patch tested?
Unit test.