-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-32136][SQL] NormalizeFloatingNumbers should work on null struct #28962
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -17,7 +17,7 @@ | |
|
||
package org.apache.spark.sql.catalyst.optimizer | ||
|
||
import org.apache.spark.sql.catalyst.expressions.{Alias, And, ArrayTransform, CreateArray, CreateMap, CreateNamedStruct, CreateStruct, EqualTo, ExpectsInputTypes, Expression, GetStructField, KnownFloatingPointNormalized, LambdaFunction, NamedLambdaVariable, UnaryExpression} | ||
import org.apache.spark.sql.catalyst.expressions.{Alias, And, ArrayTransform, CreateArray, CreateMap, CreateNamedStruct, CreateStruct, EqualTo, ExpectsInputTypes, Expression, GetStructField, If, IsNull, KnownFloatingPointNormalized, LambdaFunction, Literal, NamedLambdaVariable, UnaryExpression} | ||
import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode} | ||
import org.apache.spark.sql.catalyst.planning.ExtractEquiJoinKeys | ||
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Subquery, Window} | ||
|
@@ -123,7 +123,8 @@ object NormalizeFloatingNumbers extends Rule[LogicalPlan] { | |
val fields = expr.dataType.asInstanceOf[StructType].fields.indices.map { i => | ||
normalize(GetStructField(expr, i)) | ||
} | ||
CreateStruct(fields) | ||
val struct = CreateStruct(fields) | ||
If(IsNull(expr), Literal(null, struct.dataType), struct) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm just wondering if we need a new Literal here. Maybe, can we simple just put - val struct = CreateStruct(fields)
- If(IsNull(expr), Literal(null, struct.dataType), struct)
+ If(IsNull(expr), expr, CreateStruct(fields)) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The AS-IS also looks good to me, too. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. literal is better here to simplify the expression tree, and probably has better perf. |
||
|
||
case _ if expr.dataType.isInstanceOf[ArrayType] => | ||
val ArrayType(et, containsNull) = expr.dataType | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know why this works before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CreateStruct
re-creates the struct whenNormalizeFloatingNumbers
is run. The originalexpr
is not used. But now we wrapexpr
inIsNull
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, got it!
This reminds me that we can optimize the code a bit: Instead of only matching
CreateNamedStruct
, we can have a whitelist to apply the normalization to the children. For example,If(..., CreateNamedStruct(...))
, we should apply normalization to the innerCreateNamedStruct
.We can do this later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For
If(..., CreateNamedStruct(...))
, it is struct type so already covered by theStructType
case. So the normalization is already applied to the innerCreateNamedStruct
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea I know, but I mean we can do better. We can directly apply the normalization to the
children
inIf(..., CreateNamedStruct(children))
, instead of treating the outerIf
as a black box and apply normalization to it.