-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-19104][SQL] Lambda variables in ExternalMapToCatalyst should be global #18418
Conversation
Test build #78599 has finished for PR 18418 at commit
|
retest this please. |
Test build #78601 has finished for PR 18418 at commit
|
Test build #78609 has finished for PR 18418 at commit
|
retest this please. |
Test build #78613 has finished for PR 18418 at commit
|
retest this please. |
Test build #78625 has finished for PR 18418 at commit
|
retest this please. |
Test build #78635 has finished for PR 18418 at commit
|
@@ -736,7 +736,27 @@ class CodegenContext { | |||
// Cannot split these expressions because they are not created from a row object. | |||
return expressions.mkString("\n") | |||
} | |||
splitExpressions(expressions, "apply", ("InternalRow", row) :: Nil) | |||
splitExpressions(row, expressions, Seq.empty) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the check in lines 735-738? Now, we do the same check at lines 754-757, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed, thanks.
cc @cloud-fan |
// of the functions. | ||
val (valExprCodes, valExprParams) = valExprs.map { expr => | ||
val exprCode = expr.genCode(ctx) | ||
val lambdaVars = expr.collect { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks hacky, and why you only do it for CreateNamedStruct
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, CreateArray
is another expression using splitExpressions
. However, if the external type is an array, MapObjects
will be used to construct the internal array. CreateMap
is the same, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC there are a lot of places we use splitExpressions
, shall we check all of them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not very sure about other places. I move the collecting of lambda variable to splitExpressions
and let the possible places to do this check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach doesn't work well after rethinking about this and more experiments.
A simpler approach is letting those lambda variables global as MapObjectsand
CollectObjectsToMap` do.
Test build #78666 has finished for PR 18418 at commit
|
Test build #78678 has finished for PR 18418 at commit
|
can you put more detailed information about what is the bug? I don't understand why it only happens for map inside case class. |
eb061da
to
65d8873
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I think this is the right fix according to how we deal with this kind of problem before. LGTM
BTW please update PR tile to mention that it's a bug fix for |
@cloud-fan Thanks. The PR title and description are both updated to reflect the change. |
Test build #78690 has finished for PR 18418 at commit
|
Test build #78688 has finished for PR 18418 at commit
|
@@ -354,4 +357,9 @@ class DatasetPrimitiveSuite extends QueryTest with SharedSQLContext { | |||
checkDataset(Seq(PackageClass(1)).toDS(), PackageClass(1)) | |||
} | |||
|
|||
test("SPARK-19104: lambda variables should work when parent expression splits generated codes") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should also update the test name...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
miss it. updated.
Test build #78698 has finished for PR 18418 at commit
|
retest this please. |
Test build #78701 has finished for PR 18418 at commit
|
the test failure is unrelated, merging to master/2.2! |
…e global The issue happens in `ExternalMapToCatalyst`. For example, the following codes create `ExternalMapToCatalyst` to convert Scala Map to catalyst map format. val data = Seq.tabulate(10)(i => NestedData(1, Map("key" -> InnerData("name", i + 100)))) val ds = spark.createDataset(data) The `valueConverter` in `ExternalMapToCatalyst` looks like: if (isnull(lambdavariable(ExternalMapToCatalyst_value52, ExternalMapToCatalyst_value_isNull52, ObjectType(class org.apache.spark.sql.InnerData), true))) null else named_struct(name, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, assertnotnull(lambdavariable(ExternalMapToCatalyst_value52, ExternalMapToCatalyst_value_isNull52, ObjectType(class org.apache.spark.sql.InnerData), true)).name, true), value, assertnotnull(lambdavariable(ExternalMapToCatalyst_value52, ExternalMapToCatalyst_value_isNull52, ObjectType(class org.apache.spark.sql.InnerData), true)).value) There is a `CreateNamedStruct` expression (`named_struct`) to create a row of `InnerData.name` and `InnerData.value` that are referred by `ExternalMapToCatalyst_value52`. Because `ExternalMapToCatalyst_value52` are local variable, when `CreateNamedStruct` splits expressions to individual functions, the local variable can't be accessed anymore. Jenkins tests. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #18418 from viirya/SPARK-19104. (cherry picked from commit fd8c931) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…e global ## What changes were proposed in this pull request? The issue happens in `ExternalMapToCatalyst`. For example, the following codes create `ExternalMapToCatalyst` to convert Scala Map to catalyst map format. val data = Seq.tabulate(10)(i => NestedData(1, Map("key" -> InnerData("name", i + 100)))) val ds = spark.createDataset(data) The `valueConverter` in `ExternalMapToCatalyst` looks like: if (isnull(lambdavariable(ExternalMapToCatalyst_value52, ExternalMapToCatalyst_value_isNull52, ObjectType(class org.apache.spark.sql.InnerData), true))) null else named_struct(name, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, assertnotnull(lambdavariable(ExternalMapToCatalyst_value52, ExternalMapToCatalyst_value_isNull52, ObjectType(class org.apache.spark.sql.InnerData), true)).name, true), value, assertnotnull(lambdavariable(ExternalMapToCatalyst_value52, ExternalMapToCatalyst_value_isNull52, ObjectType(class org.apache.spark.sql.InnerData), true)).value) There is a `CreateNamedStruct` expression (`named_struct`) to create a row of `InnerData.name` and `InnerData.value` that are referred by `ExternalMapToCatalyst_value52`. Because `ExternalMapToCatalyst_value52` are local variable, when `CreateNamedStruct` splits expressions to individual functions, the local variable can't be accessed anymore. ## How was this patch tested? Jenkins tests. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes apache#18418 from viirya/SPARK-19104.
…alyst should be global ## What changes were proposed in this pull request? This PR is backport of #18418 to Spark 2.1. [SPARK-21391](https://issues.apache.org/jira/browse/SPARK-21391) reported this problem in Spark 2.1. The issue happens in `ExternalMapToCatalyst`. For example, the following codes create ExternalMap`ExternalMapToCatalyst`ToCatalyst to convert Scala Map to catalyst map format. ``` val data = Seq.tabulate(10)(i => NestedData(1, Map("key" -> InnerData("name", i + 100)))) val ds = spark.createDataset(data) ``` The `valueConverter` in `ExternalMapToCatalyst` looks like: ``` if (isnull(lambdavariable(ExternalMapToCatalyst_value52, ExternalMapToCatalyst_value_isNull52, ObjectType(class org.apache.spark.sql.InnerData), true))) null else named_struct(name, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, assertnotnull(lambdavariable(ExternalMapToCatalyst_value52, ExternalMapToCatalyst_value_isNull52, ObjectType(class org.apache.spark.sql.InnerData), true)).name, true), value, assertnotnull(lambdavariable(ExternalMapToCatalyst_value52, ExternalMapToCatalyst_value_isNull52, ObjectType(class org.apache.spark.sql.InnerData), true)).value) ``` There is a `CreateNamedStruct` expression (`named_struct`) to create a row of `InnerData.name` and `InnerData.value` that are referred by `ExternalMapToCatalyst_value52`. Because `ExternalMapToCatalyst_value52` are local variable, when `CreateNamedStruct` splits expressions to individual functions, the local variable can't be accessed anymore. ## How was this patch tested? Added a new test suite into `DatasetPrimitiveSuite` Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Closes #18627 from kiszk/SPARK-21391.
What changes were proposed in this pull request?
The issue happens in
ExternalMapToCatalyst
. For example, the following codes createExternalMapToCatalyst
to convert Scala Map to catalyst map format.The
valueConverter
inExternalMapToCatalyst
looks like:There is a
CreateNamedStruct
expression (named_struct
) to create a row ofInnerData.name
andInnerData.value
that are referred byExternalMapToCatalyst_value52
.Because
ExternalMapToCatalyst_value52
are local variable, whenCreateNamedStruct
splits expressions to individual functions, the local variable can't be accessed anymore.How was this patch tested?
Jenkins tests.