Deserializing value should not hold the TaskSchedulerImpl lock #58

markhamstra · 2015-05-22T01:56:50Z

SKIPME

Backport of some of the SPARK-7655 fix.

…lerImpl lock

We should not call `DirectTaskResult.value` when holding the `TaskSchedulerImpl` lock. It may cost dozens of seconds to deserialize a large object.

Author: zsxwing <zsxwing@gmail.com>

Closes apache#6195 from zsxwing/SPARK-7655 and squashes the following commits:

21f502e [zsxwing] Add more comments
e25fa88 [zsxwing] Add comments
15010b5 [zsxwing] Deserialize value should not hold the TaskSchedulerImpl lock
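For readers unfamiliar with the pattern, here is a minimal sketch of "deserialize first, then lock". It is not the actual Spark patch: `LazyResult` and `Scheduler` are hypothetical stand-ins for `DirectTaskResult` and `TaskSchedulerImpl`, and a UTF-8 decode stands in for the expensive deserialization.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

// Hypothetical stand-in for a task result whose value is decoded lazily.
final class LazyResult(bytes: ByteBuffer) {
  // Decoded on first access and cached for later reads.
  lazy val value: String =
    StandardCharsets.UTF_8.decode(bytes.duplicate()).toString
}

// Hypothetical stand-in for a scheduler that guards its state with its own monitor.
final class Scheduler {
  private var completed = Vector.empty[String]

  def handleSuccessfulTask(result: LazyResult): Unit = {
    // Force the (possibly slow) decode before taking the lock.
    val decoded = result.value
    synchronized {
      // Only the cheap state update happens inside the critical section.
      completed = completed :+ decoded
    }
  }

  def results: Vector[String] = synchronized(completed)
}
```

The design point is that other threads contending for the scheduler lock wait only for the state update, not for the decode.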

mbautin · 2015-05-22T02:09:20Z

LGTM

Deserializing value should not hold the TaskSchedulerImpl lock

…conf

## What changes were proposed in this pull request?

This is an effort to reduce the difference between Hive and Spark. Spark supports case-sensitivity in columns. In particular, for struct types, with `spark.sql.caseSensitive=true`, the following is supported.

```scala
scala> sql("select named_struct('a', 1, 'A', 2).a").show
+--------------------------+
|named_struct(a, 1, A, 2).a|
+--------------------------+
|                         1|
+--------------------------+

scala> sql("select named_struct('a', 1, 'A', 2).A").show
+--------------------------+
|named_struct(a, 1, A, 2).A|
+--------------------------+
|                         2|
+--------------------------+
```

And vice versa, with `spark.sql.caseSensitive=false`, the following is supported.

```scala
scala> sql("select named_struct('a', 1).A, named_struct('A', 1).a").show
+--------------------+--------------------+
|named_struct(a, 1).A|named_struct(A, 1).a|
+--------------------+--------------------+
|                   1|                   1|
+--------------------+--------------------+
```

However, the types are considered different. For example, SET operations fail.

```scala
scala> sql("SELECT named_struct('a',1) union all (select named_struct('A',2))").show
org.apache.spark.sql.AnalysisException: Union can only be performed on tables with the compatible column types. struct<A:int> <> struct<a:int> at the first column of the second table;;
'Union
:- Project [named_struct(a, 1) AS named_struct(a, 1)#57]
:  +- OneRowRelation$
+- Project [named_struct(A, 2) AS named_struct(A, 2)#58]
   +- OneRowRelation$
```

This PR aims to support case-insensitive type equality. For example, in SET operations, the above operation succeeds when `spark.sql.caseSensitive=false`.

```scala
scala> sql("SELECT named_struct('a',1) union all (select named_struct('A',2))").show
+------------------+
|named_struct(a, 1)|
+------------------+
|               [1]|
|               [2]|
+------------------+
```

## How was this patch tested?

Pass the Jenkins tests with a newly added test case.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes apache#18460 from dongjoon-hyun/SPARK-21247.
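To illustrate what "case-insensitive type equality" could mean for struct types, here is a rough sketch on a toy type model. It is not Spark's implementation: `DataType`, `IntType`, `StructField`, `StructType`, and `TypeEquality.sameType` are simplified, hypothetical definitions, and the `caseSensitive` parameter stands in for the `spark.sql.caseSensitive` setting.

```scala
// Toy type model (hypothetical, much simpler than Spark's own classes).
sealed trait DataType
case object IntType extends DataType
final case class StructField(name: String, dataType: DataType)
final case class StructType(fields: Seq[StructField]) extends DataType

object TypeEquality {
  // Compares two types field by field; struct field names are compared
  // with or without regard to case depending on `caseSensitive`.
  def sameType(a: DataType, b: DataType, caseSensitive: Boolean): Boolean =
    (a, b) match {
      case (StructType(fa), StructType(fb)) =>
        fa.length == fb.length && fa.zip(fb).forall { case (x, y) =>
          val namesMatch =
            if (caseSensitive) x.name == y.name
            else x.name.equalsIgnoreCase(y.name)
          namesMatch && sameType(x.dataType, y.dataType, caseSensitive)
        }
      case _ => a == b
    }
}
```

Under this model, `struct<a:int>` and `struct<A:int>` compare equal only when `caseSensitive` is `false`, which mirrors the union example in the commit message.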

markhamstra assigned mbautin May 22, 2015

markhamstra added a commit that referenced this pull request May 22, 2015

Merge pull request #58 from markhamstra/backports

aa0bfb7

Deserializing value should not hold the TaskSchedulerImpl lock

markhamstra merged commit aa0bfb7 into alteryx:csd-1.2 May 22, 2015
