
[SPARK-33389][SQL] Make internal classes of SparkSession always using active SQLConf #30299

Closed · wants to merge 2 commits

Conversation

@luluorta (Contributor) commented Nov 9, 2020

What changes were proposed in this pull request?

This PR makes the internal classes of SparkSession always use the active SQLConf. We remove all `conf: SQLConf` constructor parameters from these classes (Analyzer, SparkPlanner, SessionCatalog, CatalogManager, SparkSqlParser, etc.) and use SQLConf.get instead.

Why are the changes needed?

Code refinement.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests.
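The pattern proposed above can be sketched as follows. All classes here are simplified stand-ins for illustration, not Spark's real implementations; in Spark, `SQLConf.get` resolves the active session's conf through a thread-local, which a plain singleton stands in for here.

```scala
// Simplified stand-ins for Spark's classes, for illustration only.
class SQLConf {
  var caseSensitive: Boolean = false
}

object SQLConf {
  // In Spark, SQLConf.get resolves the conf of the active session
  // (via a thread-local); here a plain singleton stands in for it.
  val active = new SQLConf
  def get: SQLConf = active
}

// The trait this PR introduces (initially named HasConf, renamed
// SQLConfHelper during review): mix it in instead of threading a
// `conf: SQLConf` constructor parameter through every class.
trait SQLConfHelper {
  def conf: SQLConf = SQLConf.get
}

// Before: class Analyzer(catalog: CatalogManager, conf: SQLConf)
// After:  the conf constructor parameter is dropped entirely.
class Analyzer extends SQLConfHelper {
  def normalize(name: String): String =
    if (conf.caseSensitive) name else name.toLowerCase
}
```

Because `conf` is a `def`, every call re-reads the active conf, so a change to the active session's settings is visible immediately without rebuilding the Analyzer.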

@SparkQA commented Nov 9, 2020

Test build #130790 has finished for PR 30299 at commit ad9d6d2.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait HasConf
  • class Analyzer(override val catalogManager: CatalogManager)
  • class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging with HasConf
  • abstract class AbstractSqlParser extends ParserInterface with Logging with HasConf
  • class CatalystSqlParser extends AbstractSqlParser
  • abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanType] with HasConf
  • abstract class Rule[TreeType <: TreeNode[_]] extends HasConf with Logging
  • class SparkPlanner(val session: SparkSession, val experimentalMethods: ExperimentalMethods)
  • class SparkSqlParser extends AbstractSqlParser
  • class SparkSqlAstBuilder extends AstBuilder
  • class V2SessionCatalog(catalog: SessionCatalog)
  • class VariableSubstitution extends HasConf

@luluorta luluorta force-pushed the SPARK-33389 branch 2 times, most recently from 90181f9 to f33d4be Compare November 10, 2020 02:36
@SparkQA commented Nov 10, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35430/

@SparkQA commented Nov 10, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35430/

@SparkQA commented Nov 10, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35432/

@SparkQA commented Nov 10, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35432/

@SparkQA commented Nov 10, 2020

Test build #130821 has finished for PR 30299 at commit 90181f9.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait HasConf
  • class Analyzer(override val catalogManager: CatalogManager)
  • class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging with HasConf
  • abstract class AbstractSqlParser extends ParserInterface with Logging with HasConf
  • class CatalystSqlParser extends AbstractSqlParser
  • abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanType] with HasConf
  • abstract class Rule[TreeType <: TreeNode[_]] extends HasConf with Logging
  • class SparkPlanner(val session: SparkSession, val experimentalMethods: ExperimentalMethods)
  • class SparkSqlParser extends AbstractSqlParser
  • class SparkSqlAstBuilder extends AstBuilder
  • class V2SessionCatalog(catalog: SessionCatalog)
  • class VariableSubstitution extends HasConf

@SparkQA commented Nov 10, 2020

Test build #130824 has finished for PR 30299 at commit f33d4be.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait HasConf
  • class Analyzer(override val catalogManager: CatalogManager)
  • class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging with HasConf
  • abstract class AbstractSqlParser extends ParserInterface with Logging with HasConf
  • class CatalystSqlParser extends AbstractSqlParser
  • abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanType] with HasConf
  • abstract class Rule[TreeType <: TreeNode[_]] extends HasConf with Logging
  • class SparkPlanner(val session: SparkSession, val experimentalMethods: ExperimentalMethods)
  • class SparkSqlParser extends AbstractSqlParser
  • class SparkSqlAstBuilder extends AstBuilder
  • class V2SessionCatalog(catalog: SessionCatalog)
  • class VariableSubstitution extends HasConf

@SparkQA commented Nov 10, 2020

Test build #130857 has finished for PR 30299 at commit b5366df.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 10, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35464/

@SparkQA commented Nov 10, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35464/

/**
 * Trait for shared SQLConf.
 */
trait HasConf {

Contributor: This name is weird. How about SQLConfHelper?

Contributor Author (@luluorta): I agree, SQLConfHelper is better.

import org.apache.spark.sql.internal.SQLConf

/**
 * Trait for shared SQLConf.

Contributor: Trait for getting the active SQLConf.

Contributor Author (@luluorta): done

@@ -42,7 +42,7 @@ import org.apache.spark.sql.catalyst.trees.TreeNodeRef
import org.apache.spark.sql.catalyst.util.toPrettySQL
import org.apache.spark.sql.connector.catalog._
import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._
import org.apache.spark.sql.connector.catalog.TableChange.{AddColumn, After, ColumnChange, ColumnPosition, DeleteColumn, RenameColumn, UpdateColumnComment, UpdateColumnNullability, UpdateColumnPosition, UpdateColumnType}
import org.apache.spark.sql.connector.catalog.TableChange.{First => _, _}

Contributor: unnecessary change?

Contributor Author (@luluorta): reverted

import SessionCatalog._
import CatalogTypes.TablePartitionSpec

// For testing only.
def this(
    externalCatalog: ExternalCatalog,
    functionRegistry: FunctionRegistry,
    conf: SQLConf) = {
    staticConf: SQLConf) = {

Contributor: The previous name conf is OK. SQLConf contains both static and runtime configs.

Contributor Author (@luluorta): reverted

val expected = caseInsensitiveAnalyzer.execute(
  testRelation.where('a > 2 && ('b > 3 || 'b < 5)))
comparePlans(actual, expected)
withSQLConf(SQLConf.CASE_SENSITIVE.key -> "false") {

Contributor: This is the default value. It seems we can remove withSQLConf?

Contributor Author (@luluorta): reverted

val expected = caseInsensitiveAnalyzer.execute(
  testRelation.where('a > 2 || ('b > 3 && 'b < 5)))
comparePlans(actual, expected)
withSQLConf(SQLConf.CASE_SENSITIVE.key -> "false") {

Contributor: ditto

Contributor Author (@luluorta): reverted

@@ -149,7 +149,7 @@ select to_timestamp('2019-10-06 A', 'yyyy-MM-dd GGGGG');
select to_timestamp('22 05 2020 Friday', 'dd MM yyyy EEEEEE');
select to_timestamp('22 05 2020 Friday', 'dd MM yyyy EEEEE');
select unix_timestamp('22 05 2020 Friday', 'dd MM yyyy EEEEE');
select from_json('{"time":"26/October/2015"}', 'time Timestamp', map('timestampFormat', 'dd/MMMMM/yyyy'));
select from_json('{"timestamp":"26/October/2015"}', 'timestamp Timestamp', map('timestampFormat', 'dd/MMMMM/yyyy'));

Contributor: why do we change this test?

Contributor Author (@luluorta, Nov 11, 2020): After this PR, dynamically setting "spark.sql.ansi.enabled" actually takes effect in the parsing phase. This query will fail to parse because time is a reserved keyword in the SQL standard.

Contributor (@cloud-fan, Nov 11, 2020): timestamp is also reserved in the ANSI standard. How about ts?

Contributor Author (@luluorta): done

@@ -60,8 +59,6 @@ case class HiveTableScanExec(
require(partitionPruningPred.isEmpty || relation.isPartitioned,
  "Partition pruning predicates only supported for partitioned tables.")

override def conf: SQLConf = sparkSession.sessionState.conf

Contributor: This actually makes sense, to make sure the conf matches the spark session. How about we move this override to SparkPlan?

Contributor Author (@luluorta): reverted
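The override under discussion can be modeled with a minimal sketch. These are hypothetical stand-in classes, not Spark's real class bodies; the point is that a node which captures a session can override `conf` so it always matches that session, rather than whatever conf happens to be active on the current thread.

```scala
// Hypothetical minimal model of the suggestion; not Spark's real classes.
class SQLConf { var ansiEnabled: Boolean = false }
class SessionState(val conf: SQLConf)
class SparkSession(val sessionState: SessionState)

object ActiveConf {
  // Stand-in for the active-session conf behind SQLConf.get.
  var current = new SQLConf
}

trait SQLConfHelper {
  def conf: SQLConf = ActiveConf.current
}

// The override: bind `conf` to the captured session, so it
// cannot drift from whichever session is currently "active".
class HiveTableScanExec(sparkSession: SparkSession) extends SQLConfHelper {
  override def conf: SQLConf = sparkSession.sessionState.conf
}
```

This is why the reviewer suggests hoisting the override into SparkPlan: every physical plan node captures a session, so binding `conf` there would give the whole physical plan a consistent view.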

@SparkQA commented Nov 11, 2020

Test build #130913 has finished for PR 30299 at commit 7047924.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 11, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35519/

@SparkQA commented Nov 11, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35519/

@SparkQA commented Nov 11, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35534/

@SparkQA commented Nov 11, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35534/

@SparkQA commented Nov 11, 2020

Test build #130929 has finished for PR 30299 at commit ce97a61.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 11, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35545/

@SparkQA commented Nov 11, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35545/

select from_json('{"date":"26/October/2015"}', 'date Date', map('dateFormat', 'dd/MMMMM/yyyy'));
select from_csv('26/October/2015', 'time Timestamp', map('timestampFormat', 'dd/MMMMM/yyyy'));
select from_csv('26/October/2015', 'date Date', map('dateFormat', 'dd/MMMMM/yyyy'));
select from_json('{"ts":"26/October/2015"}', 'ts Timestamp', map('timestampFormat', 'dd/MMMMM/yyyy'));

Member: which change caused this?

Contributor: It was a bug before: the parser used by from_json to parse the schema string stuck to the configs of the first session created in the current thread. Now the parser always uses the active conf, and the ANSI test fails here because time is a reserved keyword.

We probably need a separate PR for this bug.

Contributor Author (@luluorta, Nov 16, 2020): I opened a new PR for this issue: #30357
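The bug described above can be modeled by comparing a parser that captures the conf once at construction with one that reads the active conf on every call. All names here are illustrative stand-ins, not Spark's actual internals; `ActiveConf` plays the role of the thread-local behind `SQLConf.get`.

```scala
class SQLConf { var ansiEnabled: Boolean = false }

object ActiveConf {
  // Stand-in for the per-thread active conf behind SQLConf.get.
  val tl: ThreadLocal[SQLConf] =
    ThreadLocal.withInitial(() => new SQLConf)
  def get: SQLConf = tl.get()
}

// Buggy shape: captures the conf of whichever session was active
// when the parser was first constructed, and keeps it forever.
class StickyParser {
  private val captured = ActiveConf.get
  def ansiEnabled: Boolean = captured.ansiEnabled
}

// Fixed shape: reads the active conf on every use.
class ActiveParser {
  def ansiEnabled: Boolean = ActiveConf.get.ansiEnabled
}
```

With the sticky parser, switching to a session with `spark.sql.ansi.enabled=true` has no effect on schema-string parsing; with the fixed shape, the new setting applies immediately, which is exactly why the previously passing `time Timestamp` schema string started failing under ANSI mode.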

@SparkQA commented Nov 11, 2020

Test build #130940 has finished for PR 30299 at commit 6dbd559.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -172,7 +172,7 @@ case class HiveTableScanExec(
    prunePartitions(hivePartitions)
  }
} else {
  if (sparkSession.sessionState.conf.metastorePartitionPruning &&

Contributor: let's keep this unchanged for now. We may override def conf in SparkPlan later, to always get conf from the captured spark session.

Contributor Author (@luluorta): Reverted

@SparkQA commented Nov 16, 2020

Test build #131139 has finished for PR 30299 at commit bf1e56a.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait SQLConfHelper
  • class Analyzer(override val catalogManager: CatalogManager)
  • class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logging
  • abstract class AbstractSqlParser extends ParserInterface with SQLConfHelper with Logging
  • abstract class Rule[TreeType <: TreeNode[_]] extends SQLConfHelper with Logging
  • class SparkPlanner(val session: SparkSession, val experimentalMethods: ExperimentalMethods)
  • class V2SessionCatalog(catalog: SessionCatalog)
  • class VariableSubstitution extends SQLConfHelper

@SparkQA commented Nov 16, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35742/

@SparkQA commented Nov 16, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35742/

@SparkQA commented Nov 16, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35744/

@SparkQA commented Nov 16, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35744/

@SparkQA commented Nov 16, 2020

Test build #131141 has finished for PR 30299 at commit 0cc0b42.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 16, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35762/

@SparkQA commented Nov 16, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35762/

@cloud-fan (Contributor)

GA passed, merging to master, thanks!

@cloud-fan closed this in dfa6fb4 Nov 16, 2020
@SparkQA commented Nov 16, 2020

Test build #131159 has finished for PR 30299 at commit 99619e3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

4 participants