
[SPARK-33389][SQL] Make internal classes of SparkSession always using active SQLConf #30299

Closed · wants to merge 2 commits

Conversation

@luluorta (Contributor) commented Nov 9, 2020

What changes were proposed in this pull request?

This PR makes the internal classes of SparkSession always use the active SQLConf. We remove all `conf: SQLConf` constructor parameters from these classes (Analyzer, SparkPlanner, SessionCatalog, CatalogManager, SparkSqlParser, etc.) and use SQLConf.get instead.

Why are the changes needed?

Code refinement.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests.
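The pattern proposed above can be sketched as follows. All classes here are simplified stand-ins for illustration, not Spark's real implementations; in Spark, `SQLConf.get` resolves the active session's conf through a thread-local, which a plain singleton stands in for here.

```scala
// Simplified stand-ins for Spark's classes, for illustration only.
class SQLConf {
  var caseSensitive: Boolean = false
}

object SQLConf {
  // In Spark, SQLConf.get resolves the conf of the active session
  // (via a thread-local); here a plain singleton stands in for it.
  val active = new SQLConf
  def get: SQLConf = active
}

// The trait this PR introduces (initially named HasConf, renamed
// SQLConfHelper during review): mix it in instead of threading a
// `conf: SQLConf` constructor parameter through every class.
trait SQLConfHelper {
  def conf: SQLConf = SQLConf.get
}

// Before: class Analyzer(catalog: CatalogManager, conf: SQLConf)
// After:  the conf constructor parameter is dropped entirely.
class Analyzer extends SQLConfHelper {
  def normalize(name: String): String =
    if (conf.caseSensitive) name else name.toLowerCase
}
```

Because `conf` is a `def`, every call re-reads the active conf, so a change to the active session's settings is visible immediately without rebuilding the Analyzer.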

@SparkQA commented Nov 9, 2020

Test build #130790 has finished for PR 30299 at commit ad9d6d2.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait HasConf
  • class Analyzer(override val catalogManager: CatalogManager)
  • class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging with HasConf
  • abstract class AbstractSqlParser extends ParserInterface with Logging with HasConf
  • class CatalystSqlParser extends AbstractSqlParser
  • abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanType] with HasConf
  • abstract class Rule[TreeType <: TreeNode[_]] extends HasConf with Logging
  • class SparkPlanner(val session: SparkSession, val experimentalMethods: ExperimentalMethods)
  • class SparkSqlParser extends AbstractSqlParser
  • class SparkSqlAstBuilder extends AstBuilder
  • class V2SessionCatalog(catalog: SessionCatalog)
  • class VariableSubstitution extends HasConf

@luluorta luluorta force-pushed the SPARK-33389 branch 2 times, most recently from 90181f9 to f33d4be Compare November 10, 2020 02:36
@SparkQA commented Nov 10, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35430/

@SparkQA commented Nov 10, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35430/

@SparkQA commented Nov 10, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35432/

@SparkQA commented Nov 10, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35432/

@SparkQA commented Nov 10, 2020

Test build #130821 has finished for PR 30299 at commit 90181f9.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait HasConf
  • class Analyzer(override val catalogManager: CatalogManager)
  • class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging with HasConf
  • abstract class AbstractSqlParser extends ParserInterface with Logging with HasConf
  • class CatalystSqlParser extends AbstractSqlParser
  • abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanType] with HasConf
  • abstract class Rule[TreeType <: TreeNode[_]] extends HasConf with Logging
  • class SparkPlanner(val session: SparkSession, val experimentalMethods: ExperimentalMethods)
  • class SparkSqlParser extends AbstractSqlParser
  • class SparkSqlAstBuilder extends AstBuilder
  • class V2SessionCatalog(catalog: SessionCatalog)
  • class VariableSubstitution extends HasConf

@SparkQA commented Nov 10, 2020

Test build #130824 has finished for PR 30299 at commit f33d4be.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait HasConf
  • class Analyzer(override val catalogManager: CatalogManager)
  • class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging with HasConf
  • abstract class AbstractSqlParser extends ParserInterface with Logging with HasConf
  • class CatalystSqlParser extends AbstractSqlParser
  • abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanType] with HasConf
  • abstract class Rule[TreeType <: TreeNode[_]] extends HasConf with Logging
  • class SparkPlanner(val session: SparkSession, val experimentalMethods: ExperimentalMethods)
  • class SparkSqlParser extends AbstractSqlParser
  • class SparkSqlAstBuilder extends AstBuilder
  • class V2SessionCatalog(catalog: SessionCatalog)
  • class VariableSubstitution extends HasConf

@SparkQA commented Nov 10, 2020

Test build #130857 has finished for PR 30299 at commit b5366df.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 10, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35464/

@SparkQA commented Nov 10, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35464/

/**
 * Trait for shared SQLConf.
 */
trait HasConf {

Contributor: This name is weird. How about SQLConfHelper?

Contributor Author (@luluorta): I agree, SQLConfHelper is better.

import org.apache.spark.sql.internal.SQLConf

/**
 * Trait for shared SQLConf.

Contributor: Trait for getting the active SQLConf.

Contributor Author (@luluorta): done

@@ -42,7 +42,7 @@ import org.apache.spark.sql.catalyst.trees.TreeNodeRef
import org.apache.spark.sql.catalyst.util.toPrettySQL
import org.apache.spark.sql.connector.catalog._
import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._
import org.apache.spark.sql.connector.catalog.TableChange.{AddColumn, After, ColumnChange, ColumnPosition, DeleteColumn, RenameColumn, UpdateColumnComment, UpdateColumnNullability, UpdateColumnPosition, UpdateColumnType}
import org.apache.spark.sql.connector.catalog.TableChange.{First => _, _}

Contributor: unnecessary change?

Contributor Author (@luluorta): reverted

import SessionCatalog._
import CatalogTypes.TablePartitionSpec

// For testing only.
def this(
    externalCatalog: ExternalCatalog,
    functionRegistry: FunctionRegistry,
    conf: SQLConf) = {
    staticConf: SQLConf) = {

Contributor: The previous name conf is OK. SQLConf contains both static and runtime configs.

Contributor Author (@luluorta): reverted

val expected = caseInsensitiveAnalyzer.execute(
  testRelation.where('a > 2 && ('b > 3 || 'b < 5)))
comparePlans(actual, expected)
withSQLConf(SQLConf.CASE_SENSITIVE.key -> "false") {

Contributor: This is the default value. It seems we can remove withSQLConf?

Contributor Author (@luluorta): reverted

val expected = caseInsensitiveAnalyzer.execute(
  testRelation.where('a > 2 || ('b > 3 && 'b < 5)))
comparePlans(actual, expected)
withSQLConf(SQLConf.CASE_SENSITIVE.key -> "false") {

Contributor: ditto

Contributor Author (@luluorta): reverted

@@ -149,7 +149,7 @@ select to_timestamp('2019-10-06 A', 'yyyy-MM-dd GGGGG');
select to_timestamp('22 05 2020 Friday', 'dd MM yyyy EEEEEE');
select to_timestamp('22 05 2020 Friday', 'dd MM yyyy EEEEE');
select unix_timestamp('22 05 2020 Friday', 'dd MM yyyy EEEEE');
select from_json('{"time":"26/October/2015"}', 'time Timestamp', map('timestampFormat', 'dd/MMMMM/yyyy'));
select from_json('{"timestamp":"26/October/2015"}', 'timestamp Timestamp', map('timestampFormat', 'dd/MMMMM/yyyy'));

Contributor: why do we change this test?

Contributor Author (@luluorta, Nov 11, 2020): After this PR, dynamically setting "spark.sql.ansi.enabled" actually takes effect in the parsing phase. This query will fail to parse because time is a reserved keyword in the SQL standard.

Contributor (@cloud-fan, Nov 11, 2020): timestamp is also reserved in the ANSI standard. How about ts?

Contributor Author (@luluorta): done

@@ -60,8 +59,6 @@ case class HiveTableScanExec(
require(partitionPruningPred.isEmpty || relation.isPartitioned,
  "Partition pruning predicates only supported for partitioned tables.")

override def conf: SQLConf = sparkSession.sessionState.conf

Contributor: This actually makes sense, to make sure the conf matches the spark session. How about we move this override to SparkPlan?

Contributor Author (@luluorta): reverted
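The override under discussion can be modeled with a minimal sketch. These are hypothetical stand-in classes, not Spark's real class bodies; the point is that a node which captures a session can override `conf` so it always matches that session, rather than whatever conf happens to be active on the current thread.

```scala
// Hypothetical minimal model of the suggestion; not Spark's real classes.
class SQLConf { var ansiEnabled: Boolean = false }
class SessionState(val conf: SQLConf)
class SparkSession(val sessionState: SessionState)

object ActiveConf {
  // Stand-in for the active-session conf behind SQLConf.get.
  var current = new SQLConf
}

trait SQLConfHelper {
  def conf: SQLConf = ActiveConf.current
}

// The override: bind `conf` to the captured session, so it
// cannot drift from whichever session is currently "active".
class HiveTableScanExec(sparkSession: SparkSession) extends SQLConfHelper {
  override def conf: SQLConf = sparkSession.sessionState.conf
}
```

This is why the reviewer suggests hoisting the override into SparkPlan: every physical plan node captures a session, so binding `conf` there would give the whole physical plan a consistent view.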

@SparkQA commented Nov 11, 2020

Test build #130913 has finished for PR 30299 at commit 7047924.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 11, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35519/

@SparkQA commented Nov 11, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35519/

@SparkQA commented Nov 11, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35534/

@SparkQA commented Nov 11, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35534/

@SparkQA commented Nov 11, 2020

Test build #130929 has finished for PR 30299 at commit ce97a61.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 11, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35545/

@SparkQA commented Nov 11, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35545/

select from_json('{"date":"26/October/2015"}', 'date Date', map('dateFormat', 'dd/MMMMM/yyyy'));
select from_csv('26/October/2015', 'time Timestamp', map('timestampFormat', 'dd/MMMMM/yyyy'));
select from_csv('26/October/2015', 'date Date', map('dateFormat', 'dd/MMMMM/yyyy'));
select from_json('{"ts":"26/October/2015"}', 'ts Timestamp', map('timestampFormat', 'dd/MMMMM/yyyy'));

Member: which change caused this?

Contributor: It was a bug before: the parser used by from_json to parse the schema string stuck to the configs of the first session created in the current thread. Now the parser always uses the active conf, and the ANSI test fails here because time is a reserved keyword.

We probably need a separate PR for this bug.

Contributor Author (@luluorta, Nov 16, 2020): I opened a new PR for this issue: #30357
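The bug described above can be modeled by comparing a parser that captures the conf once at construction with one that reads the active conf on every call. All names here are illustrative stand-ins, not Spark's actual internals; `ActiveConf` plays the role of the thread-local behind `SQLConf.get`.

```scala
class SQLConf { var ansiEnabled: Boolean = false }

object ActiveConf {
  // Stand-in for the per-thread active conf behind SQLConf.get.
  val tl: ThreadLocal[SQLConf] =
    ThreadLocal.withInitial(() => new SQLConf)
  def get: SQLConf = tl.get()
}

// Buggy shape: captures the conf of whichever session was active
// when the parser was first constructed, and keeps it forever.
class StickyParser {
  private val captured = ActiveConf.get
  def ansiEnabled: Boolean = captured.ansiEnabled
}

// Fixed shape: reads the active conf on every use.
class ActiveParser {
  def ansiEnabled: Boolean = ActiveConf.get.ansiEnabled
}
```

With the sticky parser, switching to a session with `spark.sql.ansi.enabled=true` has no effect on schema-string parsing; with the fixed shape, the new setting applies immediately, which is exactly why the previously passing `time Timestamp` schema string started failing under ANSI mode.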

@SparkQA commented Nov 11, 2020

Test build #130940 has finished for PR 30299 at commit 6dbd559.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -172,7 +172,7 @@ case class HiveTableScanExec(
    prunePartitions(hivePartitions)
  }
} else {
  if (sparkSession.sessionState.conf.metastorePartitionPruning &&

Contributor: let's keep this unchanged for now. We may override def conf in SparkPlan later, to always get conf from the captured spark session.

Contributor Author (@luluorta): Reverted

@SparkQA commented Nov 16, 2020

Test build #131139 has finished for PR 30299 at commit bf1e56a.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait SQLConfHelper
  • class Analyzer(override val catalogManager: CatalogManager)
  • class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logging
  • abstract class AbstractSqlParser extends ParserInterface with SQLConfHelper with Logging
  • abstract class Rule[TreeType <: TreeNode[_]] extends SQLConfHelper with Logging
  • class SparkPlanner(val session: SparkSession, val experimentalMethods: ExperimentalMethods)
  • class V2SessionCatalog(catalog: SessionCatalog)
  • class VariableSubstitution extends SQLConfHelper

@SparkQA commented Nov 16, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35742/

@SparkQA commented Nov 16, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35742/

@SparkQA commented Nov 16, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35744/

@SparkQA commented Nov 16, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35744/

@SparkQA commented Nov 16, 2020

Test build #131141 has finished for PR 30299 at commit 0cc0b42.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 16, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35762/

@SparkQA commented Nov 16, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35762/

@cloud-fan (Contributor)

GA passed, merging to master, thanks!

@cloud-fan closed this in dfa6fb4 Nov 16, 2020
@SparkQA commented Nov 16, 2020

Test build #131159 has finished for PR 30299 at commit 99619e3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

4 participants