[SPARK-33389][SQL] Make internal classes of SparkSession always using active SQLConf #30299
Conversation
Test build #130790 has finished for PR 30299 at commit
Force-pushed 90181f9 to f33d4be.
Kubernetes integration test starting
Kubernetes integration test status success
Kubernetes integration test starting
Kubernetes integration test status success
Test build #130821 has finished for PR 30299 at commit
Test build #130824 has finished for PR 30299 at commit
Test build #130857 has finished for PR 30299 at commit
Kubernetes integration test starting
Kubernetes integration test status success
/**
 * Trait for shared SQLConf.
 */
trait HasConf {

This name is weird. How about SQLConfHelper?
I agree, SQLConfHelper is better.
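For reference, a minimal sketch of what the renamed trait could look like (illustrative, not necessarily the exact committed code):

import org.apache.spark.sql.internal.SQLConf

// The trait resolves the SQLConf of the SparkSession active on the
// current thread via SQLConf.get, instead of a conf fixed at construction.
trait SQLConfHelper {
  def conf: SQLConf = SQLConf.get
}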
import org.apache.spark.sql.internal.SQLConf

/**
 * Trait for shared SQLConf.
Trait for getting the active SQLConf.
done
@@ -42,7 +42,7 @@ import org.apache.spark.sql.catalyst.trees.TreeNodeRef
import org.apache.spark.sql.catalyst.util.toPrettySQL
import org.apache.spark.sql.connector.catalog._
import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._
import org.apache.spark.sql.connector.catalog.TableChange.{AddColumn, After, ColumnChange, ColumnPosition, DeleteColumn, RenameColumn, UpdateColumnComment, UpdateColumnNullability, UpdateColumnPosition, UpdateColumnType}
import org.apache.spark.sql.connector.catalog.TableChange.{First => _, _}
unnecessary change?
reverted
import SessionCatalog._
import CatalogTypes.TablePartitionSpec

// For testing only.
def this(
    externalCatalog: ExternalCatalog,
    functionRegistry: FunctionRegistry,
    conf: SQLConf) = {
    staticConf: SQLConf) = {
The previous name conf is OK. SQLConf contains both static and runtime configs.
reverted
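For context, a tiny illustration of that point, assuming a bare SQLConf instance as used in catalyst tests:

import org.apache.spark.sql.internal.SQLConf

// One SQLConf object exposes both kinds of entries:
val conf = new SQLConf
conf.caseSensitiveAnalysis // runtime config: spark.sql.caseSensitive
conf.warehousePath         // static config: spark.sql.warehouse.dir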
val expected = caseInsensitiveAnalyzer.execute(
  testRelation.where('a > 2 && ('b > 3 || 'b < 5)))
comparePlans(actual, expected)
withSQLConf(SQLConf.CASE_SENSITIVE.key -> "false") {
This is the default value. Seems we can remove withSQLConf?
reverted
val expected = caseInsensitiveAnalyzer.execute(
  testRelation.where('a > 2 || ('b > 3 && 'b < 5)))
comparePlans(actual, expected)
withSQLConf(SQLConf.CASE_SENSITIVE.key -> "false") {
ditto
reverted
@@ -149,7 +149,7 @@ select to_timestamp('2019-10-06 A', 'yyyy-MM-dd GGGGG');
select to_timestamp('22 05 2020 Friday', 'dd MM yyyy EEEEEE');
select to_timestamp('22 05 2020 Friday', 'dd MM yyyy EEEEE');
select unix_timestamp('22 05 2020 Friday', 'dd MM yyyy EEEEE');
select from_json('{"time":"26/October/2015"}', 'time Timestamp', map('timestampFormat', 'dd/MMMMM/yyyy'));
select from_json('{"timestamp":"26/October/2015"}', 'timestamp Timestamp', map('timestampFormat', 'dd/MMMMM/yyyy'));
why do we change this test?
After this PR, dynamically setting "spark.sql.ansi.enabled" actually takes effect in the parsing phase. This query will fail to parse because time is a reserved keyword in the SQL standard.
timestamp is also reserved in the ANSI standard. How about ts?
done
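A quick way to see the effect (illustrative sketch; assumes spark.sql.ansi.enabled is true for the active session):

import org.apache.spark.sql.types.StructType

// With ANSI mode on, reserved words are rejected as bare column names when
// the schema string is parsed, hence the rename to ts.
StructType.fromDDL("time Timestamp") // fails: time is reserved under ANSI
StructType.fromDDL("ts Timestamp")   // parses fine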
@@ -60,8 +59,6 @@ case class HiveTableScanExec(
require(partitionPruningPred.isEmpty || relation.isPartitioned,
  "Partition pruning predicates only supported for partitioned tables.")

override def conf: SQLConf = sparkSession.sessionState.conf
This actually makes sense, to make sure the conf matches the spark session. How about we move this override to SparkPlan?
reverted
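The suggestion, as a rough sketch (MyExecLike is a hypothetical stand-in; SparkPlan's real definition is far larger):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.internal.SQLConf

// Pin conf to the session captured when the physical plan was created,
// rather than whichever session is active on the executing thread.
class MyExecLike(sparkSession: SparkSession) {
  def conf: SQLConf = sparkSession.sessionState.conf
}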
Test build #130913 has finished for PR 30299 at commit
Kubernetes integration test starting
Kubernetes integration test status success
Kubernetes integration test starting
Kubernetes integration test status success
Test build #130929 has finished for PR 30299 at commit
Kubernetes integration test starting
Kubernetes integration test status success
select from_json('{"date":"26/October/2015"}', 'date Date', map('dateFormat', 'dd/MMMMM/yyyy')); | ||
select from_csv('26/October/2015', 'time Timestamp', map('timestampFormat', 'dd/MMMMM/yyyy')); | ||
select from_csv('26/October/2015', 'date Date', map('dateFormat', 'dd/MMMMM/yyyy')); | ||
select from_json('{"ts":"26/October/2015"}', 'ts Timestamp', map('timestampFormat', 'dd/MMMMM/yyyy')); |
which change caused this?
It was a bug before: the parser used in from_json to parse the schema string sticks to the configs of the first session created in the current thread. Now the parser always uses the active conf, and the ANSI test fails here because time is a reserved keyword.
We probably need a separate PR for this bug.
I opened a new PR for this issue #30357
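A rough reproduction of the pre-fix behavior (illustrative sketch; the exact error depends on the Spark build):

import org.apache.spark.sql.SparkSession

val first = SparkSession.builder().master("local").getOrCreate()
val ansi = first.newSession()
ansi.conf.set("spark.sql.ansi.enabled", "true")

// Before the fix, the schema string 'time Timestamp' was parsed with the
// first session's conf (ANSI off) and silently worked; with the active
// conf it fails, since time is a reserved keyword under ANSI mode.
ansi.sql("""select from_json('{"time":"26/October/2015"}',
           |  'time Timestamp', map('timestampFormat', 'dd/MMMMM/yyyy'))""".stripMargin)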
Test build #130940 has finished for PR 30299 at commit
Force-pushed 6dbd559 to bf1e56a.
@@ -172,7 +172,7 @@ case class HiveTableScanExec(
    prunePartitions(hivePartitions)
  }
} else {
  if (sparkSession.sessionState.conf.metastorePartitionPruning &&
Let's keep this unchanged for now. We may override def conf in SparkPlan later, to always get conf from the captured spark session.
Reverted
Test build #131139 has finished for PR 30299 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
Kubernetes integration test starting
Kubernetes integration test status success
Force-pushed 0cc0b42 to 99619e3.
Test build #131141 has finished for PR 30299 at commit
Kubernetes integration test starting
Kubernetes integration test status success
GA passed, merging to master, thanks!
Test build #131159 has finished for PR 30299 at commit
What changes were proposed in this pull request?
This PR makes the internal classes of SparkSession always use the active SQLConf. We remove all conf: SQLConf constructor parameters from these classes (Analyzer, SparkPlanner, SessionCatalog, CatalogManager, SparkSqlParser, etc.) and use SQLConf.get instead.
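The pattern, sketched with simplified stand-ins (AnalyzerBefore/AnalyzerAfter are illustrative names, not the actual Spark diff):

import org.apache.spark.sql.internal.SQLConf

// Before: the conf is captured once, at construction time.
class AnalyzerBefore(conf: SQLConf) {
  def caseSensitive: Boolean = conf.caseSensitiveAnalysis
}

// After: each access goes through SQLConf.get, which resolves the conf of
// the SparkSession active on the current thread.
class AnalyzerAfter {
  def conf: SQLConf = SQLConf.get
  def caseSensitive: Boolean = conf.caseSensitiveAnalysis
}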
Why are the changes needed?
Code refinement.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing tests.