[SPARK-33521][SQL] Universal type conversion in resolving V2 partition specs #30474

MaxGekk · 2020-11-23T19:58:25Z

What changes were proposed in this pull request?

In this PR, I propose to change the resolver of partition specs used in V2 ALTER TABLE .. ADD/DROP PARTITION (at the moment), and to re-use CAST to convert partition values to the desired types according to the partition schema.
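As a rough illustration of the idea (not the code in this PR), the cast-based conversion can be sketched with Catalyst's `Cast` and `Literal`. The object name, the `castPartitionValue` helper, the sample schema, and the `"UTC"` time zone are assumptions made for the example; the snippet needs spark-catalyst on the classpath.

```scala
import org.apache.spark.sql.catalyst.expressions.{Cast, Literal}
import org.apache.spark.sql.types._

object PartitionValueCastSketch {
  // Hypothetical helper (not part of the PR): cast one raw partition value,
  // given as a string, to the type declared in the partition schema.
  def castPartitionValue(raw: String, dataType: DataType, timeZoneId: String): Any =
    Cast(Literal.create(raw, StringType), dataType, Some(timeZoneId)).eval()

  def main(args: Array[String]): Unit = {
    // Assumed sample partition schema and partition spec.
    val partSchema = new StructType()
      .add("id", IntegerType)
      .add("flag", BooleanType)
      .add("dt", DateType)
    val spec = Map("id" -> "42", "flag" -> "true", "dt" -> "1970-01-02")

    // Convert every partition value according to its field type.
    partSchema.foreach { field =>
      val value = castPartitionValue(spec(field.name), field.dataType, "UTC")
      println(s"${field.name}: $value")
    }
  }
}
```

Note that `eval()` returns Catalyst's internal representation, so a `DateType` value comes back as the number of days since the epoch rather than a `java.sql.Date`.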

Why are the changes needed?

Currently, the resolver of V2 partition specs supports just a few types (see spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolvePartitionSpec.scala, line 72 in 23e9920):

    // TODO: Support other datatypes, such as DateType

and fails on other types like date/timestamp.

Does this PR introduce any user-facing change?

Yes

How was this patch tested?

By running AlterTablePartitionV2SQLSuite

SparkQA · 2020-11-23T20:02:40Z

Test build #131572 has started for PR 30474 at commit d0108e0.

SparkQA · 2020-11-23T20:43:25Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36173/

SparkQA · 2020-11-23T21:06:42Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36173/

HyukjinKwon · 2020-11-24T01:23:43Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolvePartitionSpec.scala

-        }
-      }
+      val raw = normalizedSpec.get(part.name).orNull
+      Cast(Literal.create(raw, StringType), part.dataType, Some(conf.sessionLocalTimeZone)).eval()


Does this match the V1 partitioning type coercion? I remember it has slightly different rules; see PartitioningUtils.inferPartitionColumnValue.

We don't have unified tests for V1 and V2 ALTER TABLE .. ADD/DROP PARTITION at the moment. I plan to add them soon. Once we have such tests, we will see the differences and fix them.

For now, I am just trying to make the implementation simpler: cast partition values according to the partition schema.

I think V1 does the same, see

spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala

Lines 149 to 165 in dfa6fb4

  /**
   * Given the partition schema, returns a row with that schema holding the partition values.
   */
  def toRow(partitionSchema: StructType, defaultTimeZondId: String): InternalRow = {
    val caseInsensitiveProperties = CaseInsensitiveMap(storage.properties)
    val timeZoneId = caseInsensitiveProperties.getOrElse(
      DateTimeUtils.TIMEZONE_OPTION, defaultTimeZondId)
    InternalRow.fromSeq(partitionSchema.map { field =>
      val partValue = if (spec(field.name) == ExternalCatalogUtils.DEFAULT_PARTITION_NAME) {
        null
      } else {
        spec(field.name)
      }
      Cast(Literal(partValue), field.dataType, Option(timeZoneId)).eval()
    })
  }
}

Let me extract the code to PartitioningUtils, and re-use it in V2.

I reused the code from DSv1 #30482 and fixed an issue. @HyukjinKwon, please review it.

SparkQA · 2020-11-24T05:59:45Z

Test build #131579 has finished for PR 30474 at commit d0108e0.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2020-11-24T08:04:19Z

thanks, merging to master!

MaxGekk added 8 commits November 23, 2020 20:31

Add a test (1d1b76b)
Support existing types in the test (10f31ee)
Universal type converter (1b91bbb)
Check boolean (e4bd364)
Check date and timestamps (8e5880b)
Rename the test (d9901a8)
Specify raw type explicitly as StringType (2118d0a)
Add JIRA to the test title (d0108e0)

github-actions bot added the SQL label Nov 23, 2020

HyukjinKwon reviewed Nov 24, 2020

cloud-fan closed this in a6555ee Nov 24, 2020

MaxGekk deleted the dsv2-partition-value-types branch February 19, 2021 15:03
