Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-33521][SQL] Universal type conversion in resolving V2 partition specs #30474

Closed
wants to merge 8 commits into from

Conversation

MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Nov 23, 2020

What changes were proposed in this pull request?

In the PR, I propose to changes the resolver of partition specs used in V2 ALTER TABLE .. ADD/DROP PARTITION (at the moment), and re-use CAST in conversion partition values to desired types according to the partition schema.

Why are the changes needed?

Currently, the resolver of V2 partition specs supports just a few types:

, and fails on other types like date/timestamp.

Does this PR introduce any user-facing change?

Yes

How was this patch tested?

By running AlterTablePartitionV2SQLSuite

@github-actions github-actions bot added the SQL label Nov 23, 2020
@SparkQA
Copy link

SparkQA commented Nov 23, 2020

Test build #131572 has started for PR 30474 at commit d0108e0.

@SparkQA
Copy link

SparkQA commented Nov 23, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36173/

@SparkQA
Copy link

SparkQA commented Nov 23, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36173/

}
}
val raw = normalizedSpec.get(part.name).orNull
Cast(Literal.create(raw, StringType), part.dataType, Some(conf.sessionLocalTimeZone)).eval()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this matched to the V1 partitioning type coercion? I remember it has a bit different rules, see PartitioningUtils.inferPartitionColumnValue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have unified tests for V1 and V2 ALTER TABLE .. ADD/DROP PARTITION at the moment. I plan to do that soon. As soon as we have such tests we will see the differences and fix them.

For now, I just try to make implementation simpler - cast partition values according to the partition schema.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think V1 does the same, see

/**
* Given the partition schema, returns a row with that schema holding the partition values.
*/
def toRow(partitionSchema: StructType, defaultTimeZondId: String): InternalRow = {
val caseInsensitiveProperties = CaseInsensitiveMap(storage.properties)
val timeZoneId = caseInsensitiveProperties.getOrElse(
DateTimeUtils.TIMEZONE_OPTION, defaultTimeZondId)
InternalRow.fromSeq(partitionSchema.map { field =>
val partValue = if (spec(field.name) == ExternalCatalogUtils.DEFAULT_PARTITION_NAME) {
null
} else {
spec(field.name)
}
Cast(Literal(partValue), field.dataType, Option(timeZoneId)).eval()
})
}
}

Let me extract the code to PartitioningUtils, and re-use it in V2.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotya

Copy link
Member Author

@MaxGekk MaxGekk Nov 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reused the code from DSv1 #30482 and fixed an issue. @HyukjinKwon Please, review it.

@SparkQA
Copy link

SparkQA commented Nov 24, 2020

Test build #131579 has finished for PR 30474 at commit d0108e0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in a6555ee Nov 24, 2020
@MaxGekk MaxGekk deleted the dsv2-partition-value-types branch February 19, 2021 15:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants