
[SPARK-18939][SQL] Timezone support in partition values. #17053

Closed
ueshin wants to merge 7 commits

Conversation

ueshin (Member) commented Feb 24, 2017

What changes were proposed in this pull request?

This is a follow-up PR to #16308 and #16750.

This PR enables timezone support in partition values.

We should use the timeZone option introduced in #16750 to parse/format partition values of TimestampType.

For example, suppose you have the timestamp "2016-01-01 00:00:00" in GMT to be used as a partition value. The value written with the default timezone option, which is "GMT" here because the session local timezone is "GMT", is:

scala> spark.conf.set("spark.sql.session.timeZone", "GMT")

scala> val df = Seq((1, new java.sql.Timestamp(1451606400000L))).toDF("i", "ts")
df: org.apache.spark.sql.DataFrame = [i: int, ts: timestamp]

scala> df.show()
+---+-------------------+
|  i|                 ts|
+---+-------------------+
|  1|2016-01-01 00:00:00|
+---+-------------------+

scala> df.write.partitionBy("ts").save("/path/to/gmtpartition")
$ ls /path/to/gmtpartition/
_SUCCESS			ts=2016-01-01 00%3A00%3A00

whereas with the timeZone option set to "PST", it is:

scala> df.write.option("timeZone", "PST").partitionBy("ts").save("/path/to/pstpartition")
$ ls /path/to/pstpartition/
_SUCCESS			ts=2015-12-31 16%3A00%3A00
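
As an aside, here is a minimal sketch (using plain JDK classes, not the actual Spark code path) of why the directory names differ: the same epoch value is formatted with different timezones.

import java.text.SimpleDateFormat
import java.util.TimeZone

val ts = new java.sql.Timestamp(1451606400000L)  // 2016-01-01 00:00:00 GMT

// Format the same instant as a partition-value string in a given timezone.
def partitionString(tzId: String): String = {
  val fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
  fmt.setTimeZone(TimeZone.getTimeZone(tzId))
  fmt.format(ts)
}

partitionString("GMT")  // "2016-01-01 00:00:00"
partitionString("PST")  // "2015-12-31 16:00:00"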

We can properly read the partition values if the session local timezone and the timezone of the partition values are the same:

scala> spark.read.load("/path/to/gmtpartition").show()
+---+-------------------+
|  i|                 ts|
+---+-------------------+
|  1|2016-01-01 00:00:00|
+---+-------------------+

And even if the timezones differ, we can still read the values correctly by setting the corresponding timeZone option:

// wrong result
scala> spark.read.load("/path/to/pstpartition").show()
+---+-------------------+
|  i|                 ts|
+---+-------------------+
|  1|2015-12-31 16:00:00|
+---+-------------------+

// correct result
scala> spark.read.option("timeZone", "PST").load("/path/to/pstpartition").show()
+---+-------------------+
|  i|                 ts|
+---+-------------------+
|  1|2016-01-01 00:00:00|
+---+-------------------+
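
Conceptually (again a rough sketch with JDK classes rather than Spark internals), reading back works because the partition string is parsed with the same timezone it was formatted with, which recovers the original instant:

import java.text.SimpleDateFormat
import java.util.TimeZone

// Parse the PST-formatted partition string back using the PST timezone.
val fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
fmt.setTimeZone(TimeZone.getTimeZone("PST"))
fmt.parse("2015-12-31 16:00:00").getTime  // 1451606400000L, i.e. 2016-01-01 00:00:00 GMT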

How was this patch tested?

Existing tests and some newly added tests.


SparkQA commented Feb 24, 2017

Test build #73406 has started for PR 17053 at commit 49da287.

@@ -251,7 +251,8 @@ abstract class ExternalCatalog {
   def listPartitionsByFilter(
       db: String,
       table: String,
-      predicates: Seq[Expression]): Seq[CatalogTablePartition]
+      predicates: Seq[Expression],
+      defaultTimeZoneId: String): Seq[CatalogTablePartition]
A reviewer (Contributor) commented: we need to document what a timezone id is here.

ueshin (Member Author) replied: Thank you, I'll add it.
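
For illustration (not part of this PR's diff), the timezone IDs involved here are ordinary Java timezone ID strings, which can be resolved with java.util.TimeZone:

import java.util.TimeZone

// Both region-based IDs and common abbreviations are valid timezone IDs.
TimeZone.getTimeZone("GMT").getID                  // "GMT"
TimeZone.getTimeZone("PST").getID                  // "PST"
TimeZone.getTimeZone("America/Los_Angeles").getID  // "America/Los_Angeles"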


SparkQA commented Feb 24, 2017

Test build #73409 has started for PR 17053 at commit c563a9a.

ueshin (Member Author) commented Feb 24, 2017

Jenkins, retest this please.


SparkQA commented Feb 24, 2017

Test build #73413 has finished for PR 17053 at commit c563a9a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

ueshin (Member Author) commented Feb 24, 2017

Jenkins, retest this please.


SparkQA commented Feb 24, 2017

Test build #73418 has finished for PR 17053 at commit c563a9a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Mar 1, 2017

Test build #73635 has finished for PR 17053 at commit f7a146a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

cloud-fan (Contributor)

retest this please

cloud-fan (Contributor)

LGTM, pending tests


SparkQA commented Mar 4, 2017

Test build #73862 has finished for PR 17053 at commit f7a146a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

cloud-fan (Contributor)

thanks, merging to master!
