[SPARK-26653][SQL] Use Proleptic Gregorian calendar in parsing JDBC lower/upper bounds #23597

MaxGekk · 2019-01-20T17:38:29Z

What changes were proposed in this pull request?

In this PR, I propose using the stringToDate and stringToTimestamp methods to parse the JDBC lower/upper bounds of the partition column when it has DateType or TimestampType. Since those methods were ported to the Proleptic Gregorian calendar by #23512, this PR switches parsing of the partition column's JDBC bounds to that calendar as well.
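To make the calendar switch concrete, here is a minimal Scala sketch. It is not the code from this PR. It uses `java.time.LocalDate`, which follows the ISO proleptic Gregorian calendar, to turn a date bound into days since the epoch. The names `DateBoundSketch` and `dateBoundToDays` are hypothetical, and the sketch accepts only the `yyyy-MM-dd` form, which may be narrower than what `stringToDate` accepts.

```scala
import java.time.LocalDate

object DateBoundSketch {
  // Hypothetical helper (not part of Spark): parses a `yyyy-MM-dd` bound and
  // returns the number of days since 1970-01-01 in the proleptic Gregorian calendar.
  def dateBoundToDays(value: String): Long =
    LocalDate.parse(value.trim).toEpochDay
}
```

In the proleptic calendar, 1582-10-04 and 1582-10-15 are 11 days apart. With the default cutover of the legacy `java.util.GregorianCalendar`, they are consecutive days, so bounds before the cutover can map to different internal values after this change.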

How was this patch tested?

This was tested by JDBCSuite.


MaxGekk · 2019-01-20T17:39:00Z

@maropu Please, take a look at this PR.

SparkQA · 2019-01-20T21:35:33Z

Test build #101449 has finished for PR 23597 at commit 0b61076.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala

maropu · 2019-01-21T12:25:33Z

LGTM except for one minor comment. cc: @gatorsmile

SparkQA · 2019-01-21T21:12:07Z

Test build #101487 has finished for PR 23597 at commit 8bb4f3a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2019-01-22T01:37:31Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala

+      case _: NumericType => value.toLong
+      case DateType => parse(stringToDate).toLong
+      case TimestampType =>
+        parse(stringToTimestamp(_, getTimeZone(SQLConf.get.sessionLocalTimeZone)))


Did we decide to adjust the timestamp constants based on the user-specified local timezone?

cc @cloud-fan

Previously we called Timestamp.valueOf(value), which uses the JVM local timezone. It seems to me that using the Spark session timezone is better.

Actually, we have to do it. This is a follow-up of #23391, which changed how we turn the timestamp boundaries into strings. Here we change how we turn a string into a timestamp.

This is a behavior change. We need to clearly document which inputs start respecting the Spark session local timezone.

How about we mention something like .. all string -> timestamp will respect Session timezone, JDBC lower/upper bounds, blabla, ..., and java 8 time will be consistently used across code base .. after the sub-tasks in the umbrella are resolved?

I agree that we should improve the migration guide. Switching to the Proleptic Gregorian calendar is a behavior change in many places, so it would be better to list all of them in the migration guide.
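The time zone dependence discussed in this thread can be illustrated with a small Scala sketch. It is an assumption-laden illustration rather than Spark's implementation: `TimestampBoundSketch` and `timestampBoundToMicros` are hypothetical names, and only the `yyyy-MM-dd HH:mm:ss` form is handled. The point is that the same bound string maps to different microsecond values depending on which zone is used to interpret it.

```scala
import java.time.{Instant, LocalDateTime, ZoneId}
import java.time.temporal.ChronoUnit

object TimestampBoundSketch {
  // Hypothetical helper (not part of Spark): interprets a zone-less timestamp
  // string in the given time zone and returns microseconds since the epoch.
  def timestampBoundToMicros(value: String, zoneId: String): Long = {
    // Turn "2019-01-20 17:38:29" into the ISO form "2019-01-20T17:38:29".
    val local = LocalDateTime.parse(value.trim.replace(' ', 'T'))
    val instant = local.atZone(ZoneId.of(zoneId)).toInstant
    ChronoUnit.MICROS.between(Instant.EPOCH, instant)
  }
}
```

For example, interpreting `2019-01-20 17:38:29` in `UTC` and in `Asia/Tokyo` gives values that differ by nine hours of microseconds, which is why a change from the JVM zone to the session zone is user-visible whenever the two differ.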

cloud-fan · 2019-01-22T02:39:48Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala

-    case _: NumericType => value.toLong
-    case DateType => DateTimeUtils.fromJavaDate(Date.valueOf(value)).toLong
-    case TimestampType => DateTimeUtils.fromJavaTimestamp(Timestamp.valueOf(value))
+  private def toInternalBoundValue(value: String, columnType: DataType): Long = {


We should pass in the timezone id, just like we did for toBoundValueInWhereClause.
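One reason to pass the same time zone id to both directions is that parsing a bound and later rendering it for a WHERE clause should round-trip. The Scala sketch below shows this under stated assumptions: `BoundRoundTripSketch`, `parseBound`, and `formatBound` are hypothetical stand-ins, not the signatures of `toInternalBoundValue` or `toBoundValueInWhereClause`, and the fixed pattern is chosen only for illustration.

```scala
import java.time.{Instant, LocalDateTime, ZoneId}
import java.time.format.DateTimeFormatter
import java.time.temporal.ChronoUnit

object BoundRoundTripSketch {
  // Illustrative fixed pattern; the real methods may accept or emit other forms.
  private val pattern = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss")

  // String -> microseconds since the epoch, interpreted in the given zone.
  def parseBound(value: String, timeZoneId: String): Long = {
    val local = LocalDateTime.parse(value.trim, pattern)
    val instant = local.atZone(ZoneId.of(timeZoneId)).toInstant
    ChronoUnit.MICROS.between(Instant.EPOCH, instant)
  }

  // Microseconds since the epoch -> string, rendered in the given zone.
  def formatBound(micros: Long, timeZoneId: String): String = {
    val instant = Instant.EPOCH.plus(micros, ChronoUnit.MICROS)
    pattern.format(instant.atZone(ZoneId.of(timeZoneId)))
  }
}
```

With an explicit parameter, both helpers are pure functions of their arguments. If the two directions used different zones, a bound of `2019-01-20 17:38:29` parsed in `UTC` would be rendered as `2019-01-21 02:38:29` in `Asia/Tokyo`, shifting the partition boundaries.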

SparkQA · 2019-01-22T14:02:38Z

Test build #101524 has finished for PR 23597 at commit a0b23ed.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala

docs/sql-migration-guide-upgrade.md

SparkQA · 2019-01-22T19:52:11Z

Test build #101545 has finished for PR 23597 at commit 4dc4a2a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2019-01-23T03:15:11Z

LGTM, let's resolve the conflict.

SparkQA · 2019-01-23T12:07:00Z

Test build #101579 has finished for PR 23597 at commit af20442.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2019-01-23T12:22:51Z

thanks, merging to master!

…ower/upper bounds

## What changes were proposed in this pull request?

In the PR, I propose using of the `stringToDate` and `stringToTimestamp` methods in parsing JDBC lower/upper bounds of the partition column if it has `DateType` or `TimestampType`. Since those methods have been ported on Proleptic Gregorian calendar by apache#23512, the PR switches parsing of JDBC bounds of the partition column on the calendar as well.

## How was this patch tested?

This was tested by `JDBCSuite`.

Closes apache#23597 from MaxGekk/jdbc-parse-timestamp-bounds.

Lead-authored-by: Maxim Gekk <maxim.gekk@databricks.com>
Co-authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

MaxGekk added 4 commits January 20, 2019 17:42

Using stringToDate/stringToTimestamp

f025ae1

Add a test

6171e9a

Merge remote-tracking branch 'origin/master' into jdbc-parse-timestam…

0e3691e

…p-bounds

Update test

0b61076

maropu reviewed Jan 21, 2019


sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala (outdated)

Run the test in different time zones

8bb4f3a

gatorsmile reviewed Jan 22, 2019


cloud-fan reviewed Jan 22, 2019


MaxGekk added 2 commits January 22, 2019 10:48

Pass time zone as a parameter

7ba3cbb

Upgrade the migration guide

a0b23ed

cloud-fan approved these changes Jan 22, 2019
