[SPARK-26653][SQL] Use Proleptic Gregorian calendar in parsing JDBC lower/upper bounds #23597
Conversation
@maropu Please take a look at this PR.
Test build #101449 has finished for PR 23597 at commit
LGTM except for one minor comment. cc: @gatorsmile
Test build #101487 has finished for PR 23597 at commit
case _: NumericType => value.toLong
case DateType => parse(stringToDate).toLong
case TimestampType =>
  parse(stringToTimestamp(_, getTimeZone(SQLConf.get.sessionLocalTimeZone)))
Have we decided to adjust the timestamp constants based on the user-specified local timezone?
cc @cloud-fan
Previously we called Timestamp.valueOf(value), which uses the JVM local timezone. It seems to me that using the Spark session timezone is better.
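For illustration, a minimal sketch of the difference being discussed, using plain java.time rather than Spark's internal DateTimeUtils; the bound string and the "UTC" session timezone are assumptions, not values from this PR:

```scala
import java.sql.Timestamp
import java.time.{LocalDateTime, ZoneId}
import java.time.format.DateTimeFormatter

object BoundTimezoneSketch extends App {
  val bound = "2019-01-20 12:00:00" // hypothetical JDBC lower/upper bound string

  // Old behavior: Timestamp.valueOf interprets the string in the JVM default timezone.
  val jvmLocal: Timestamp = Timestamp.valueOf(bound)

  // New behavior (sketched): interpret the same string in the Spark session timezone
  // (spark.sql.session.timeZone), assumed here to be "UTC".
  val sessionZone = ZoneId.of("UTC")
  val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
  val sessionMillis =
    LocalDateTime.parse(bound, fmt).atZone(sessionZone).toInstant.toEpochMilli

  // If the JVM default timezone is not UTC, the two epoch values differ by the zone offset.
  println(s"JVM-local: ${jvmLocal.getTime}, session-timezone: $sessionMillis")
}
```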
Actually we have to do it. This is a follow-up of #23391, which changed how we turn the timestamp boundaries into strings. Here we change how we turn strings into timestamps.
This is a behavior change. We need to clearly document which inputs start respecting the Spark session timezone.
How about we mention something like: all string -> timestamp conversions will respect the session timezone (e.g., JDBC lower/upper bounds), and Java 8 time APIs will be used consistently across the code base, once the sub-tasks in the umbrella are resolved?
I agree that we should improve the migration guide. Switching to the Proleptic Gregorian calendar is a behavior change in many places; it would be better to list all of them in the migration guide.
case _: NumericType => value.toLong
case DateType => DateTimeUtils.fromJavaDate(Date.valueOf(value)).toLong
case TimestampType => DateTimeUtils.fromJavaTimestamp(Timestamp.valueOf(value))
private def toInternalBoundValue(value: String, columnType: DataType): Long = {
We should pass in the timezone id, just like we did for toBoundValueInWhereClause.
done
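As an illustration, a rough sketch of what threading the timezone id through the bound parser could look like. The parsing bodies below use plain java.time so the example is self-contained; the real code goes through DateTimeUtils.stringToDate / stringToTimestamp, and everything except the Spark type classes is an assumption:

```scala
import java.time.{LocalDate, LocalDateTime, ZoneId}
import java.time.format.DateTimeFormatter
import java.util.concurrent.TimeUnit

import org.apache.spark.sql.types._

object BoundParsingSketch {
  // Sketch of a helper that, like toBoundValueInWhereClause, receives the timezone id.
  def toInternalBoundValue(value: String, columnType: DataType, timeZoneId: String): Long =
    columnType match {
      case _: NumericType => value.toLong
      case DateType =>
        // Spark stores DateType internally as days since the epoch.
        LocalDate.parse(value).toEpochDay
      case TimestampType =>
        // Spark stores TimestampType internally as microseconds since the epoch,
        // with the bound string interpreted in the session timezone.
        val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
        val instant = LocalDateTime.parse(value, fmt).atZone(ZoneId.of(timeZoneId)).toInstant
        TimeUnit.SECONDS.toMicros(instant.getEpochSecond) + instant.getNano / 1000L
      case other =>
        throw new IllegalArgumentException(s"Unsupported partition column type: $other")
    }
}
```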
Test build #101524 has finished for PR 23597 at commit
Test build #101545 has finished for PR 23597 at commit
LGTM, let's resolve the conflict.
Test build #101579 has finished for PR 23597 at commit
thanks, merging to master!
[SPARK-26653][SQL] Use Proleptic Gregorian calendar in parsing JDBC lower/upper bounds

Closes apache#23597 from MaxGekk/jdbc-parse-timestamp-bounds.

Lead-authored-by: Maxim Gekk <maxim.gekk@databricks.com>
Co-authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?

In the PR, I propose using the `stringToDate` and `stringToTimestamp` methods in parsing JDBC lower/upper bounds of the partition column if it has `DateType` or `TimestampType`. Since those methods have been ported to the Proleptic Gregorian calendar by #23512, the PR switches parsing of JDBC bounds of the partition column to that calendar as well.

How was this patch tested?

This was tested by `JDBCSuite`.
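For context, a hedged usage sketch of where these bound strings come from: a partitioned JDBC read whose partition column is a timestamp, with lower/upper bounds given as the strings this PR now parses in the session timezone. The connection URL, table, and column names are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("jdbc-timestamp-bounds")
  .master("local[*]")
  // After this change, the session timezone (not the JVM default) determines how
  // string date/timestamp bounds are interpreted.
  .config("spark.sql.session.timeZone", "UTC")
  .getOrCreate()

// Hypothetical Postgres table partitioned on a TIMESTAMP column.
val events = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://db-host:5432/mydb")
  .option("dbtable", "events")
  .option("partitionColumn", "event_time")
  .option("lowerBound", "2019-01-01 00:00:00")
  .option("upperBound", "2019-02-01 00:00:00")
  .option("numPartitions", "4")
  .load()
```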