[SPARK-31901][SQL] Use the session time zone in legacy date formatters #28709
Conversation
Test build #123449 has finished for PR 28709 at commit
…-formatter-time-zone
# Conflicts:
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateFormatter.scala
#	sql/catalyst/src/test/scala/org/apache/spark/sql/util/DateFormatterSuite.scala
```scala
assert(LocalDate.ofEpochDay(formatter.parse("1000-01-01")) === LocalDate.of(1000, 1, 1))
assert(formatter.format(LocalDate.of(1000, 1, 1)) === "1000-01-01")
assert(formatter.format(localDateToDays(LocalDate.of(1000, 1, 1))) === "1000-01-01")
val cal = new Calendar.Builder()
```
The time zone is embedded into `java.sql.Date`, and it is the global default JVM time zone. To set the time zone under test, I have to construct the `Date` via a `Calendar`.
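A minimal standalone sketch of that construction (the zone and date here are illustrative, not the suite's actual fixtures):

```java
import java.util.Calendar;
import java.util.TimeZone;

public class CalendarBackedDate {
    public static void main(String[] args) {
        // Build a calendar pinned to an explicit (non-default) time zone, so
        // the resulting instant does not depend on the JVM default zone.
        Calendar cal = new Calendar.Builder()
            .setCalendarType("gregory")
            .setTimeZone(TimeZone.getTimeZone("America/Los_Angeles"))
            .setDate(2020, Calendar.JUNE, 3) // month is 0-based: JUNE == 5
            .build();
        // java.sql.Date wraps the millisecond instant of 2020-06-03 00:00:00
        // in Los Angeles (2020-06-03T07:00:00Z during DST).
        java.sql.Date date = new java.sql.Date(cal.getTimeInMillis());
        System.out.println(date.getTime()); // prints 1591167600000
    }
}
```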
@cloud-fan @HyukjinKwon The changes can lead to inconsistent behaviour when the JVM and session time zones are different:

```
$ export TZ="Europe/Moscow"
$ ./bin/spark-sql -S
spark-sql> set spark.sql.session.timeZone=America/Los_Angeles;
spark.sql.session.timeZone	America/Los_Angeles
spark-sql> select date '2020-06-03';
2020-06-02
spark-sql> select make_date(2020, 6, 3);
2020-06-02
```

This happens because
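The mismatch can be reproduced outside Spark. In this sketch, Europe/Moscow stands in for the JVM default zone and America/Los_Angeles for the session zone, as in the session above; the class and variable names are illustrative:

```java
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.ZoneId;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class TzMismatch {
    public static void main(String[] args) {
        // The epoch-day -> java.sql.Date conversion uses the JVM default zone
        // (modeled here as Europe/Moscow), while the formatter is configured
        // with the session zone (America/Los_Angeles).
        ZoneId jvmZone = ZoneId.of("Europe/Moscow"); // stands in for TimeZone.getDefault()
        TimeZone sessionZone = TimeZone.getTimeZone("America/Los_Angeles");

        // Midnight of 2020-06-03 in the "JVM" zone (2020-06-02T21:00:00Z).
        long millis = LocalDate.of(2020, 6, 3)
            .atStartOfDay(jvmZone).toInstant().toEpochMilli();
        Date date = new java.sql.Date(millis);

        // Formatting that instant in the session zone shifts it to the
        // previous calendar day.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd", Locale.US);
        fmt.setTimeZone(sessionZone);
        System.out.println(fmt.format(date)); // prints 2020-06-02
    }
}
```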
Test build #123473 has finished for PR 28709 at commit
…-formatter-time-zone
Test build #123506 has finished for PR 28709 at commit
…-formatter-time-zone
# Conflicts:
#	sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateFormatterSuite.scala
Test build #123575 has finished for PR 28709 at commit
…mJavaDate and legacy date formatters

### What changes were proposed in this pull request?
Update comments for `DateTimeUtils`.`toJavaDate` and `fromJavaDate`, and for the legacy date formatters `LegacySimpleDateFormatter` and `LegacyFastDateFormatter`, regarding the default JVM time zone. The comments say that the default JVM time zone is used intentionally for backward compatibility with Spark 2.4 and earlier versions.

Closes #28709

### Why are the changes needed?
To document the current behaviour of the related methods in `DateTimeUtils` and the legacy date formatters. For example, the correctness of `HiveResult.hiveResultString` and `toHiveString` depends directly on `toJavaDate` and `LegacyFastDateFormatter` using the same time zone.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running the Scala style checker `./dev/scalastyle`

Closes #28767 from MaxGekk/doc-legacy-formatters.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit de91915)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
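The coupling those comments document — `java.sql.Date` interpreting its millisecond value in the default JVM time zone — can be seen in plain Java. In this sketch the default zone is set explicitly only to make the output reproducible; Spark simply takes whatever `TimeZone.getDefault()` returns:

```java
import java.util.TimeZone;

public class DefaultZoneToString {
    public static void main(String[] args) {
        // Pin the "default" JVM zone for the demonstration (an assumption
        // made here purely for reproducibility).
        TimeZone.setDefault(TimeZone.getTimeZone("Europe/Moscow"));
        // 2020-06-02T21:00:00Z == 2020-06-03 00:00 in Moscow (UTC+3).
        java.sql.Date date = new java.sql.Date(1591131600000L);
        // java.sql.Date.toString() renders the instant in the default JVM
        // zone, so any formatter must use the same zone to agree with it.
        System.out.println(date); // prints 2020-06-03
    }
}
```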
What changes were proposed in this pull request?
In the PR, I propose to use the time zone ID specified via `DateFormatter.apply()` or `DateFormatter.getLegacyFormatter()`, and pass it to `SimpleDateFormat`/`FastDateFormat`.

Why are the changes needed?
Currently, the date formatter can delegate formatting/parsing to the legacy formatters in some cases, but those formatters ignore the time zone specified via the `zoneId` parameter and use another time zone returned by `TimeZone.getDefault()`. Ignoring input parameters is bad practice.

Does this PR introduce any user-facing change?
No, if the default JVM time zone and the session time zone controlled by the SQL config `spark.sql.session.timeZone` are the same, which is the default case. If a user sets the session and JVM time zones to different values, they can observe different results.

How was this patch tested?
Modified tests in `DateFormatterSuite` to check that the legacy date formatters don't depend on the default JVM time zone and do respect Spark's session time zone.
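The behaviour being tested can be sketched with a plain `SimpleDateFormat`: once the session zone is set on the formatter explicitly, parsing and formatting round-trip correctly regardless of the JVM default zone. The zone and date below are illustrative, not the suite's actual fixtures:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class LegacyFormatterZone {
    public static void main(String[] args) throws ParseException {
        // A SimpleDateFormat created without configuration silently uses
        // TimeZone.getDefault(); passing the session zone explicitly (the
        // zoneId that DateFormatter receives) mirrors the PR's approach.
        TimeZone sessionZone = TimeZone.getTimeZone("America/Los_Angeles");
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd", Locale.US);
        fmt.setTimeZone(sessionZone);

        Date parsed = fmt.parse("2020-06-03");
        // Round-trips in the session zone regardless of the JVM default zone.
        System.out.println(fmt.format(parsed)); // prints 2020-06-03
    }
}
```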