Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-31901][SQL] Use the session time zone in legacy date formatters #28709

Closed

Conversation

MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Jun 2, 2020

What changes were proposed in this pull request?

In the PR, I propose to use the specified time zone ID via DateFormatter.apply() or DateFormatter.getLegacyFormatter(), and pass it to SimpleDateFormat/FastDateFormat.

Why are the changes needed?

Currently, the date formatter can delegate formatting/parsing to the legacy formatters in some cases but the formatters ignore the time zone specified via the zoneId parameter, and use another time zone returned by TimeZone.getDefault(). Ignoring of input parameters is bad practice.

Does this PR introduce any user-facing change?

No, if the default JVM time zone and the session time zone controlled by the SQL config spark.sql.session.timeZone are the same. That's the default case. If an user sets different value to the session and JVM time zones, he/she can observe different results.

How was this patch tested?

Modified tests in DateFormatterSuite to check that legacy date formatters don't depend on the default JVM time zone and respect to Spark's session time zone.

@SparkQA
Copy link

SparkQA commented Jun 3, 2020

Test build #123449 has finished for PR 28709 at commit cb126a9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class LegacyFastDateFormatter(
  • class LegacySimpleDateFormatter(

MaxGekk added 4 commits June 3, 2020 09:31
…-formatter-time-zone

# Conflicts:
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateFormatter.scala
#	sql/catalyst/src/test/scala/org/apache/spark/sql/util/DateFormatterSuite.scala
assert(LocalDate.ofEpochDay(formatter.parse("1000-01-01")) === LocalDate.of(1000, 1, 1))
assert(formatter.format(LocalDate.of(1000, 1, 1)) === "1000-01-01")
assert(formatter.format(localDateToDays(LocalDate.of(1000, 1, 1))) === "1000-01-01")
val cal = new Calendar.Builder()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The time zone is embedded into java.sql.Date, and it is the global default JVM time zone. To set tested time zone, I have to construct Date via the calendar.

@MaxGekk
Copy link
Member Author

MaxGekk commented Jun 3, 2020

@cloud-fan @HyukjinKwon The changes can lead to inconsistent behaviour when JVM and session time zones are different:

$ export TZ="Europe/Moscow"
$ ./bin/spark-sql -S
spark-sql> set spark.sql.session.timeZone=America/Los_Angeles;
spark.sql.session.timeZone	America/Los_Angeles
spark-sql> select date '2020-06-03';
2020-06-02
spark-sql> select make_date(2020, 6, 3);
2020-06-02

This happens because toJavaDate() w/ the default JVM time zone is used while collecting dates.

@SparkQA
Copy link

SparkQA commented Jun 3, 2020

Test build #123473 has finished for PR 28709 at commit 2fcdaed.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk MaxGekk changed the title [WIP][SQL] Make legacy date formatters independent from the default JVM time zone [SPARK-31901][SQL] Use the session time zone in legacy date formatters Jun 3, 2020
@SparkQA
Copy link

SparkQA commented Jun 4, 2020

Test build #123506 has finished for PR 28709 at commit ee89265.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

…-formatter-time-zone

# Conflicts:
#	sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateFormatterSuite.scala
@SparkQA
Copy link

SparkQA commented Jun 5, 2020

Test build #123575 has finished for PR 28709 at commit 6535b24.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan cloud-fan closed this in de91915 Jun 9, 2020
cloud-fan pushed a commit that referenced this pull request Jun 9, 2020
…mJavaDate and legacy date formatters

### What changes were proposed in this pull request?
Update comments for `DateTimeUtils`.`toJavaDate` and `fromJavaDate`, and for the legacy date formatters `LegacySimpleDateFormatter` and `LegacyFastDateFormatter` regarding to the default JVM time zone. The comments say that the default JVM time zone is used intentionally for backward compatibility with Spark 2.4 and earlier versions.

Closes #28709

### Why are the changes needed?
To document current behaviour of related methods in `DateTimeUtils` and the legacy date formatters. For example, correctness of `HiveResult.hiveResultString` and `toHiveString` is directly related to the same time zone used by `toJavaDate` and `LegacyFastDateFormatter`.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running the Scala style checker `./dev/scalastyle`

Closes #28767 from MaxGekk/doc-legacy-formatters.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit de91915)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@MaxGekk MaxGekk deleted the legacy-date-formatter-time-zone branch December 11, 2020 20:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants