Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-26653][SQL] Use Proleptic Gregorian calendar in parsing JDBC lower/upper bounds #23597

Closed
wants to merge 11 commits into from

Conversation

MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Jan 20, 2019

What changes were proposed in this pull request?

In the PR, I propose using of the stringToDate and stringToTimestamp methods in parsing JDBC lower/upper bounds of the partition column if it has DateType or TimestampType. Since those methods have been ported on Proleptic Gregorian calendar by #23512, the PR switches parsing of JDBC bounds of the partition column on the calendar as well.

How was this patch tested?

This was tested by JDBCSuite.

@MaxGekk
Copy link
Member Author

MaxGekk commented Jan 20, 2019

@maropu Please, take a look at this PR.

@SparkQA
Copy link

SparkQA commented Jan 20, 2019

Test build #101449 has finished for PR 23597 at commit 0b61076.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Jan 21, 2019

LGTM except for one minor comment. cc: @gatorsmile

@SparkQA
Copy link

SparkQA commented Jan 21, 2019

Test build #101487 has finished for PR 23597 at commit 8bb4f3a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

case _: NumericType => value.toLong
case DateType => parse(stringToDate).toLong
case TimestampType =>
parse(stringToTimestamp(_, getTimeZone(SQLConf.get.sessionLocalTimeZone)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We decide to adjust the timestamp constants based on the user-specified local timezone?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously we call Timestamp.valueOf(value), which uses JVM local timezone. It seems to me that using Spark session timezone is better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we have to do it. This is a followup of #23391 , which changed how we turn the timestamp boundaries to string. Here we change hoow we turn string to timestamp.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a behavior change. We need to clearly document which inputs start respecting our Spark local session timezone?

Copy link
Member

@HyukjinKwon HyukjinKwon Jan 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we mention something like .. all string -> timestamp will respect Session timezone, JDBC lower/upper bounds, blabla, ..., and java 8 time will be consistently used across code base .. after the sub-tasks in the umbrella are resolved?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should improve the migration guide. Switching to Proleptic Gregorian calendar is a behavior change to many places, it's better we can list all of them in the migration guide.

case _: NumericType => value.toLong
case DateType => DateTimeUtils.fromJavaDate(Date.valueOf(value)).toLong
case TimestampType => DateTimeUtils.fromJavaTimestamp(Timestamp.valueOf(value))
private def toInternalBoundValue(value: String, columnType: DataType): Long = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should pass in the timezone id, just like what we did for toBoundValueInWhereClause

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@SparkQA
Copy link

SparkQA commented Jan 22, 2019

Test build #101524 has finished for PR 23597 at commit a0b23ed.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 22, 2019

Test build #101545 has finished for PR 23597 at commit 4dc4a2a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

LGTM, let's resole the conflict.

@SparkQA
Copy link

SparkQA commented Jan 23, 2019

Test build #101579 has finished for PR 23597 at commit af20442.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 46d5bb9 Jan 23, 2019
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…ower/upper bounds

## What changes were proposed in this pull request?

In the PR, I propose using of the `stringToDate` and `stringToTimestamp` methods in parsing JDBC lower/upper bounds of the partition column if it has `DateType` or `TimestampType`. Since those methods have been ported on Proleptic Gregorian calendar by apache#23512, the PR switches parsing of JDBC bounds of the partition column on the calendar as well.

## How was this patch tested?

This was tested by `JDBCSuite`.

Closes apache#23597 from MaxGekk/jdbc-parse-timestamp-bounds.

Lead-authored-by: Maxim Gekk <maxim.gekk@databricks.com>
Co-authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@MaxGekk MaxGekk deleted the jdbc-parse-timestamp-bounds branch August 17, 2019 13:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants