-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-26503][CORE] Get rid of spark.sql.legacy.timeParser.enabled #23411
Conversation
The time parser is kind of a breaking change but to me i'm fine with removing it considering thst we go for 3.0. It makes the codes difficult to maintain and blocks some features like date type inference. |
Test build #100545 has finished for PR 23411 at commit
|
For the failed test, there is the ticket: https://issues.apache.org/jira/browse/SPARK-26374 . I think need to change the test and use Java 8 API for results checking. |
@HyukjinKwon yeah this was suggested by @hvanhovell ... it's unclear whether we need a safety valve flag given this is a major release and I don't know if people would have relied on the previous behavior. @MaxGekk yeah saw that, and part of my concern is that test isn't 'compatible' with the new change yet, which needs to be addressed. |
Hm, this is a tough one @MaxGekk . For the particular test case that failed most recently, the random timestamp that is generated that causes a problem is -61688070376409L. I believe it's Jackson that ultimately serializes the timestamp to formatted date, and it produces That's consistent with
After accounting for time zone differnece, there's a 1 minute and 2 second difference. That's also the difference in the Spark test failures:
... but there's a nearly 2 day difference with the formatting from is this what you're getting to with #23391 and I am just catching up? I suppose we can't merge this until this difference is really fixed, but until that happens, the new functionality doesn't quite work for old dates right? |
The |
I have found a performance issue on parsing time zone offsets in jdk 8 (fixed in jdk 9): https://bugs.openjdk.java.net/browse/JDK-8066291 and https://stackoverflow.com/questions/34374464/extremely-slow-parsing-of-time-zone-with-the-new-java-time-api /cc @hvanhovell @gatorsmile @cloud-fan |
Hm that's not great, but is correctness more important than speed? probably so, but, if the correctness issue is only for old dates, maybe best to leave in the old parser. Should I just close this? and/or add a note about why one might want the old parser? |
It would be nice to re-run those JMH benchmarks and confirm the numbers on recent jdk 8 build. |
@MaxGekk I ran it locally with Java 8:
and Java 11:
Still a big difference unfortunately. I'd say let's leave in the old parser for the foreseeable future, at least. |
Given the potential perf issue here, I do think it's wise to keep the flag as a safety valve for 3.0. |
@srowen This is for the case when a zone is defined via zone id, right? I would guess |
I see; I'm just running the test at https://stackoverflow.com/questions/34374464/extremely-slow-parsing-of-time-zone-with-the-new-java-time-api which uses time zones like "CET" and parsed with pattern 'z'. Let me try to respin the benchmark with "XXX" and a suitable timezone. |
@MaxGekk I instead tried benchmarking old/new format, and Java 8:
Java 11:
No real difference (times are smaller here because I ran a shorter test.) Is that what you're interested in testing? If so are we back to favoring removing the old legacy parser? I'm OK with that. |
Yes, thank you.
I would continue with the PR. @hvanhovell WDYT? |
## What changes were proposed in this pull request? Per discussion in apache#23391 (comment) this proposes to just remove the old pre-Spark-3 time parsing behavior. This is a rebase of apache#23411 ## How was this patch tested? Existing tests. Closes apache#23495 from srowen/SPARK-26503.2. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>
What changes were proposed in this pull request?
Per discussion in #23391 (comment) this proposes to just remove the old pre-Spark-3 time parsing behavior.
How was this patch tested?
Existing tests.