-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-26379][SS][BRANCH-2.3] Use dummy TimeZoneId to avoid UnresolvedException in CurrentBatchTimestamp #23656
Conversation
…to streaming query ## What changes were proposed in this pull request? This patch proposes to fix issue on adding `current_timestamp` / `current_date` with streaming query. The root reason is that Spark transforms `CurrentTimestamp`/`CurrentDate` to `CurrentBatchTimestamp` in MicroBatchExecution which makes transformed attributes not-yet-resolved. They will be resolved by IncrementalExecution. (In ContinuousExecution, Spark doesn't allow using `current_timestamp` and `current_date` so it has been OK.) It's OK for DataSource V1 sink because it simply leverages transformed logical plan and don't evaluate until they're resolved, but for DataSource V2 sink, Spark tries to extract the schema of transformed logical plan in prior to IncrementalExecution, and unresolved attributes will raise errors. This patch fixes the issue via having separate pre-resolved logical plan to pass the schema to StreamingWriteSupport safely. ## How was this patch tested? Added UT. Closes apache#23609 from HeartSaVioR/SPARK-26379. Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Thank you, @HeartSaVioR . :) |
Test build #101696 has finished for PR 23656 at commit
|
We will revisit this PR because the original PR needs a more verification and a followup PR is expected. |
FYI: follow-up PR is available in #23660 submitted by Dongjoon. |
Hi, @HeartSaVioR . Could you update this PR with the final content? |
…xception in CurrentBatchTimestamp ## What changes were proposed in this pull request? Spark replaces `CurrentTimestamp` with `CurrentBatchTimestamp`. However, `CurrentBatchTimestamp` is `TimeZoneAwareExpression` while `CurrentTimestamp` isn't. Without TimeZoneId, `CurrentBatchTimestamp` becomes unresolved and raises `UnresolvedException`. Since `CurrentDate` is `TimeZoneAwareExpression`, there is no problem with `CurrentDate`. This PR reverts the [previous patch](apache#23609) on `MicroBatchExecution` and fixes the root cause. ## How was this patch tested? Pass the Jenkins with the updated test cases. Closes apache#23660 from dongjoon-hyun/SPARK-26379. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@dongjoon-hyun Yes, I've just cherry-picked the FOLLOW-UP commit into PR branch. |
Thank you, @HeartSaVioR . It's enough since this will be merged as a single commit. |
@HeartSaVioR . Could you update the PR title and description, too? |
Test build #101734 has finished for PR 23656 at commit
|
@dongjoon-hyun Sorry for the late. Just updated the title and content of PR. |
Test failure looks not relevant. Retest this, please. |
yea, that's known flaky tests in branch-2.3.. |
Test build #101740 has finished for PR 23656 at commit
|
retest this please |
Thank you, @HeartSaVioR and @maropu . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM (pending Jenkins).
Test build #101743 has finished for PR 23656 at commit
|
Retest this please. |
Test build #101747 has finished for PR 23656 at commit
|
retest this, please |
Test build #101749 has finished for PR 23656 at commit
|
Retest this please |
super flaky.... |
Yeah... If the branch should be maintained further, it may need to fix the flaky tests... As it seems to be known flakiness, are we aware about the fix in upper version lines? |
Test build #101758 has finished for PR 23656 at commit
|
Retest this please. |
2 similar comments
Retest this please. |
Retest this please. |
@HeartSaVioR and @maropu . I triggered Jenkins three times intentionally to pass the Jenkins. |
Test build #101765 has finished for PR 23656 at commit
|
Test build #101764 has finished for PR 23656 at commit
|
Test build #101763 has finished for PR 23656 at commit
|
2 failures, 1 passes. It looks like kind of serious one if Spark community will maintain branch-2.3 forward. |
@HeartSaVioR . Yes. It's true and |
Merged to branch-2.3. |
…dException in CurrentBatchTimestamp ## What changes were proposed in this pull request? Spark replaces `CurrentTimestamp` with `CurrentBatchTimestamp`. However, `CurrentBatchTimestamp` is `TimeZoneAwareExpression` while `CurrentTimestamp` isn't. Without TimeZoneId, `CurrentBatchTimestamp` becomes unresolved and raises `UnresolvedException`. Since `CurrentDate` is `TimeZoneAwareExpression`, there is no problem with `CurrentDate`. ## How was this patch tested? Pass the Jenkins with the updated test cases. Closes #23656 from HeartSaVioR/SPARK-26379-branch-2.3. Lead-authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com> Co-authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@HeartSaVioR I'm not familiar with the SS part though, your other pr (#23634) is not related to the failures? It seems the failures are of the outer joins tests... |
If #23634 was the fix for tests they would be flaky for branch-2.4 / master as well. They don't seem to be flaky in master branch. |
Aha, I see. |
What changes were proposed in this pull request?
Spark replaces
CurrentTimestamp
withCurrentBatchTimestamp
.However,
CurrentBatchTimestamp
isTimeZoneAwareExpression
whileCurrentTimestamp
isn't.Without TimeZoneId,
CurrentBatchTimestamp
becomes unresolved and raisesUnresolvedException
.Since
CurrentDate
isTimeZoneAwareExpression
, there is no problem withCurrentDate
.How was this patch tested?
Pass the Jenkins with the updated test cases.