Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-26379][SS][BRANCH-2.3] Use dummy TimeZoneId to avoid UnresolvedException in CurrentBatchTimestamp #23656

Closed

Conversation

HeartSaVioR
Copy link
Contributor

@HeartSaVioR HeartSaVioR commented Jan 25, 2019

What changes were proposed in this pull request?

Spark replaces CurrentTimestamp with CurrentBatchTimestamp.
However, CurrentBatchTimestamp is TimeZoneAwareExpression while CurrentTimestamp isn't.
Without TimeZoneId, CurrentBatchTimestamp becomes unresolved and raises UnresolvedException.

Since CurrentDate is TimeZoneAwareExpression, there is no problem with CurrentDate.

How was this patch tested?

Pass the Jenkins with the updated test cases.

…to streaming query

## What changes were proposed in this pull request?

This patch proposes to fix issue on adding `current_timestamp` / `current_date` with streaming query.

The root reason is that Spark transforms `CurrentTimestamp`/`CurrentDate` to `CurrentBatchTimestamp` in MicroBatchExecution which makes transformed attributes not-yet-resolved. They will be resolved by IncrementalExecution.
(In ContinuousExecution, Spark doesn't allow using `current_timestamp` and `current_date` so it has been OK.)

It's OK for DataSource V1 sink because it simply leverages transformed logical plan and don't evaluate until they're resolved, but for DataSource V2 sink, Spark tries to extract the schema of transformed logical plan in prior to IncrementalExecution, and unresolved attributes will raise errors.

This patch fixes the issue via having separate pre-resolved logical plan to pass the schema to StreamingWriteSupport safely.

## How was this patch tested?

Added UT.

Closes apache#23609 from HeartSaVioR/SPARK-26379.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@HeartSaVioR HeartSaVioR changed the title [SPARK-26379][SS] Fix issue on adding current_timestamp/current_date to streaming query [BRANCH-2.3][SPARK-26379][SS] Fix issue on adding current_timestamp/current_date to streaming query Jan 25, 2019
@dongjoon-hyun
Copy link
Member

Thank you, @HeartSaVioR . :)

@dongjoon-hyun dongjoon-hyun changed the title [BRANCH-2.3][SPARK-26379][SS] Fix issue on adding current_timestamp/current_date to streaming query [SPARK-26379][SS][BACKPORT-2.3] Fix issue on adding current_timestamp/current_date to streaming query Jan 25, 2019
@SparkQA
Copy link

SparkQA commented Jan 26, 2019

Test build #101696 has finished for PR 23656 at commit cec965c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

We will revisit this PR because the original PR needs a more verification and a followup PR is expected.

@HeartSaVioR
Copy link
Contributor Author

FYI: follow-up PR is available in #23660 submitted by Dongjoon.

@dongjoon-hyun
Copy link
Member

Hi, @HeartSaVioR . Could you update this PR with the final content?

…xception in CurrentBatchTimestamp

## What changes were proposed in this pull request?

Spark replaces `CurrentTimestamp` with `CurrentBatchTimestamp`.
However, `CurrentBatchTimestamp` is `TimeZoneAwareExpression` while `CurrentTimestamp` isn't.
Without TimeZoneId, `CurrentBatchTimestamp` becomes unresolved and raises `UnresolvedException`.

Since `CurrentDate` is `TimeZoneAwareExpression`, there is no problem with `CurrentDate`.

This PR reverts the [previous patch](apache#23609) on `MicroBatchExecution` and fixes the root cause.

## How was this patch tested?

Pass the Jenkins with the updated test cases.

Closes apache#23660 from dongjoon-hyun/SPARK-26379.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@HeartSaVioR
Copy link
Contributor Author

@dongjoon-hyun Yes, I've just cherry-picked the FOLLOW-UP commit into PR branch.

@dongjoon-hyun
Copy link
Member

Thank you, @HeartSaVioR . It's enough since this will be merged as a single commit.

@dongjoon-hyun
Copy link
Member

@HeartSaVioR . Could you update the PR title and description, too?

@SparkQA
Copy link

SparkQA commented Jan 28, 2019

Test build #101734 has finished for PR 23656 at commit 52ac627.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR HeartSaVioR changed the title [SPARK-26379][SS][BACKPORT-2.3] Fix issue on adding current_timestamp/current_date to streaming query [SPARK-26379][SS][BRANCH-2.3] Use dummy TimeZoneId to avoid UnresolvedException in CurrentBatchTimestamp Jan 28, 2019
@HeartSaVioR
Copy link
Contributor Author

@dongjoon-hyun Sorry for the late. Just updated the title and content of PR.

@HeartSaVioR
Copy link
Contributor Author

Test failure looks not relevant. Retest this, please.

@maropu
Copy link
Member

maropu commented Jan 28, 2019

yea, that's known flaky tests in branch-2.3..

@SparkQA
Copy link

SparkQA commented Jan 28, 2019

Test build #101740 has finished for PR 23656 at commit 52ac627.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Jan 28, 2019

retest this please

@dongjoon-hyun
Copy link
Member

Thank you, @HeartSaVioR and @maropu .

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM (pending Jenkins).

@SparkQA
Copy link

SparkQA commented Jan 28, 2019

Test build #101743 has finished for PR 23656 at commit 52ac627.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Retest this please.

@SparkQA
Copy link

SparkQA commented Jan 28, 2019

Test build #101747 has finished for PR 23656 at commit 52ac627.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Copy link
Contributor Author

retest this, please

@SparkQA
Copy link

SparkQA commented Jan 28, 2019

Test build #101749 has finished for PR 23656 at commit 52ac627.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Copy link
Contributor Author

Retest this please

@maropu
Copy link
Member

maropu commented Jan 28, 2019

super flaky....

@HeartSaVioR
Copy link
Contributor Author

Yeah... If the branch should be maintained further, it may need to fix the flaky tests... As it seems to be known flakiness, are we aware about the fix in upper version lines?

@SparkQA
Copy link

SparkQA commented Jan 28, 2019

Test build #101758 has finished for PR 23656 at commit 52ac627.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Retest this please.

2 similar comments
@dongjoon-hyun
Copy link
Member

Retest this please.

@dongjoon-hyun
Copy link
Member

Retest this please.

@dongjoon-hyun
Copy link
Member

@HeartSaVioR and @maropu . I triggered Jenkins three times intentionally to pass the Jenkins.

@SparkQA
Copy link

SparkQA commented Jan 28, 2019

Test build #101765 has finished for PR 23656 at commit 52ac627.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 28, 2019

Test build #101764 has finished for PR 23656 at commit 52ac627.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 28, 2019

Test build #101763 has finished for PR 23656 at commit 52ac627.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Copy link
Contributor Author

HeartSaVioR commented Jan 28, 2019

2 failures, 1 passes. It looks like kind of serious one if Spark community will maintain branch-2.3 forward.

@dongjoon-hyun
Copy link
Member

@HeartSaVioR . Yes. It's true and branch-2.3 will be EOF in several months.

@dongjoon-hyun
Copy link
Member

Merged to branch-2.3.

dongjoon-hyun added a commit that referenced this pull request Jan 28, 2019
…dException in CurrentBatchTimestamp

## What changes were proposed in this pull request?

Spark replaces `CurrentTimestamp` with `CurrentBatchTimestamp`.
However, `CurrentBatchTimestamp` is `TimeZoneAwareExpression` while `CurrentTimestamp` isn't.
Without TimeZoneId, `CurrentBatchTimestamp` becomes unresolved and raises `UnresolvedException`.

Since `CurrentDate` is `TimeZoneAwareExpression`, there is no problem with `CurrentDate`.

## How was this patch tested?

Pass the Jenkins with the updated test cases.

Closes #23656 from HeartSaVioR/SPARK-26379-branch-2.3.

Lead-authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@HeartSaVioR HeartSaVioR deleted the SPARK-26379-branch-2.3 branch January 28, 2019 20:29
@maropu
Copy link
Member

maropu commented Jan 29, 2019

@HeartSaVioR I'm not familiar with the SS part though, your other pr (#23634) is not related to the failures? It seems the failures are of the outer joins tests...

@HeartSaVioR
Copy link
Contributor Author

If #23634 was the fix for tests they would be flaky for branch-2.4 / master as well. They don't seem to be flaky in master branch.

@maropu
Copy link
Member

maropu commented Jan 29, 2019

Aha, I see.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants