Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-32258][SQL] NormalizeFloatingNumbers directly normalizes IF/CaseWhen/Coalesce child expressions #29061

Closed
wants to merge 5 commits into from

Conversation

viirya
Copy link
Member

@viirya viirya commented Jul 9, 2020

What changes were proposed in this pull request?

This patch proposes to let NormalizeFloatingNumbers rule directly normalizes on certain children expressions. It could simplify expression tree.

Why are the changes needed?

Currently NormalizeFloatingNumbers rule treats some expressions as black box but we can optimize it a bit by normalizing directly the inner children expressions.

Also see #28962 (comment).

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests.

@SparkQA
Copy link

SparkQA commented Jul 9, 2020

Test build #125516 has started for PR 29061 at commit ba0ea32.

@SparkQA

This comment has been minimized.

@viirya viirya changed the title [SPARK-32258][SQL] NormalizeFloatingNumbers can directly normalize on certain children expressions [SPARK-32258][SQL] NormalizeFloatingNumbers can directly normalize on IF and CaseWhen children expressions Jul 10, 2020
@viirya
Copy link
Member Author

viirya commented Jul 10, 2020

cc @cloud-fan

@SparkQA
Copy link

SparkQA commented Jul 10, 2020

Test build #125542 has finished for PR 29061 at commit 5046337.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 12, 2020

Test build #125706 has finished for PR 29061 at commit 78b8667.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Jul 12, 2020

retest this please

@SparkQA
Copy link

SparkQA commented Jul 12, 2020

Test build #125707 has finished for PR 29061 at commit 78b8667.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-32258][SQL] NormalizeFloatingNumbers can directly normalize on IF and CaseWhen children expressions [SPARK-32258][SQL] NormalizeFloatingNumbers directly normalizes IF/CaseWhen/Coalesce child expressions Jul 12, 2020
@SparkQA
Copy link

SparkQA commented Jul 12, 2020

Test build #125719 has finished for PR 29061 at commit d5dce7c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @viirya and all.
Merged to master for Apache Spark 3.1.0 on December 2020.


case Coalesce(children) =>
Coalesce(children.map(normalize))

case _ if expr.dataType == FloatType || expr.dataType == DoubleType =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we put these new cases after this case? The main goal of this optimization is to avoid constructing a new CreateStruct during normalization. If it's just a float/double type If/CashWhen/Coalesce, it's actually an overhead to duplicate the normalization work in each child.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay.

cloud-fan pushed a commit that referenced this pull request Jul 14, 2020
…double If/CaseWhen/Coalesce

### What changes were proposed in this pull request?

This is followup to #29061. See #29061 (comment). Basically this moves If/CaseWhen/Coalesce case patterns after float/double case so we don't duplicate normalization on children for float/double If/CaseWhen/Coalesce.

### Why are the changes needed?

Simplify expression tree.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Modify unit tests.

Closes #29091 from viirya/SPARK-32258-followup.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@viirya viirya deleted the SPARK-32258 branch December 27, 2023 18:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants