[SPARK-32258][SQL] NormalizeFloatingNumbers directly normalizes IF/CaseWhen/Coalesce child expressions #29061

viirya · 2020-07-09T20:42:49Z

What changes were proposed in this pull request?

This patch proposes to let the NormalizeFloatingNumbers rule directly normalize certain child expressions. This could simplify the expression tree.

Why are the changes needed?

Currently, the NormalizeFloatingNumbers rule treats some expressions as a black box, but we can optimize it a bit by directly normalizing their inner child expressions.

Also see #28962 (comment).
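A minimal sketch of the idea, using a toy expression tree rather than Catalyst's real classes. `Col`, `If`, `MakeStruct`, `GetField`, and `NormalizeNaN` are hypothetical stand-ins loosely modeled on an attribute, `If`, `CreateNamedStruct`, `GetStructField`, and `NormalizeNaNAndZero`, and null handling is left out entirely. For a struct-typed `If`, the "black box" version rebuilds the struct field by field on top of the whole `If`, while the direct version only normalizes the two branches.

```scala
// Toy model only: these types are hypothetical stand-ins, not Catalyst classes.
sealed trait DType
case object DoubleT extends DType
case object BoolT extends DType
final case class StructT(fields: List[DType]) extends DType

sealed trait Expr { def dt: DType }
final case class Col(name: String, dt: DType) extends Expr
final case class If(cond: Expr, t: Expr, f: Expr) extends Expr { def dt: DType = t.dt }
final case class MakeStruct(children: List[Expr]) extends Expr {
  def dt: DType = StructT(children.map(_.dt))
}
final case class GetField(child: Expr, i: Int) extends Expr {
  def dt: DType = child.dt match {
    case StructT(fs) => fs(i)
    case other       => throw new IllegalArgumentException(s"not a struct: $other")
  }
}
// Marks a double-typed value whose NaN / -0.0 has been normalized.
final case class NormalizeNaN(child: Expr) extends Expr { def dt: DType = child.dt }

// Node count, used as a rough measure of tree size.
def size(e: Expr): Int = e match {
  case Col(_, _)       => 1
  case If(c, t, f)     => 1 + size(c) + size(t) + size(f)
  case MakeStruct(cs)  => 1 + cs.map(size).sum
  case GetField(c, _)  => 1 + size(c)
  case NormalizeNaN(c) => 1 + size(c)
}

// Rebuilds a struct-typed expression field by field on top of the whole expression.
def structFields(e: Expr, norm: Expr => Expr): Expr = e.dt match {
  case StructT(fs) => MakeStruct(fs.indices.toList.map(i => norm(GetField(e, i))))
  case _           => e
}

// "Black box": an If is treated like any other struct-typed expression.
def normalizeBlackBox(e: Expr): Expr = e match {
  case MakeStruct(cs)       => MakeStruct(cs.map(normalizeBlackBox))
  case _ if e.dt == DoubleT => NormalizeNaN(e)
  case _                    => structFields(e, normalizeBlackBox)
}

// Direct: normalization is pushed into the two branches of the If.
def normalizeDirect(e: Expr): Expr = e match {
  case MakeStruct(cs)       => MakeStruct(cs.map(normalizeDirect))
  case _ if e.dt == DoubleT => NormalizeNaN(e)
  case If(c, t, f)          => If(c, normalizeDirect(t), normalizeDirect(f))
  case _                    => structFields(e, normalizeDirect)
}

val a = MakeStruct(List(Col("a1", DoubleT), Col("a2", DoubleT)))
val b = MakeStruct(List(Col("b1", DoubleT), Col("b2", DoubleT)))
val e = If(Col("c", BoolT), a, b)

println(size(normalizeBlackBox(e))) // 21: the whole If appears once per field
println(size(normalizeDirect(e)))   // 12: no extra struct, the If appears once
```

In this toy, the black-box result wraps the original `If` twice (once per field) inside a newly built struct, while the direct result keeps a single `If` with normalized branches. The node counts only illustrate the shape of the trees, not Spark's actual plans.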

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests.

SparkQA · 2020-07-09T21:13:07Z

Test build #125516 has started for PR 29061 at commit ba0ea32.

viirya · 2020-07-10T06:04:42Z

cc @cloud-fan

SparkQA · 2020-07-10T07:05:02Z

Test build #125542 has finished for PR 29061 at commit 5046337.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-07-12T07:05:02Z

Test build #125706 has finished for PR 29061 at commit 78b8667.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2020-07-12T07:06:08Z

retest this please

SparkQA · 2020-07-12T11:55:37Z

Test build #125707 has finished for PR 29061 at commit 78b8667.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-07-12T21:27:50Z

Test build #125719 has finished for PR 29061 at commit d5dce7c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun

+1, LGTM. Thank you, @viirya and all.
Merged to master for Apache Spark 3.1.0 (planned for December 2020).

cloud-fan · 2020-07-13T06:26:34Z

...talyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala

```diff
+
+    case Coalesce(children) =>
+      Coalesce(children.map(normalize))
+
     case _ if expr.dataType == FloatType || expr.dataType == DoubleType =>
```


Shall we put these new cases after this case? The main goal of this optimization is to avoid constructing a new CreateStruct during normalization. If it's just a float/double type If/CaseWhen/Coalesce, it's actually an overhead to duplicate the normalization work in each child.
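The ordering question can be illustrated with a second toy sketch, again using hypothetical stand-in types rather than Catalyst's classes and leaving the struct/array cases out. When a double-typed If/Coalesce is matched by the branch-pushing cases first, every double leaf gets its own normalization wrapper; when the float/double case is tried first, the whole expression is wrapped once.

```scala
// Toy model only: hypothetical stand-ins, not Catalyst classes.
sealed trait Expr { def isDouble: Boolean }
final case class Col(name: String, isDouble: Boolean) extends Expr
final case class If(c: Expr, t: Expr, f: Expr) extends Expr {
  def isDouble: Boolean = t.isDouble
}
// Assumes at least one child, like a real COALESCE call.
final case class Coalesce(children: List[Expr]) extends Expr {
  def isDouble: Boolean = children.head.isDouble
}
// Marks a double-typed value whose NaN / -0.0 has been normalized.
final case class NormalizeNaN(child: Expr) extends Expr {
  def isDouble: Boolean = child.isDouble
}

// `doubleCaseFirst` chooses whether the float/double case is tried before
// (follow-up order) or after (original patch order) the If/Coalesce cases.
def normalize(e: Expr, doubleCaseFirst: Boolean): Expr = {
  def go(x: Expr): Expr = x match {
    case _ if doubleCaseFirst && x.isDouble => NormalizeNaN(x)
    case If(c, t, f)                        => If(c, go(t), go(f))
    case Coalesce(cs)                       => Coalesce(cs.map(go))
    case _ if x.isDouble                    => NormalizeNaN(x)
    case _                                  => x // non-double leaves; structs are omitted from this toy
  }
  go(e)
}

// Counts how many normalization wrappers a tree contains.
def wrappers(e: Expr): Int = e match {
  case NormalizeNaN(c) => 1 + wrappers(c)
  case If(c, t, f)     => wrappers(c) + wrappers(t) + wrappers(f)
  case Coalesce(cs)    => cs.map(wrappers).sum
  case Col(_, _)       => 0
}

// IF(cond, COALESCE(a, b), d), where a, b, d are double columns.
val e = If(Col("cond", false), Coalesce(List(Col("a", true), Col("b", true))), Col("d", true))

println(wrappers(normalize(e, doubleCaseFirst = false))) // 3: one per double leaf
println(wrappers(normalize(e, doubleCaseFirst = true)))  // 1: the If is wrapped once
```

For `IF(cond, COALESCE(a, b), d)` this gives three wrappers versus one, which is the duplicated work described in the comment. Pushing normalization into the branches still pays off for struct-typed expressions, where it avoids building a new struct.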

…double If/CaseWhen/Coalesce

What changes were proposed in this pull request?

This is a followup to #29061. See #29061 (comment). Basically, this moves the If/CaseWhen/Coalesce case patterns after the float/double case so that we don't duplicate normalization on children for float/double If/CaseWhen/Coalesce.

Why are the changes needed?

To simplify the expression tree.

Does this PR introduce _any_ user-facing change?

No

How was this patch tested?

Modified unit tests.

Closes #29091 from viirya/SPARK-32258-followup.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

NormalizeFloatingNumbers can directly normalize on certain children expressions.

f1652aa

probot-autolabeler bot added the SQL label Jul 9, 2020

dongjoon-hyun reviewed Jul 9, 2020

.../test/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingPointNumbersSuite.scala Outdated

Change test name.

ba0ea32

maropu reviewed Jul 10, 2020

.../test/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingPointNumbersSuite.scala Outdated

maropu approved these changes Jul 10, 2020

HyukjinKwon reviewed Jul 10, 2020

.../test/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingPointNumbersSuite.scala Outdated

Remove unused import and add jira number.

5046337

dongjoon-hyun reviewed Jul 10, 2020

...talyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala

viirya changed the title ~~[SPARK-32258][SQL] NormalizeFloatingNumbers can directly normalize on certain children expressions~~ [SPARK-32258][SQL] NormalizeFloatingNumbers can directly normalize on IF and CaseWhen children expressions Jul 10, 2020

cloud-fan reviewed Jul 10, 2020

...talyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala

Address comment.

78b8667

dongjoon-hyun changed the title ~~[SPARK-32258][SQL] NormalizeFloatingNumbers can directly normalize on IF and CaseWhen children expressions~~ [SPARK-32258][SQL] NormalizeFloatingNumbers directly normalizes IF/CaseWhen/Coalesce child expressions Jul 12, 2020

dongjoon-hyun reviewed Jul 12, 2020

.../test/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingPointNumbersSuite.scala Outdated

Remove unused code.

d5dce7c

dongjoon-hyun reviewed Jul 12, 2020

...talyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala

dongjoon-hyun approved these changes Jul 12, 2020

dongjoon-hyun closed this in b6229df Jul 12, 2020

cloud-fan reviewed Jul 13, 2020

viirya mentioned this pull request Jul 13, 2020

[SPARK-32258][SQL] Not duplicate normalization on children for float/double If/CaseWhen/Coalesce #29091

Closed

viirya deleted the SPARK-32258 branch December 27, 2023 18:28
