
[SPARK-26233][SQL] CheckOverflow when encoding a decimal value #23210

Closed

Conversation

@mgaido91 (Contributor) commented Dec 3, 2018

What changes were proposed in this pull request?

When we encode a Decimal coming from an external source, we don't run CheckOverflow on it. That check is useful not only to enforce that the value fits in the specified range: it also rescales the underlying data to the declared precision/scale. Since our code generation assumes that a decimal has exactly the same precision and scale as its data type, failing to enforce this can lead to corrupted output/results when subsequent transformations are applied.
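
To make the failure mode concrete, here is a minimal sketch with assumed values (not taken from the PR or the JIRA, so the exact reproduction may differ):

```scala
// Sketch only: illustrates the shape of the bug, not an exact reproduction.
import org.apache.spark.sql.SparkSession

object Spark26233Sketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SPARK-26233 sketch")
      .getOrCreate()
    import spark.implicits._

    // The encoder maps scala.BigDecimal to the default DecimalType(38, 18),
    // but this literal's runtime scale is 21. Without CheckOverflow in the
    // serializer, the underlying Decimal keeps the mismatched scale.
    val ds = Seq(BigDecimal("0.123456789012345678901")).toDS()

    // The generated code for this arithmetic assumes the declared scale of
    // 18; operating on a decimal whose underlying scale differs is what can
    // corrupt results before this fix.
    ds.select(($"value" + BigDecimal("0.1")).as("sum")).show(false)

    spark.stop()
  }
}
```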

How was this patch tested?

added UT

@SparkQA commented Dec 3, 2018

Test build #99621 has finished for PR 23210 at commit 91d3e1b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -1647,6 +1647,15 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
checkDataset(ds, data: _*)
checkAnswer(ds.select("x"), Seq(Row(1), Row(2)))
}

test("SPARK-26233: serializer should enforce decimal precision and scale") {
@dongjoon-hyun (Member) commented on the diff:

Can we have a test case in RowEncoderSuite, too?

@mgaido91 (Contributor, Author) replied:

Well, everything is possible, but it is not easy, actually. The issue here happens in the generated code, not when we retrieve the output; so if we just encode and decode, everything is fine. The problem arises if there is any transformation in the generated code in between, because there the underlying decimal is used directly, on the assumption that it has the same precision and scale as the data type, which without the current change is not always true. I tried checking the precision and scale of the serialized object, but that is not really feasible, since they are converted back when the value is read (please see UnsafeRow)... So I'd avoid this, actually.
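
For illustration, a sketch of why a round-trip assertion passes regardless, using the 2.x-era encoder API (toRow/fromRow) and assumed values:

```scala
// Sketch, assumed values: a plain encode/decode round-trip cannot observe
// the mismatched scale, because the value is normalized on the write/read
// path (see UnsafeRowWriter.write and UnsafeRow.getDecimal).
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types.{DecimalType, StructField, StructType}

val schema = StructType(Seq(StructField("d", DecimalType(38, 18))))
val encoder = RowEncoder(schema).resolveAndBind()

// Runtime scale is 1, declared scale is 18.
val internalRow = encoder.toRow(Row(BigDecimal("123.4").bigDecimal))
val roundTripped = encoder.fromRow(internalRow)
// roundTripped.getDecimal(0) already carries the declared scale here, so
// an assertion on it passes with or without the fix; only an intermediate
// transformation in generated code exposes the problem.
```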

@dongjoon-hyun (Member) commented:

LGTM except one minor comment about a new test case.

Ping, @gatorsmile and @cloud-fan . Could you review this PR? This is to fix a correctness issue.

@dongjoon-hyun (Member) left a review:

+1, LGTM.
It seems there are no other comments; I'll merge this to master.

@dongjoon-hyun (Member) commented Dec 4, 2018

Thank you, @mgaido91. This needs to land on branch-2.4/branch-2.3/branch-2.2, but it fails on branch-2.4 due to conflicts in the test case file. Could you make separate backport PRs for each branch?

@asfgit closed this in 556d83e on Dec 4, 2018
@cloud-fan (Contributor) commented:

a late LGTM

mgaido91 added a commit to mgaido91/spark that referenced this pull request Dec 5, 2018
When we encode a Decimal coming from an external source, we don't run CheckOverflow on it. That check is useful not only to enforce that the value fits in the specified range: it also rescales the underlying data to the declared precision/scale. Since our code generation assumes that a decimal has exactly the same precision and scale as its data type, failing to enforce this can lead to corrupted output/results when subsequent transformations are applied.

added UT

Closes apache#23210 from mgaido91/SPARK-26233.

Authored-by: Marco Gaido <marcogaido91@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
mgaido91 added a commit to mgaido91/spark that referenced this pull request Dec 5, 2018
mgaido91 added a commit to mgaido91/spark that referenced this pull request Dec 5, 2018
@mgaido91 (Contributor, Author) commented Dec 5, 2018

Thanks @cloud-fan @dongjoon-hyun, I created the PRs for the backports.

jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019