-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-26233][SQL] CheckOverflow when encoding a decimal value #23210
Conversation
Test build #99621 has finished for PR 23210 at commit
|
@@ -1647,6 +1647,15 @@ class DatasetSuite extends QueryTest with SharedSQLContext { | |||
checkDataset(ds, data: _*) | |||
checkAnswer(ds.select("x"), Seq(Row(1), Row(2))) | |||
} | |||
|
|||
test("SPARK-26233: serializer should enforce decimal precision and scale") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have a test case in RowEncoderSuite
, too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, everything is possible, but it is not easy actually. Because the issue here happens in the codegen, not when we retrieve the output. So if we just encode and decode everything is fine. The problem happens if there is any transformation in the codegen meanwhile, because there the underlying decimal is used (assuming that it has the same precision and scale of the data type - which without the current change is not always true). I tried checking the precision and scale of the serialized object, but it is not really feasible as they are converted when it is read (please see UnsafeRow
)... So I'd avoid this actually.
LGTM except one minor comment about a new test case. Ping, @gatorsmile and @cloud-fan . Could you review this PR? This is to fix a correctness issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM.
It seems that there is no other comments, I'll merge this to master
.
Thank you, @mgaido91 . This needs to land |
a late LGTM |
When we encode a Decimal from external source we don't check for overflow. That method is useful not only in order to enforce that we can represent the correct value in the specified range, but it also changes the underlying data to the right precision/scale. Since in our code generation we assume that a decimal has exactly the same precision and scale of its data type, missing to enforce it can lead to corrupted output/results when there are subsequent transformations. added UT Closes apache#23210 from mgaido91/SPARK-26233. Authored-by: Marco Gaido <marcogaido91@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
When we encode a Decimal from external source we don't check for overflow. That method is useful not only in order to enforce that we can represent the correct value in the specified range, but it also changes the underlying data to the right precision/scale. Since in our code generation we assume that a decimal has exactly the same precision and scale of its data type, missing to enforce it can lead to corrupted output/results when there are subsequent transformations. added UT Closes apache#23210 from mgaido91/SPARK-26233. Authored-by: Marco Gaido <marcogaido91@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
When we encode a Decimal from external source we don't check for overflow. That method is useful not only in order to enforce that we can represent the correct value in the specified range, but it also changes the underlying data to the right precision/scale. Since in our code generation we assume that a decimal has exactly the same precision and scale of its data type, missing to enforce it can lead to corrupted output/results when there are subsequent transformations. added UT Closes apache#23210 from mgaido91/SPARK-26233. Authored-by: Marco Gaido <marcogaido91@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
thanks @cloud-fan @dongjoon-hyun, I created the PRs for the backports. |
## What changes were proposed in this pull request? When we encode a Decimal from external source we don't check for overflow. That method is useful not only in order to enforce that we can represent the correct value in the specified range, but it also changes the underlying data to the right precision/scale. Since in our code generation we assume that a decimal has exactly the same precision and scale of its data type, missing to enforce it can lead to corrupted output/results when there are subsequent transformations. ## How was this patch tested? added UT Closes apache#23210 from mgaido91/SPARK-26233. Authored-by: Marco Gaido <marcogaido91@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
What changes were proposed in this pull request?
When we encode a Decimal from external source we don't check for overflow. That method is useful not only in order to enforce that we can represent the correct value in the specified range, but it also changes the underlying data to the right precision/scale. Since in our code generation we assume that a decimal has exactly the same precision and scale of its data type, missing to enforce it can lead to corrupted output/results when there are subsequent transformations.
How was this patch tested?
added UT