-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce DecimalUtil::adjustSumForOverflow helper method #7334
Conversation
✅ Deploy Preview for meta-velox canceled.
|
f5eca90
to
e858a25
Compare
@mbasmanova Could you help review this PR? #6020 and #5372 will use this common method. cc @rui-mo |
@rui-mo Rui, thank you for revewing. |
c6bb4cd
to
0cdb26e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing the comments.
Hi @mbasmanova. Please continue to take a look. |
|
||
DecimalUtil::valueInRange(sum); | ||
return sum; | ||
auto sum = computeValidSum(accumulator->sum, accumulator->overflow); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming this logic is Spark-specific, can we move it to functions/sparksql?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not only Spark that requires this logic, but Presto also needs it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liujiayi771 I see. Would you then update PR title and description to explain that this logic is generic and applies to both Presto and Spark. BTW, should there be some changes in the PR for the Presto logic, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presto directly uses the computeFinalValue
function in functions/lib/aggregates/SumAggregateBase.h
. Therefore, modifying the call in SumAggregateBase.h
should be sufficient.
*/ | ||
inline static std::optional<int128_t> computeValidSum( | ||
int128_t sum, | ||
int64_t overflow) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that overflow has only 3 values: 0, 1, -1; perhaps, it would be more readable to use an enum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overflow
can be greater than 1 or less than -1. In the actual calculation, the overflow in the accumulator accumulates continuously, but each time it can only increase by 1 or decrease by 1.
velox/velox/functions/lib/aggregates/DecimalAggregate.h
Lines 327 to 328 in fcb9a37
accumulator->overflow += | |
DecimalUtil::addWithOverflow(accumulator->sum, value, accumulator->sum); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. The comment talked only about 1 and -1 which confused me. I also noticed that addWithOverflow doesn't have any comments. Would you add documentation to addWithOverflow? This is a tricky topic that is very hard to get right. Let's make sure we explain what we do clearly for future readers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I will try to add some comments.
* -(1 << 127) + sum. | ||
* | ||
*/ | ||
inline static std::optional<int128_t> computeValidSum( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about, adjustSumForOverflow or something that doesn't use term 'valid'?
1be842b
to
2533322
Compare
return sum; | ||
auto sum = DecimalUtil::adjustSumForOverflow( | ||
accumulator->sum, accumulator->overflow); | ||
VELOX_CHECK(sum.has_value(), "Decimal overflow"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be user error, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
velox/type/DecimalUtil.h
Outdated
@@ -271,6 +271,37 @@ class DecimalUtil { | |||
return overflow; | |||
} | |||
|
|||
/// This method is used to correct the sum result calculated using | |||
/// addWithOverflow. In the addWithOverflow method, when summing two int128_t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add comments to addWithOverflow so we can refer to them here instead of repeating. Let's also clarify in comments for addWithOverflow that results should be post-processed using this method (adjustSumForOverflow).
velox/type/tests/DecimalTest.cpp
Outdated
SumWithOverflow accumulator; | ||
// Int128_t.max + 100 will trigger one upward overflow, but the final sum | ||
// result calculated by DecimalUtil::addWithOverflow will be negative. | ||
std::vector<int128_t> sumOverflowUpwardOnceInputs = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this variable name is very long; maybe just 'inputs' or:
add(accumulator, std::numeric_limits<int128_t>::max());
add(accumulator, 100);
add(accumulator, std::numeric_limits<int128_t>::min());
EXPECT_EQ(accumulator.result(), 99);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add add(int128_t)
method in SumWithOverflow
.
Hi @mbasmanova. The comments for the function have been revised. Could you help to take a look, thanks. |
velox/type/tests/DecimalTest.cpp
Outdated
accumulator.add(std::numeric_limits<int128_t>::max()); | ||
accumulator.add(100); | ||
accumulator.add(std::numeric_limits<int128_t>::min()); | ||
EXPECT_EQ(accumulator.adjustedSum(), 99); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment above says "result will be negative" , but the values here is positive (99). This is confusing. Is the comment correct? If so, would it be possible to clarify to avoid confusing future readers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, it refers to the fact that the sum before "adjust" is negative. I will fix the comment more explicit.
velox/type/tests/DecimalTest.cpp
Outdated
EXPECT_EQ(accumulator.adjustedSum(), -101); | ||
|
||
accumulator.reset(); | ||
// These inputs will eventually trigger an overflow, and computeValidSum will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reference to computeValidSum is not valid
@Yuhta This is because there is an issue with the input of my test case. |
@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@Yuhta Can I execute this test locally? |
@liujiayi771 Does that mean the code may still fail in production where we do not control the inputs? |
@mbasmanova This related to the algorithm of |
@mbasmanova Here, |
I see. Can we add a VELOX_DCHECK for the input then? |
OK. And I will modify the test case inputs. |
@mbasmanova There are still issues present, because the accumulation result of long decimals may exceed the limits of [kLongDecimalMin, kLongDecimalMax]. There are some extreme cases, for example, when the addition of multiple long decimal inputs precisely calculates to int128.min, and the next input to be accumulated is also negative, there is still a possibility of errors occurring. |
Can we add these DCHECKs? VELOX_DCHECK_NE(lhs, std::numeric_limits<int128_t>::min());
VELOX_DCHECK_NE(rhs, std::numeric_limits<int128_t>::min());
overflow = addUnsignedValues(result, -lhs, -rhs, true); |
@mbasmanova Hi, I add the DCHECK and change the test inputs. |
@liujiayi771 CI is red. Please, take a look. |
@mbasmanova It is the spark function
|
Is there an existing GitHub issue about this? I don't think computeIsAsciiForInputs is a Spark function. If there is no existing issue, please, create one and work with the Gluten team to prioritize fixing. |
@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Alright, the failure is in Spark Fuzzer test, it is not a Spark function. Based on the code modification history, it seems like there have been issues here before. Perhaps Presto's Fuzzer test could also have this error. I will collaborate with the Gluten team to try to resolve it. |
4d13edd
to
9620ef9
Compare
Hi @mbasmanova. Any issues during the merge process? |
@liujiayi771 Can't get an internal approval to merge. Will try some more. CC: @Yuhta |
@liujiayi771 Got an approval, but the diff is too old now. Would you rebase? |
9620ef9
to
a20e28e
Compare
Done. Thanks. |
@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@mbasmanova merged this pull request in 8550a15. |
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
The sum aggregate function in Spark and Presto uses this logic to adjust the sum
computed via DecimalUtil::addWithOverflow.