-
Notifications
You must be signed in to change notification settings - Fork 750
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update try_binary
and checked_ops
, and remove math_checked_op
#2717
Conversation
delete math_checked_op update the return type of checked ops Signed-off-by: remzi <13716567376yh@gmail.com>
Mark this as a draft now because I also need to update the |
Signed-off-by: remzi <13716567376yh@gmail.com>
Signed-off-by: remzi <13716567376yh@gmail.com>
Hmm, tests failed on SIMD. Let me find a good solution. |
fn div_checked(self, rhs: Self) -> Result<Self> { | ||
if rhs.is_zero() { | ||
Err(ArrowError::DivideByZero) | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ehh, I don't think this is correct. Your proposal #2719 is only checking division by zero error for integer division. But it doesn't match the behavior of C++ DivideChecked kernel, and Spark divide expression. We have clear distinction between checked and non-checked kernels. The checked kernel should catch such invalid inputs at the runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it doesn't match the behavior of C++ DivideChecked kernel
Thank you @viirya, I find the code in C++: https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/base_arithmetic_internal.h#L410-L419 and it does the check for float point values.
If we follow the c++ behaviour, maybe we should add some docs such as
Float point Division by zero will return an error in `checked_div` but not panic when using `unchecked_div`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
divide_checked
doc has mentioned that it returns "ArrowError::DivideByZero if any right side value is zero". divide
doc also describes that for floating point types, the result of dividing by zero follows normal floating point rules. Only for other numeric types, division by zero will panic.
I think it already said what you meant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great. I'll update the issue and the PR
Signed-off-by: remzi <13716567376yh@gmail.com>
Signed-off-by: remzi <13716567376yh@gmail.com>
/// This is similar to `math_op` as it performs given operation between two input primitive arrays. | ||
/// But the given operation can return `None` if overflow is detected. For the case, this function | ||
/// returns an `Err`. | ||
fn math_checked_op<LT, RT, F>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need this function any more because this is the same as try_binary
after we updating the try_binary
and the type of op
{ | ||
if left.len() != right.len() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to remove this function but met some error from the type system. I will file a follow-up ticket to remove this.
@@ -2026,31 +1967,57 @@ mod tests { | |||
|
|||
#[test] | |||
#[should_panic(expected = "DivideByZero")] | |||
fn test_primitive_array_divide_by_zero_with_checked() { | |||
fn test_int_array_divide_by_zero_with_checked() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename the test case to make it clearer.
} | ||
|
||
fn mul_wrapping(self, rhs: Self) -> Self { | ||
self * rhs | ||
} | ||
|
||
fn div_checked(self, rhs: Self) -> Option<Self> { | ||
Some(self / rhs) | ||
fn div_checked(self, rhs: Self) -> Result<Self> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for float type, we also need to check the DivideByZero
error, make this as the default behaviour.
Signed-off-by: remzi <13716567376yh@gmail.com>
Signed-off-by: remzi <13716567376yh@gmail.com>
@viirya @liukun4515 @tustvold PTAL, thank you. |
Currently running the benchmarks for this, just in case, LLVM is a fickle beast |
The benchmarks regularly fluctuate by +- 25%, this is not all that surprising given we're talking microseconds, but importantly we are not seeing anything greater which would imply a failure to vectorise correctly or similar 👍 |
Benchmark runs are scheduled for baseline = 43d912c and contender = f572ec1. f572ec1 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
typed_dict_math_op!( | ||
left, | ||
right, | ||
|a, b| a.div_checked(b), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Err...why you changed it to use div_checked
...
divide_dyn
is no-overflow checking kernel.
I'm adding checked variant in another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this fixes #2647 perhaps? I have to confess to being a little lost now, I had thought we'd collectively agreed that division didn't make sense to have an unchecked variant as Rust always checks division 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I meant overflow-checking. I think this change doesn't change division by zero behavior of divide_dyn
kernel. It returns ArrowError::DivideByZero
with/without this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would division overflow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're fine the definition of div_checked
for integral types is
if rhs.is_zero() {
Err(ArrowError::DivideByZero)
} else {
self.checked_div(rhs).ok_or_else(|| {
ArrowError::ComputeError(format!(
"Overflow happened on: {:?} / {:?}",
self, rhs
))
})
}
The checked_div is technically redundant as we've already checked that the value isn't zero, i.e. this code could be simplified to
self.checked_div(rhs).ok_or_else(|| {
ArrowError::DivideByZero
})
The definition for floating points is
if rhs.is_zero() {
Err(ArrowError::DivideByZero)
} else {
Ok(self / rhs)
}
Which will not check for overflow, as it isn't meaningful to do so for floats?
TLDR: I don't think overflow checking for division makes sense? The only types that could overlow, i.e. floats, already have a way to represent that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For "checked" kernel, I meant overflow checking behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Op, I am wrong, sorry.
The divide_dyn
just checked the dividebyzero
error, but not the overflow error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries. I've fixed it in the PR to add checked variant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add another divide
ops in the ArrowNativeTypeOp
that only checks the dividebyzero
error ? For example:
// check dividebyzero and overflow
fn div_fully_checked(self, rhs: Self) -> Result<Self> {
if rhs.is_zero() {
Err(ArrowError::DivideByZero)
} else {
self.checked_div(rhs).ok_or_else(|| {
ArrowError::ComputeError(format!(
"Overflow happened on: {:?} / {:?}",
self, rhs
))
})
}
}
// only check the dividebyzero error
fn div_checked_divideByZero(self, rhs: Self) -> Result<Self> {}
// no check
fn div_wrapping(self, rhs: Self) -> Self {
self.wrapping_div(rhs)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think divide_dyn
is the kernel which only checks division by zero error.
edit: Oh, I misread your comment. But I don't think it is good idea to add too many APIs there. It is pretty easy to do extra division by zero check + wrapping division.
Signed-off-by: remzi 13716567376yh@gmail.com
Which issue does this PR close?
Closes #2715.
Rationale for this change
try_binary
should not panic on unequaled array lengthResult
. Becausetry_binary
and othertry_...
kernelsop
s in the compute kernel, which can allow us to do some simplification on those private functionsmath_checked_op
because it is the same astry_binary
What changes are included in this PR?
Are there any user-facing changes?