-
Notifications
You must be signed in to change notification settings - Fork 770
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix take_bytes Null and Overflow Handling (#4576) #4579
Conversation
if array.is_valid(index) { | ||
let s: &[u8] = array.value(index).as_ref(); | ||
|
||
length_so_far += T::Offset::from_usize(s.len()).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This wasn't performing checked arithmetic and so could silently overflow
if array.is_valid(index) { | ||
let s: &[u8] = array.value(index).as_ref(); | ||
|
||
length_so_far += T::Offset::from_usize(s.len()).unwrap(); | ||
values.extend_from_slice(s.as_ref()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We piggy back on the fact this already does checked arithmetic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @tustvold -- I went through this PR carefully and it looks good to me logic wise. I defer to you and others for performance.
Maybe this is something @jhorstmann or @viirya are interested in reviewing as well
I also verified that without the code changes in this PR the test fails like
---- take::tests::test_take_bytes_null_indices stdout ----
thread 'take::tests::test_take_bytes_null_indices' panicked at 'assertion failed: idx < self.len', /Users/alamb/Software/arrow-rs2/arrow-buffer/src/buffer/boolean.rs:133:9
stack backtrace:
0: rust_begin_unwind
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs:593:5
1: core::panicking::panic_fmt
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/panicking.rs:67:14
2: core::panicking::panic
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/panicking.rs:117:5
3: arrow_buffer::buffer::boolean::BooleanBuffer::value
at /Users/alamb/Software/arrow-rs2/arrow-buffer/src/buffer/boolean.rs:133:9
4: arrow_buffer::buffer::null::NullBuffer::is_valid
at /Users/alamb/Software/arrow-rs2/arrow-buffer/src/buffer/null.rs:145:9
5: arrow_buffer::buffer::null::NullBuffer::is_null
at /Users/alamb/Software/arrow-rs2/arrow-buffer/src/buffer/null.rs:151:10
6: arrow_array::array::Array::is_null::{{closure}}
at /Users/alamb/Software/arrow-rs2/arrow-array/src/array/mod.rs:193:30
7: core::option::Option<T>::map
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/option.rs:1075:29
8: arrow_array::array::Array::is_null
at /Users/alamb/Software/arrow-rs2/arrow-array/src/array/mod.rs:193:9
9: arrow_array::array::Array::is_valid
at /Users/alamb/Software/arrow-rs2/arrow-array/src/array/mod.rs:210:10
10: arrow_select::take::take_bytes
at ./src/take.rs:404:16
11: arrow_select::take::take_impl
at ./src/take.rs:126:25
12: arrow_select::take::take
at ./src/take.rs:81:5
13: arrow_select::take::tests::test_take_bytes_null_indices
at ./src/take.rs:1960:17
14: arrow_select::take::tests::test_take_bytes_null_indices::{{closure}}
at ./src/take.rs:1954:39
15: core::ops::function::FnOnce::call_once
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/ops/function.rs:250:5
16: core::ops::function::FnOnce::call_once
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
|
||
#[test] | ||
fn test_take_bytes_null_indices() { | ||
let indices = Int32Array::new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let indices = Int32Array::new( | |
// Build indices with values that out of bounds, but NULL so they | |
// will not be checked | |
let indices = Int32Array::new( |
if array.is_valid(index) && indices.is_valid(i) { | ||
offsets.extend(indices.values().iter().enumerate().map(|(i, index)| { | ||
let index = index.as_usize(); | ||
if indices.is_valid(i) && array.is_valid(index) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the actual fix for the bug, right? that indices.is_valid
is checked prior to array.is_valid
?
if indices.is_valid(i) && array.is_valid(index) { | |
// check index is valid before using index. The value in | |
// NULL index slots may not be within bounds of array | |
if indices.is_valid(i) && array.is_valid(index) { |
nulls = Some(null_buf.into()); | ||
} else if array.null_count() == 0 { | ||
for (i, offset) in offsets.iter_mut().skip(1).enumerate() { | ||
offsets.extend(indices.values().iter().enumerate().map(|(i, index)| { | ||
if indices.is_valid(i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if indices.is_valid(i) { | |
// check index is valid before using index. The value in | |
// NULL index slots may not be within bounds of array | |
if indices.is_valid(i) { |
Which issue does this PR close?
Closes #4576
Rationale for this change
This fixes the overflow and null checking behaviour of take_bytes. This does regress the performance slightly, but I think this is acceptable given the prior logic was incorrect.
What changes are included in this PR?
Are there any user-facing changes?