Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clear string-tracking hash table when ByteView deduplication is enabled #6385

Merged

Conversation

shanesveller
Copy link
Contributor

Which issue does this PR close?

Closes #6384.

Rationale for this change

The panic observed in #6384 was within StringViewBuilder::get_value, because the string_tracker hash table was not reset as part of finish():

pub fn finish(&mut self) -> GenericByteViewArray<T> {
self.flush_in_progress();
let completed = std::mem::take(&mut self.completed);
let len = self.views_builder.len();
let views = ScalarBuffer::new(self.views_builder.finish(), 0, len);
let nulls = self.null_buffer_builder.finish();
// SAFETY: valid by construction
unsafe { GenericByteViewArray::new_unchecked(views, completed, nulls) }
}

If the builder saw continued use, and append_value was called with a previously-observed value, the hash table would return an index to a now-emptied buffer after slicing into it here:

pub fn get_value(&self, index: usize) -> &[u8] {
let view = self.views_builder.as_slice().get(index).unwrap();
let len = *view as u32;

What changes are included in this PR?

Builders will now clear the string_tracker hash-table as part of a revised finish definition.

Are there any user-facing changes?

Not to my knowledge.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Sep 11, 2024
@shanesveller
Copy link
Contributor Author

ℹ️ This is my first PR to the Arrow repository as well as to any Apache repository. I've reviewed the CONTRIBUTING documents as well as I can after watching day 1 of RustConf, but if I've overlooked anything or need to make any adjustments to the implementation, commit message, etc. to align with the project standards, I'm very happy to do so.

@@ -590,6 +593,20 @@ mod tests {
assert_eq!(array.views().get(1), array.views().get(5));
}

#[test]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please let me know if a regression test isn't needed for a simple change like this, I'm fine with dropping it if that's preferred.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regression tests are always appreciated and we try to make then required.

Thank you for doing this @shanesveller

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks good to me, thank you

@alamb
Copy link
Contributor

alamb commented Sep 13, 2024

FYI @XiangpengHao

@XiangpengHao
Copy link
Contributor

look good to me, nice catch!

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -590,6 +593,20 @@ mod tests {
assert_eq!(array.views().get(1), array.views().get(5));
}

#[test]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regression tests are always appreciated and we try to make then required.

Thank you for doing this @shanesveller

@alamb alamb merged commit ba85fa3 into apache:master Sep 13, 2024
25 checks passed
@alamb
Copy link
Contributor

alamb commented Sep 13, 2024

ℹ️ This is my first PR to the Arrow repository as well as to any Apache repository.

Welcome and thank you

@shanesveller shanesveller deleted the shanesveller/string-view-dedup-finish branch September 13, 2024 21:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

StringViewBuilder with deduplication does not clear observed values
4 participants