-
Notifications
You must be signed in to change notification settings - Fork 901
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Eliminate duplicate allocation of nested string columns #15061
Eliminate duplicate allocation of nested string columns #15061
Conversation
…bug-read_parquet-nested-str-alloc
…m/vuule/cudf into bug-read_parquet-nested-str-alloc
Co-authored-by: Nghia Truong <7416935+ttnghia@users.noreply.github.com>
cpp/src/io/parquet/reader_impl.cpp
Outdated
@@ -281,7 +282,7 @@ void reader::impl::decode_page_data(size_t skip_rows, size_t num_rows) | |||
out_buf.user_data |= PARQUET_COLUMN_BUFFER_FLAG_LIST_TERMINATED; | |||
} else if (out_buf.type.id() == type_id::STRING) { | |||
// need to cap off the string offsets column | |||
size_type const sz = static_cast<size_type>(col_sizes[idx]); | |||
size_type const sz = static_cast<size_type>(col_string_sizes[idx]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using size_type here for now. When we get to supporting larger string columns in the reader, we'll probably use an offsetalator (or something similar) here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one missed error handling macro. Otherwise looks good.
…m/vuule/cudf into bug-read_parquet-nested-str-alloc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
/merge |
Description
Issue #14965
Checklist