[C++][Parquet] Use bit vector to store Prebuffered column chunk index #43007

Closed
asfimport opened this issue Jul 12, 2023 · 2 comments

Comments

asfimport commented Jul 12, 2023

In https://issues.apache.org/jira/browse/PARQUET-2316 we allowed partial buffering in the Parquet file reader by storing the indices of prebuffered column chunks in a hash set, and by making a copy of this hash set for each row group reader.

In extreme cases where many columns are prebuffered and multiple row group readers are created for the same row group, the hash set copies would incur significant overhead.

Using a bit vector would be a reasonable mitigation: at one bit per column, it takes 4 KB for 32K columns.
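
To make the size estimate concrete, here is a minimal sketch of the idea. It is not the code from the eventual pull request or any Arrow/Parquet API; the class name `PrebufferedColumnBitmap` and its methods are hypothetical and chosen only for illustration. It stores one bit per column chunk index in 64-bit words.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Arrow/Parquet): tracks which column
// chunks of a row group have been prebuffered, using one bit per column.
class PrebufferedColumnBitmap {
 public:
  explicit PrebufferedColumnBitmap(std::size_t num_columns)
      : num_columns_(num_columns), words_((num_columns + 63) / 64, 0) {}

  // Mark the column chunk at index `column` as prebuffered.
  void MarkPrebuffered(std::size_t column) {
    assert(column < num_columns_);
    words_[column / 64] |= (std::uint64_t{1} << (column % 64));
  }

  // Return true if the column chunk at index `column` was marked.
  bool IsPrebuffered(std::size_t column) const {
    assert(column < num_columns_);
    return ((words_[column / 64] >> (column % 64)) & 1u) != 0;
  }

  // Storage used by the bits themselves (excludes vector bookkeeping).
  std::size_t SizeInBytes() const {
    return words_.size() * sizeof(std::uint64_t);
  }

 private:
  std::size_t num_columns_;
  std::vector<std::uint64_t> words_;
};
```

For 32 * 1024 columns, `SizeInBytes()` in this sketch returns 4096, matching the 4 KB figure above. Copying such a bitmap for each row group reader is a single contiguous copy of fixed size, whereas a hash set copy grows with the number of prebuffered columns and allocates per element in typical implementations.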

Reporter: Jinpeng Zhou / @jp0317
Assignee: Jinpeng Zhou / @jp0317

PRs and other links:

Note: This issue was originally created as PARQUET-2323. Please see the migration documentation for further details.

Antoine Pitrou / @pitrou:
Issue resolved by pull request 36649
#36649

Raúl Cumplido / @raulcd:
@pitrou I don't seem to have permission to update the Fix Version to cpp-13.0.0. Can you help me with that?
