In https://issues.apache.org/jira/browse/PARQUET-2316 we allowed partial prebuffering in the Parquet file reader by storing the prebuffered column chunk indices in a hash set and making a copy of this hash set for each row group reader.
In extreme cases where numerous columns are prebuffered and multiple row group readers are created for the same row group, the hash set copies would incur significant overhead.
Using a bit vector would be a reasonable mitigation, taking 4 KB for 32K columns.
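As a rough illustration of the proposed mitigation, here is a minimal sketch of a bit vector that records which column chunks were prebuffered. The class name `PrebufferedColumnBitmap` and its methods are hypothetical and are not part of the Parquet C++ API; the sketch only shows the storage cost of one bit per column.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an existing Parquet class): one bit per column
// chunk, set when that column chunk has been prebuffered.
class PrebufferedColumnBitmap {
 public:
  explicit PrebufferedColumnBitmap(std::size_t num_columns)
      : num_columns_(num_columns), bits_((num_columns + 7) / 8, 0) {}

  // Mark a column chunk index as prebuffered.
  void Set(std::size_t column_index) {
    assert(column_index < num_columns_);
    bits_[column_index / 8] |=
        static_cast<std::uint8_t>(1u << (column_index % 8));
  }

  // Return true if the column chunk index was marked as prebuffered.
  // Out-of-range indices are reported as not prebuffered.
  bool IsSet(std::size_t column_index) const {
    if (column_index >= num_columns_) return false;
    return ((bits_[column_index / 8] >> (column_index % 8)) & 1u) != 0;
  }

  // Bytes used by the bit storage itself (excluding vector bookkeeping).
  std::size_t SizeInBytes() const { return bits_.size(); }

 private:
  std::size_t num_columns_;
  std::vector<std::uint8_t> bits_;
};
```

With 32K (32768) columns, the storage is 32768 / 8 = 4096 bytes, which matches the 4 KB figure above. Copying such a bitmap for each row group reader is a single contiguous buffer copy, whereas copying a hash set allocates a node per stored index in typical implementations.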
Reporter: Jinpeng Zhou / @jp0317
Assignee: Jinpeng Zhou / @jp0317
PRs and other links:
Note: This issue was originally created as PARQUET-2323. Please see the migration documentation for further details.