Is your feature request related to a problem or challenge?
As we work to make extracting statistics from parquet data pages more correct and performant in #10922, it would be good to have benchmark coverage.
Describe the solution you'd like
Add a benchmark for extracting page statistics
Describe alternatives you've considered
Add a benchmark (source) for extracting data page statistics
These are run via
cargo bench --bench parquet_statistic
In order to create a reasonable number of data page statistics, it would be good to configure the parquet writer to limit the size of data pages.
See datafusion/datafusion/core/benches/parquet_statistic.rs, line 75 in ece7ae5. There, use the data_page_row_count_limit writer setting (https://docs.rs/parquet/latest/parquet/file/properties/struct.WriterProperties.html#method.data_page_row_count_limit) to set the limit to 1, and then write the data row by row, as we did in the test at datafusion/datafusion/core/tests/parquet/arrow_statistics.rs, lines 105 to 130 in d175163.
Additional context
The need for a benchmark also came up in #10932