GH-44072: [C++][Parquet] Add Float16 reading benchmarks #44073
      sizeof(c_type) * table->num_rows());
}

BENCHMARK_TEMPLATE2(BM_ReadColumnPlain, false, Int32Type)
I'm a bit curious why Int32 is benchmarked here. Is it for comparison with Float32?
Yes, and because the other benchmarks are dict-encoded.
Judging from the benchmark results, Float16 reading can be optimized substantially. I think it could be several times (maybe 4x) faster with some optimization (for example detecting the FLBA length, enhancing the builder, etc.).
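To illustrate what "detecting the FLBA length" could mean, here is a minimal sketch that dispatches to a copy loop with a compile-time width when each value is known to be 2 bytes long. `FixedLenValue`, `CopyGeneric`, `CopyFixed` and `CopyValues` are hypothetical names made up for this example; this is not Arrow's reader code, and no speedup is claimed for it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a fixed-length byte array value:
// a pointer to `byte_width` bytes owned elsewhere.
struct FixedLenValue {
  const uint8_t* ptr;
};

// Generic path: the byte width is only known at run time.
inline void CopyGeneric(const FixedLenValue* values, std::size_t n,
                        int byte_width, uint8_t* out) {
  for (std::size_t i = 0; i < n; ++i) {
    std::memcpy(out + i * byte_width, values[i].ptr, byte_width);
  }
}

// Specialized path: the width is a compile-time constant, so the compiler
// can turn each memcpy into a plain fixed-size load and store.
template <int kWidth>
inline void CopyFixed(const FixedLenValue* values, std::size_t n, uint8_t* out) {
  for (std::size_t i = 0; i < n; ++i) {
    std::memcpy(out + i * kWidth, values[i].ptr, kWidth);
  }
}

// Detect the 2-byte (Float16-sized) case once, outside the loop.
inline void CopyValues(const FixedLenValue* values, std::size_t n,
                       int byte_width, uint8_t* out) {
  assert(byte_width > 0);
  if (byte_width == 2) {
    CopyFixed<2>(values, n, out);
  } else {
    CopyGeneric(values, n, byte_width, out);
  }
}
```

Both paths produce the same output bytes; whether the specialized one is measurably faster would have to be checked with the benchmark added in this PR.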
Yes, I plan to work on #44072
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 9986b7b. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 5 possible false positives for unstable benchmarks that are known to sometimes produce them.
…#44073)

Local benchmark numbers:
```
---------------------------------------------------------------------------------------------------------------------------
Benchmark                                                               Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------------------------------
BM_ReadColumnPlain<false,Int32Type>/null_probability:-1             20038480 ns     20019703 ns           36 bytes_per_second=1.9512Gi/s items_per_second=523.772M/s
BM_ReadColumnPlain<true,Int32Type>/null_probability:0               37114403 ns     36766588 ns           19 bytes_per_second=1.06245Gi/s items_per_second=285.198M/s
BM_ReadColumnPlain<true,Int32Type>/null_probability:1               44589582 ns     44371707 ns           16 bytes_per_second=901.475Mi/s items_per_second=236.316M/s
BM_ReadColumnPlain<true,Int32Type>/null_probability:50              65624754 ns     65322683 ns           11 bytes_per_second=612.345Mi/s items_per_second=160.522M/s
BM_ReadColumnPlain<true,Int32Type>/null_probability:99              43072631 ns     42932582 ns           16 bytes_per_second=931.693Mi/s items_per_second=244.238M/s
BM_ReadColumnPlain<true,Int32Type>/null_probability:100             36710045 ns     36475141 ns           19 bytes_per_second=1.07093Gi/s items_per_second=287.477M/s
BM_ReadColumnPlain<false,Float16LogicalType>/null_probability:-1    52718868 ns     52616204 ns           12 bytes_per_second=380.111Mi/s items_per_second=199.288M/s
BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:0      71273144 ns     71093105 ns           10 bytes_per_second=281.321Mi/s items_per_second=147.493M/s
BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:1      80674727 ns     80358048 ns            8 bytes_per_second=248.886Mi/s items_per_second=130.488M/s
BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:50    138249159 ns    137922632 ns            5 bytes_per_second=145.009Mi/s items_per_second=76.0264M/s
BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:99     86938382 ns     86576176 ns            8 bytes_per_second=231.01Mi/s items_per_second=121.116M/s
BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:100    74154244 ns     73984356 ns            9 bytes_per_second=270.327Mi/s items_per_second=141.729M/s
```

* GitHub Issue: apache#44072

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
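The two user counters in the numbers above are related through the element width (4 bytes for Int32, 2 bytes for Float16), which is what the `sizeof(c_type) * table->num_rows()` expression in the benchmark suggests. The following sketch checks that arithmetic and compares item throughput; the helper names `BytesPerSecond` and `ThroughputRatio` are made up for this illustration and are not part of the benchmark code.

```cpp
#include <cassert>

// Binary units, as printed by Google Benchmark ("Mi/s", "Gi/s").
constexpr double kMiB = 1024.0 * 1024.0;
constexpr double kGiB = kMiB * 1024.0;

// Bytes per second implied by an items-per-second figure when every
// item occupies `item_width` bytes.
inline double BytesPerSecond(double items_per_second, int item_width) {
  assert(item_width > 0);
  return items_per_second * item_width;
}

// How many times more items per second `baseline` processes than `candidate`.
inline double ThroughputRatio(double baseline_items_per_second,
                              double candidate_items_per_second) {
  assert(candidate_items_per_second > 0.0);
  return baseline_items_per_second / candidate_items_per_second;
}
```

For the non-nullable rows, `BytesPerSecond(523.772e6, 4) / kGiB` comes out at roughly 1.951 and `BytesPerSecond(199.288e6, 2) / kMiB` at roughly 380.1, matching the reported `bytes_per_second` values. `ThroughputRatio(523.772e6, 199.288e6)` is about 2.6, so in this run Int32 values were read about 2.6 times as fast per item as Float16 values.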
Local benchmark numbers: