ARROW-10837: [Rust][DataFusion] Use `Vec<u8>` for hash keys #8863

Dandandan · 2020-12-07T18:34:34Z

This PR is a follow up of #8765 . Here, the hashmap values for the key are converted to Vec<u8> to use as key instead.

This is a bit faster as both hashing and cloning will be faster. It will also use less additional memory than the earlier usage of the dynamic GroupByScalar values (for hash join).

[This PR]

Query 12 iteration 0 took 1315 ms
Query 12 iteration 1 took 1324 ms
Query 12 iteration 2 took 1329 ms
Query 12 iteration 3 took 1334 ms
Query 12 iteration 4 took 1335 ms
Query 12 iteration 5 took 1338 ms
Query 12 iteration 6 took 1337 ms
Query 12 iteration 7 took 1349 ms
Query 12 iteration 8 took 1348 ms
Query 12 iteration 9 took 1358 ms

[Master]

Query 12 iteration 0 took 1379 ms
Query 12 iteration 1 took 1383 ms
Query 12 iteration 2 took 1401 ms
Query 12 iteration 3 took 1406 ms
Query 12 iteration 4 took 1420 ms
Query 12 iteration 5 took 1435 ms
Query 12 iteration 6 took 1401 ms
Query 12 iteration 7 took 1404 ms
Query 12 iteration 8 took 1418 ms
Query 12 iteration 9 took 1416 ms

[This PR]

Query 1 iteration 0 took 871 ms
Query 1 iteration 1 took 866 ms
Query 1 iteration 2 took 869 ms
Query 1 iteration 3 took 869 ms
Query 1 iteration 4 took 867 ms
Query 1 iteration 5 took 874 ms
Query 1 iteration 6 took 870 ms
Query 1 iteration 7 took 875 ms
Query 1 iteration 8 took 871 ms
Query 1 iteration 9 took 869 ms

[Master]

Query 1 iteration 0 took 1189 ms
Query 1 iteration 1 took 1192 ms
Query 1 iteration 2 took 1189 ms
Query 1 iteration 3 took 1185 ms
Query 1 iteration 4 took 1193 ms
Query 1 iteration 5 took 1202 ms
Query 1 iteration 6 took 1547 ms
Query 1 iteration 7 took 1242 ms
Query 1 iteration 8 took 1202 ms
Query 1 iteration 9 took 1197 ms

FWIW, micro benchmark results for aggregate queries:

aggregate_query_no_group_by 15 12                                                                            
                        time:   [538.54 us 541.48 us 544.74 us]
                        change: [+5.4384% +6.6260% +8.2034%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  7 (7.00%) high mild
  1 (1.00%) high severe

aggregate_query_no_group_by_count_distinct_wide 15 12                                                                             
                        time:   [4.8418 ms 4.8744 ms 4.9076 ms]
                        change: [-13.890% -12.580% -11.260%] (p = 0.00 < 0.05)
                        Performance has improved.

aggregate_query_no_group_by_count_distinct_narrow 15 12                                                                             
                        time:   [2.1910 ms 2.2100 ms 2.2291 ms]
                        change: [-30.490% -28.886% -27.271%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking aggregate_query_group_by 15 12: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.1s, enable flat sampling, or reduce sample count to 50.
aggregate_query_group_by 15 12                                                                             
                        time:   [1.5905 ms 1.5977 ms 1.6054 ms]
                        change: [-18.271% -16.780% -15.396%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe

aggregate_query_group_by_with_filter 15 12                                                                            
                        time:   [788.26 us 792.05 us 795.74 us]
                        change: [-9.8088% -8.5606% -7.4141%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

Benchmarking aggregate_query_group_by_u64 15 12: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.3s, enable flat sampling, or reduce sample count to 50.
aggregate_query_group_by_u64 15 12                                                                             
                        time:   [1.8502 ms 1.8565 ms 1.8630 ms]
                        change: [+8.6203% +9.8872% +10.973%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

aggregate_query_group_by_with_filter_u64 15 12                                                                            
                        time:   [777.83 us 782.75 us 788.15 us]
                        change: [-7.5157% -6.6393% -5.6558%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

FYI @jorgecarleitao

github-actions · 2020-12-07T19:02:27Z

https://issues.apache.org/jira/browse/ARROW-10837

Dandandan · 2020-12-07T21:13:35Z

As a next step after this, I think it would be interesting if we can have a look at calculating the hashes on the columns instead to benefit from the columnar data layout.

Some material I found on this:

https://www.cockroachlabs.com/blog/vectorized-hash-joiner/ (simple explanation)
https://pure.uva.nl/ws/files/4321270/68049_09.pdf

Please add if you know of more/newer material!

codecov-io · 2020-12-08T18:07:46Z

Codecov Report

Merging #8863 (008d3ea) into master (db94f24) will decrease coverage by 0.02%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #8863      +/-   ##
==========================================
- Coverage   77.03%   77.01%   -0.03%     
==========================================
  Files         173      173              
  Lines       40090    40101      +11     
==========================================
  Hits        30885    30885              
- Misses       9205     9216      +11

Impacted Files	Coverage Δ
...ust/datafusion/src/physical_plan/hash_aggregate.rs	`0.00% <ø> (ø)`
rust/datafusion/src/physical_plan/hash_join.rs	`0.00% <ø> (ø)`
rust/datafusion/tests/dataframe.rs	`0.00% <0.00%> (ø)`
rust/datafusion/src/datasource/csv.rs	`0.00% <0.00%> (ø)`
rust/datafusion/src/datasource/memory.rs	`0.00% <0.00%> (ø)`
rust/datafusion/src/datasource/parquet.rs	`0.00% <0.00%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update db94f24...008d3ea. Read the comment docs.

Dandandan · 2020-12-08T18:09:31Z

@jorgecarleitao

Just extended the change for hash aggregates as well.

Turns out, a good speedup as well for hash aggregate queries!

[This version]

Query 1 iteration 0 took 871 ms
Query 1 iteration 1 took 866 ms
Query 1 iteration 2 took 869 ms
Query 1 iteration 3 took 869 ms
Query 1 iteration 4 took 867 ms
Query 1 iteration 5 took 874 ms
Query 1 iteration 6 took 870 ms
Query 1 iteration 7 took 875 ms
Query 1 iteration 8 took 871 ms
Query 1 iteration 9 took 869 ms

[Master]

Query 1 iteration 0 took 1189 ms
Query 1 iteration 1 took 1192 ms
Query 1 iteration 2 took 1189 ms
Query 1 iteration 3 took 1185 ms
Query 1 iteration 4 took 1193 ms
Query 1 iteration 5 took 1202 ms
Query 1 iteration 6 took 1547 ms
Query 1 iteration 7 took 1242 ms
Query 1 iteration 8 took 1202 ms
Query 1 iteration 9 took 1197 ms

andygrove

Nice improvement!

jorgecarleitao

Good work and optimization!

@jorgecarleitao

This PR is a follow up of apache#8765 . Here, the hashmap values for the key are converted to `Vec<u8>` to use as key instead. This is a bit faster as both hashing and cloning will be faster. It will also use less additional memory than the earlier usage of the dynamic `GroupByScalar` values (for hash join). [This PR] ``` Query 12 iteration 0 took 1315 ms Query 12 iteration 1 took 1324 ms Query 12 iteration 2 took 1329 ms Query 12 iteration 3 took 1334 ms Query 12 iteration 4 took 1335 ms Query 12 iteration 5 took 1338 ms Query 12 iteration 6 took 1337 ms Query 12 iteration 7 took 1349 ms Query 12 iteration 8 took 1348 ms Query 12 iteration 9 took 1358 ms ``` [Master] ``` Query 12 iteration 0 took 1379 ms Query 12 iteration 1 took 1383 ms Query 12 iteration 2 took 1401 ms Query 12 iteration 3 took 1406 ms Query 12 iteration 4 took 1420 ms Query 12 iteration 5 took 1435 ms Query 12 iteration 6 took 1401 ms Query 12 iteration 7 took 1404 ms Query 12 iteration 8 took 1418 ms Query 12 iteration 9 took 1416 ms ``` [This PR] ``` Query 1 iteration 0 took 871 ms Query 1 iteration 1 took 866 ms Query 1 iteration 2 took 869 ms Query 1 iteration 3 took 869 ms Query 1 iteration 4 took 867 ms Query 1 iteration 5 took 874 ms Query 1 iteration 6 took 870 ms Query 1 iteration 7 took 875 ms Query 1 iteration 8 took 871 ms Query 1 iteration 9 took 869 ms ``` [Master] ``` Query 1 iteration 0 took 1189 ms Query 1 iteration 1 took 1192 ms Query 1 iteration 2 took 1189 ms Query 1 iteration 3 took 1185 ms Query 1 iteration 4 took 1193 ms Query 1 iteration 5 took 1202 ms Query 1 iteration 6 took 1547 ms Query 1 iteration 7 took 1242 ms Query 1 iteration 8 took 1202 ms Query 1 iteration 9 took 1197 ms ``` FWIW, micro benchmark results for aggregate queries: ``` aggregate_query_no_group_by 15 12 time: [538.54 us 541.48 us 544.74 us] change: [+5.4384% +6.6260% +8.2034%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 7 (7.00%) high mild 1 (1.00%) high severe aggregate_query_no_group_by_count_distinct_wide 15 12 time: [4.8418 ms 4.8744 ms 4.9076 ms] change: [-13.890% -12.580% -11.260%] (p = 0.00 < 0.05) Performance has improved. aggregate_query_no_group_by_count_distinct_narrow 15 12 time: [2.1910 ms 2.2100 ms 2.2291 ms] change: [-30.490% -28.886% -27.271%] (p = 0.00 < 0.05) Performance has improved. Benchmarking aggregate_query_group_by 15 12: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.1s, enable flat sampling, or reduce sample count to 50. aggregate_query_group_by 15 12 time: [1.5905 ms 1.5977 ms 1.6054 ms] change: [-18.271% -16.780% -15.396%] (p = 0.00 < 0.05) Performance has improved. Found 6 outliers among 100 measurements (6.00%) 1 (1.00%) high mild 5 (5.00%) high severe aggregate_query_group_by_with_filter 15 12 time: [788.26 us 792.05 us 795.74 us] change: [-9.8088% -8.5606% -7.4141%] (p = 0.00 < 0.05) Performance has improved. Found 6 outliers among 100 measurements (6.00%) 5 (5.00%) high mild 1 (1.00%) high severe Benchmarking aggregate_query_group_by_u64 15 12: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.3s, enable flat sampling, or reduce sample count to 50. aggregate_query_group_by_u64 15 12 time: [1.8502 ms 1.8565 ms 1.8630 ms] change: [+8.6203% +9.8872% +10.973%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 3 (3.00%) low mild 2 (2.00%) high mild 3 (3.00%) high severe aggregate_query_group_by_with_filter_u64 15 12 time: [777.83 us 782.75 us 788.15 us] change: [-7.5157% -6.6393% -5.6558%] (p = 0.00 < 0.05) Performance has improved. Found 5 outliers among 100 measurements (5.00%) 3 (3.00%) high mild 2 (2.00%) high severe ``` FYI @jorgecarleitao Closes apache#8863 from Dandandan/key_byte_vec Lead-authored-by: Heres, Daniel <danielheres@gmail.com> Co-authored-by: Daniël Heres <danielheres@gmail.com> Signed-off-by: Jorge C. Leitao <jorgecarleitao@gmail.com>

Dandandan added 11 commits November 24, 2020 22:17

Reduce size of key by boxing string

8bdefdf

Use Box<str>

8f1238d

Simplify using into

eb44aaa

Improve benchmarks

02d8da0

Format fixes

7d4ee70

Reduce to 16 bytes, use Box<[GroupByScalar]> for key

b28a6d4

Remove unnecessary Box around indices

9c8b8f0

Merge master

3a6127b

Use byte vec for hash key

feb13e4

Merge branch 'master' into key_byte_vec

26f479a

Fix imports

a2d441c

github-actions bot added Component: Rust - DataFusion Component: Rust labels Dec 7, 2020

Dandandan changed the title ~~ARROW-10837: [Rust][DataFusion] Use Vec<u8> for hash join key values~~ ARROW-10837: [Rust][DataFusion] Use Vec<u8> for hash join keys Dec 7, 2020

github-actions bot added the needs-rebase A PR that needs to be rebased by the author label Dec 7, 2020

Merge master

f1fc4dc

Dandandan marked this pull request as ready for review December 7, 2020 20:32

github-actions bot removed the needs-rebase A PR that needs to be rebased by the author label Dec 7, 2020

Use for hash aggregate as well

3559d97

Dandandan changed the title ~~ARROW-10837: [Rust][DataFusion] Use Vec<u8> for hash join keys~~ ARROW-10837: [Rust][DataFusion] Use Vec<u8> for hash keys Dec 8, 2020

Dandandan added 2 commits December 8, 2020 19:21

Merge branch 'master' into key_byte_vec

379c3bd

Update comments

008d3ea

Dandandan mentioned this pull request Dec 10, 2020

ARROW-10812: [Rust] Make BooleanArray not a PrimitiveArray #8842

Closed

andygrove approved these changes Dec 10, 2020

View reviewed changes

jorgecarleitao approved these changes Dec 11, 2020

View reviewed changes

jorgecarleitao closed this in 259fd71 Dec 11, 2020

alamb mentioned this pull request Jul 28, 2021

Rework GroupByHash for faster performance and support grouping by nulls apache/datafusion#790

Closed

asfimport mentioned this pull request Dec 11, 2020

[Rust] Use Vec<u8> for hash key instead #26774

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ARROW-10837: [Rust][DataFusion] Use `Vec<u8>` for hash keys #8863

ARROW-10837: [Rust][DataFusion] Use `Vec<u8>` for hash keys #8863

Dandandan commented Dec 7, 2020 •

edited

Loading

github-actions bot commented Dec 7, 2020

Dandandan commented Dec 7, 2020 •

edited

Loading

codecov-io commented Dec 8, 2020 •

edited

Loading

Dandandan commented Dec 8, 2020

andygrove left a comment

jorgecarleitao left a comment

ARROW-10837: [Rust][DataFusion] Use Vec<u8> for hash keys #8863

ARROW-10837: [Rust][DataFusion] Use Vec<u8> for hash keys #8863

Conversation

Dandandan commented Dec 7, 2020 • edited Loading

github-actions bot commented Dec 7, 2020

Dandandan commented Dec 7, 2020 • edited Loading

codecov-io commented Dec 8, 2020 • edited Loading

Codecov Report

Dandandan commented Dec 8, 2020

andygrove left a comment

Choose a reason for hiding this comment

jorgecarleitao left a comment

Choose a reason for hiding this comment

ARROW-10837: [Rust][DataFusion] Use `Vec<u8>` for hash keys #8863

ARROW-10837: [Rust][DataFusion] Use `Vec<u8>` for hash keys #8863

Dandandan commented Dec 7, 2020 •

edited

Loading

Dandandan commented Dec 7, 2020 •

edited

Loading

codecov-io commented Dec 8, 2020 •

edited

Loading