Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-10837: [Rust][DataFusion] Use Vec<u8> for hash keys #8863

Closed
wants to merge 15 commits into from

Conversation

Dandandan
Copy link
Contributor

@Dandandan Dandandan commented Dec 7, 2020

This PR is a follow up of #8765 . Here, the hashmap values for the key are converted to Vec<u8> to use as key instead.

This is a bit faster as both hashing and cloning will be faster. It will also use less additional memory than the earlier usage of the dynamic GroupByScalar values (for hash join).

[This PR]

Query 12 iteration 0 took 1315 ms
Query 12 iteration 1 took 1324 ms
Query 12 iteration 2 took 1329 ms
Query 12 iteration 3 took 1334 ms
Query 12 iteration 4 took 1335 ms
Query 12 iteration 5 took 1338 ms
Query 12 iteration 6 took 1337 ms
Query 12 iteration 7 took 1349 ms
Query 12 iteration 8 took 1348 ms
Query 12 iteration 9 took 1358 ms

[Master]

Query 12 iteration 0 took 1379 ms
Query 12 iteration 1 took 1383 ms
Query 12 iteration 2 took 1401 ms
Query 12 iteration 3 took 1406 ms
Query 12 iteration 4 took 1420 ms
Query 12 iteration 5 took 1435 ms
Query 12 iteration 6 took 1401 ms
Query 12 iteration 7 took 1404 ms
Query 12 iteration 8 took 1418 ms
Query 12 iteration 9 took 1416 ms

[This PR]

Query 1 iteration 0 took 871 ms
Query 1 iteration 1 took 866 ms
Query 1 iteration 2 took 869 ms
Query 1 iteration 3 took 869 ms
Query 1 iteration 4 took 867 ms
Query 1 iteration 5 took 874 ms
Query 1 iteration 6 took 870 ms
Query 1 iteration 7 took 875 ms
Query 1 iteration 8 took 871 ms
Query 1 iteration 9 took 869 ms

[Master]

Query 1 iteration 0 took 1189 ms
Query 1 iteration 1 took 1192 ms
Query 1 iteration 2 took 1189 ms
Query 1 iteration 3 took 1185 ms
Query 1 iteration 4 took 1193 ms
Query 1 iteration 5 took 1202 ms
Query 1 iteration 6 took 1547 ms
Query 1 iteration 7 took 1242 ms
Query 1 iteration 8 took 1202 ms
Query 1 iteration 9 took 1197 ms

FWIW, micro benchmark results for aggregate queries:

aggregate_query_no_group_by 15 12                                                                            
                        time:   [538.54 us 541.48 us 544.74 us]
                        change: [+5.4384% +6.6260% +8.2034%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  7 (7.00%) high mild
  1 (1.00%) high severe

aggregate_query_no_group_by_count_distinct_wide 15 12                                                                             
                        time:   [4.8418 ms 4.8744 ms 4.9076 ms]
                        change: [-13.890% -12.580% -11.260%] (p = 0.00 < 0.05)
                        Performance has improved.

aggregate_query_no_group_by_count_distinct_narrow 15 12                                                                             
                        time:   [2.1910 ms 2.2100 ms 2.2291 ms]
                        change: [-30.490% -28.886% -27.271%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking aggregate_query_group_by 15 12: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.1s, enable flat sampling, or reduce sample count to 50.
aggregate_query_group_by 15 12                                                                             
                        time:   [1.5905 ms 1.5977 ms 1.6054 ms]
                        change: [-18.271% -16.780% -15.396%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe

aggregate_query_group_by_with_filter 15 12                                                                            
                        time:   [788.26 us 792.05 us 795.74 us]
                        change: [-9.8088% -8.5606% -7.4141%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

Benchmarking aggregate_query_group_by_u64 15 12: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.3s, enable flat sampling, or reduce sample count to 50.
aggregate_query_group_by_u64 15 12                                                                             
                        time:   [1.8502 ms 1.8565 ms 1.8630 ms]
                        change: [+8.6203% +9.8872% +10.973%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

aggregate_query_group_by_with_filter_u64 15 12                                                                            
                        time:   [777.83 us 782.75 us 788.15 us]
                        change: [-7.5157% -6.6393% -5.6558%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

FYI @jorgecarleitao

@github-actions
Copy link

github-actions bot commented Dec 7, 2020

@Dandandan Dandandan changed the title ARROW-10837: [Rust][DataFusion] Use Vec<u8> for hash join key values ARROW-10837: [Rust][DataFusion] Use Vec<u8> for hash join keys Dec 7, 2020
@github-actions github-actions bot added the needs-rebase A PR that needs to be rebased by the author label Dec 7, 2020
@Dandandan Dandandan marked this pull request as ready for review December 7, 2020 20:32
@github-actions github-actions bot removed the needs-rebase A PR that needs to be rebased by the author label Dec 7, 2020
@Dandandan
Copy link
Contributor Author

Dandandan commented Dec 7, 2020

As a next step after this, I think it would be interesting if we can have a look at calculating the hashes on the columns instead to benefit from the columnar data layout.

Some material I found on this:

https://www.cockroachlabs.com/blog/vectorized-hash-joiner/ (simple explanation)
https://pure.uva.nl/ws/files/4321270/68049_09.pdf

Please add if you know of more/newer material!

@Dandandan Dandandan changed the title ARROW-10837: [Rust][DataFusion] Use Vec<u8> for hash join keys ARROW-10837: [Rust][DataFusion] Use Vec<u8> for hash keys Dec 8, 2020
@codecov-io
Copy link

codecov-io commented Dec 8, 2020

Codecov Report

Merging #8863 (008d3ea) into master (db94f24) will decrease coverage by 0.02%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #8863      +/-   ##
==========================================
- Coverage   77.03%   77.01%   -0.03%     
==========================================
  Files         173      173              
  Lines       40090    40101      +11     
==========================================
  Hits        30885    30885              
- Misses       9205     9216      +11     
Impacted Files Coverage Δ
...ust/datafusion/src/physical_plan/hash_aggregate.rs 0.00% <ø> (ø)
rust/datafusion/src/physical_plan/hash_join.rs 0.00% <ø> (ø)
rust/datafusion/tests/dataframe.rs 0.00% <0.00%> (ø)
rust/datafusion/src/datasource/csv.rs 0.00% <0.00%> (ø)
rust/datafusion/src/datasource/memory.rs 0.00% <0.00%> (ø)
rust/datafusion/src/datasource/parquet.rs 0.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update db94f24...008d3ea. Read the comment docs.

@Dandandan
Copy link
Contributor Author

@jorgecarleitao

Just extended the change for hash aggregates as well.

Turns out, a good speedup as well for hash aggregate queries!

[This version]

Query 1 iteration 0 took 871 ms
Query 1 iteration 1 took 866 ms
Query 1 iteration 2 took 869 ms
Query 1 iteration 3 took 869 ms
Query 1 iteration 4 took 867 ms
Query 1 iteration 5 took 874 ms
Query 1 iteration 6 took 870 ms
Query 1 iteration 7 took 875 ms
Query 1 iteration 8 took 871 ms
Query 1 iteration 9 took 869 ms

[Master]

Query 1 iteration 0 took 1189 ms
Query 1 iteration 1 took 1192 ms
Query 1 iteration 2 took 1189 ms
Query 1 iteration 3 took 1185 ms
Query 1 iteration 4 took 1193 ms
Query 1 iteration 5 took 1202 ms
Query 1 iteration 6 took 1547 ms
Query 1 iteration 7 took 1242 ms
Query 1 iteration 8 took 1202 ms
Query 1 iteration 9 took 1197 ms

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice improvement!

Copy link
Member

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work and optimization!

GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
This PR is a follow up of apache#8765 . Here, the hashmap values for the key are converted to `Vec<u8>` to use as key instead.

This is a bit faster as both hashing and cloning will be faster. It will also use less additional memory than the earlier usage of the dynamic `GroupByScalar` values (for hash join).

[This PR]
```
Query 12 iteration 0 took 1315 ms
Query 12 iteration 1 took 1324 ms
Query 12 iteration 2 took 1329 ms
Query 12 iteration 3 took 1334 ms
Query 12 iteration 4 took 1335 ms
Query 12 iteration 5 took 1338 ms
Query 12 iteration 6 took 1337 ms
Query 12 iteration 7 took 1349 ms
Query 12 iteration 8 took 1348 ms
Query 12 iteration 9 took 1358 ms
```

[Master]
```
Query 12 iteration 0 took 1379 ms
Query 12 iteration 1 took 1383 ms
Query 12 iteration 2 took 1401 ms
Query 12 iteration 3 took 1406 ms
Query 12 iteration 4 took 1420 ms
Query 12 iteration 5 took 1435 ms
Query 12 iteration 6 took 1401 ms
Query 12 iteration 7 took 1404 ms
Query 12 iteration 8 took 1418 ms
Query 12 iteration 9 took 1416 ms
```

[This PR]
```
Query 1 iteration 0 took 871 ms
Query 1 iteration 1 took 866 ms
Query 1 iteration 2 took 869 ms
Query 1 iteration 3 took 869 ms
Query 1 iteration 4 took 867 ms
Query 1 iteration 5 took 874 ms
Query 1 iteration 6 took 870 ms
Query 1 iteration 7 took 875 ms
Query 1 iteration 8 took 871 ms
Query 1 iteration 9 took 869 ms
```

[Master]
```
Query 1 iteration 0 took 1189 ms
Query 1 iteration 1 took 1192 ms
Query 1 iteration 2 took 1189 ms
Query 1 iteration 3 took 1185 ms
Query 1 iteration 4 took 1193 ms
Query 1 iteration 5 took 1202 ms
Query 1 iteration 6 took 1547 ms
Query 1 iteration 7 took 1242 ms
Query 1 iteration 8 took 1202 ms
Query 1 iteration 9 took 1197 ms
```

FWIW, micro benchmark results for aggregate queries:

```
aggregate_query_no_group_by 15 12
                        time:   [538.54 us 541.48 us 544.74 us]
                        change: [+5.4384% +6.6260% +8.2034%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  7 (7.00%) high mild
  1 (1.00%) high severe

aggregate_query_no_group_by_count_distinct_wide 15 12
                        time:   [4.8418 ms 4.8744 ms 4.9076 ms]
                        change: [-13.890% -12.580% -11.260%] (p = 0.00 < 0.05)
                        Performance has improved.

aggregate_query_no_group_by_count_distinct_narrow 15 12
                        time:   [2.1910 ms 2.2100 ms 2.2291 ms]
                        change: [-30.490% -28.886% -27.271%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking aggregate_query_group_by 15 12: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.1s, enable flat sampling, or reduce sample count to 50.
aggregate_query_group_by 15 12
                        time:   [1.5905 ms 1.5977 ms 1.6054 ms]
                        change: [-18.271% -16.780% -15.396%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe

aggregate_query_group_by_with_filter 15 12
                        time:   [788.26 us 792.05 us 795.74 us]
                        change: [-9.8088% -8.5606% -7.4141%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

Benchmarking aggregate_query_group_by_u64 15 12: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.3s, enable flat sampling, or reduce sample count to 50.
aggregate_query_group_by_u64 15 12
                        time:   [1.8502 ms 1.8565 ms 1.8630 ms]
                        change: [+8.6203% +9.8872% +10.973%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

aggregate_query_group_by_with_filter_u64 15 12
                        time:   [777.83 us 782.75 us 788.15 us]
                        change: [-7.5157% -6.6393% -5.6558%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
```

FYI @jorgecarleitao

Closes apache#8863 from Dandandan/key_byte_vec

Lead-authored-by: Heres, Daniel <danielheres@gmail.com>
Co-authored-by: Daniël Heres <danielheres@gmail.com>
Signed-off-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants