Improve performance of set_bits by using copy_from_slice instead of setting individual bytes #2077

jhorstmann · 2022-07-15T12:07:49Z

Which issue does this PR close?

Closes #2060.

Rationale for this change

Using copy_from_slice here reduces the number of bounds checks to one per chunk instead of one per byte.
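To illustrate the difference, here is a minimal sketch rather than the actual diff. The helper names `write_chunk_per_byte` and `write_chunk_copy` are hypothetical, and the sketch assumes a 64-bit chunk is written into a byte buffer as 8 little-endian bytes.

```rust
/// Hypothetical helper: writes the 8 little-endian bytes of `chunk`
/// into `buf` one byte at a time. Every `buf[...]` index expression is
/// bounds-checked on its own unless the optimizer can prove it safe.
fn write_chunk_per_byte(buf: &mut [u8], byte_offset: usize, chunk: u64) {
    let bytes = chunk.to_le_bytes();
    for (i, b) in bytes.iter().enumerate() {
        buf[byte_offset + i] = *b;
    }
}

/// Hypothetical helper: writes the same 8 bytes with `copy_from_slice`.
/// Taking the sub-slice checks the range once for the whole chunk, and
/// `copy_from_slice` then copies the bytes in a single call.
fn write_chunk_copy(buf: &mut [u8], byte_offset: usize, chunk: u64) {
    buf[byte_offset..byte_offset + 8].copy_from_slice(&chunk.to_le_bytes());
}
```

Both helpers produce the same bytes and panic if the range is out of bounds; only the number of checks per chunk differs.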

Performance of boolean_append_packed improves by about 40% on my laptop.

$ RUSTFLAGS="-Ctarget-cpu=skylake" perf record cargo bench --bench boolean_append_packed
boolean_append_packed   time:   [11.634 us 11.712 us 11.802 us]
                        change: [-41.447% -40.956% -40.476%] (p = 0.00 < 0.05)
                        Performance has improved.

Further improvements should be possible by asserting the bounds once on function entry and then using unsafe ptr::copy_nonoverlapping in the main loop and get_bit_raw/set_bit_raw for the non-aligned parts.
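A rough sketch of what that could look like for the aligned main loop, assuming the data arrives as a slice of 64-bit chunks. `write_chunks_unchecked` is a hypothetical name, not a function in the arrow crate, and the non-aligned head and tail bits are left out.

```rust
/// Hypothetical sketch: writes each `u64` in `chunks` as 8 little-endian
/// bytes into `buf`, starting at `byte_offset`.
fn write_chunks_unchecked(buf: &mut [u8], byte_offset: usize, chunks: &[u64]) {
    // Assert the bounds once on entry, guarding against arithmetic overflow.
    let byte_len = chunks.len().checked_mul(8).expect("length overflow");
    let end = byte_offset.checked_add(byte_len).expect("offset overflow");
    assert!(end <= buf.len(), "destination buffer too small");

    for (i, chunk) in chunks.iter().enumerate() {
        let bytes = chunk.to_le_bytes();
        // SAFETY: the assertion above guarantees that
        // `byte_offset + i * 8 + 8 <= buf.len()` for every `i`, and `bytes`
        // is a local array, so it cannot overlap with `buf`.
        unsafe {
            std::ptr::copy_nonoverlapping(
                bytes.as_ptr(),
                buf.as_mut_ptr().add(byte_offset + i * 8),
                8,
            );
        }
    }
}
```

The loop body has no bounds checks of its own, so the single assertion on entry is what keeps the function sound. Whether this is measurably faster than the safe version would need to be benchmarked.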

What changes are included in this PR?

Are there any user-facing changes?

No


jhorstmann · 2022-07-15T12:16:06Z

@viirya FYI. Maybe you could rerun your original benchmark to decide whether this is already enough of an improvement or if we have to use some unsafe.

tustvold

Nice 👍

ursabot · 2022-07-15T15:11:09Z

Benchmark runs are scheduled for baseline = 86543a4 and contender = 474dc14. 474dc14 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

viirya

LGTM. I will run a profile to check the performance. Thanks.

Improve performance of set_bits by using copy_from_slice instead of setting individual bytes

09811e6

github-actions bot added the arrow Changes to the arrow crate label Jul 15, 2022

tustvold approved these changes Jul 15, 2022


tustvold merged commit 474dc14 into apache:master Jul 15, 2022

viirya reviewed Jul 15, 2022
