
Speed up range sampling. #115

Closed
Conversation

@bhickey commented Aug 26, 2016

This change removes division from the rejection sampler in random_range().
Replace divisions in range sampling with bitshifting. Instead of finding a
well fitting range, we generate a number [0,2^n) and reject out of range
values.

In synthetic benchmarks, this approximately doubles the throughput.
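The approach can be sketched roughly as follows (a minimal illustrative sketch, not the PR's actual code; the `sample_range` helper and the LCG used as a stand-in RNG are hypothetical):

```rust
// Bitmask rejection sampling for [low, high): generate n-bit values,
// where 2^n is the smallest power of two >= span, and reject values
// that fall outside the span. No division or modulus is needed.
fn sample_range(low: u64, high: u64, next_u64: &mut impl FnMut() -> u64) -> u64 {
    let span = high - low;
    if span <= 1 {
        return low; // degenerate range
    }
    // Mask keeping just enough low bits to cover `span`.
    let mask = u64::MAX >> (span - 1).leading_zeros();
    loop {
        let x = next_u64() & mask; // uniform in [0, 2^n)
        if x < span {
            return low + x; // accept; otherwise reject and retry
        }
    }
}

fn main() {
    // A simple LCG as a stand-in RNG, for demonstration only.
    let mut state: u64 = 0x853c49e6748fea9b;
    let mut rng = || {
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        state
    };
    for _ in 0..1_000 {
        let v = sample_range(10, 10_000, &mut rng);
        assert!((10..10_000).contains(&v));
    }
    println!("all samples in range");
}
```

Because the mask is derived from `leading_zeros()`, it can be computed once per range (the `lz` field discussed below) instead of per sample.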

@dhardy (Member) commented Aug 17, 2017

> In synthetic benchmarks, this approximately doubles the throughput.

I can't duplicate this. In my tests:

  • ~3070ns per i64 with the old algorithm
  • ~3200ns per i64 with your algorithm
  • ~3340ns per i64 with your algorithm, but recalculating leading_zeros() each call instead of storing an lz field
  • ~10200ns per i64 with the old algorithm, but not storing the rejection zone field

So your algorithm is definitely useful if we wish to cut out the extra field, or where the range can't be pre-computed, but not with the benchmark you used(?).

Actually, my test was using a different range (Range::new(3i64, 134217671i64)) and type. Using Range::new(10, 10000) from your benchmark, I get ~2600ns with the old algorithm and ~6200ns with your algorithm: more than twice as slow (and slower for i32 than i64). I'm not sure why it's this bad.

@pitdicker (Contributor) commented:
Using bitshifts instead of a modulus improves performance.
But the usable zone is a bad approximation: depending on the range, only about 50% of the generated values may pass. So there is a high chance the RNG has to run multiple times, and branch prediction becomes terrible.

Benchmarking this method is a bit more involved, because the results very much depend on how close a range is to a power of two.
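To see that dependence concretely: the acceptance probability of the mask-and-reject step is span / 2^n, where 2^n is the smallest power of two covering the span. A small illustrative calculation (not code from the PR):

```rust
// Fraction of masked values that fall inside the span, i.e. the
// probability that a single RNG draw is accepted rather than rejected.
// Assumes span >= 2.
fn acceptance(span: u64) -> f64 {
    let mask = u64::MAX >> (span - 1).leading_zeros();
    span as f64 / (mask as f64 + 1.0)
}

fn main() {
    println!("{:.3}", acceptance(1024)); // exact power of two: 1.000
    println!("{:.3}", acceptance(1025)); // just above one: ~0.500
    println!("{:.3}", acceptance(9990)); // span of Range::new(10, 10000): ~0.610
}
```

So a benchmark that happens to pick a near-power-of-two span will look far better than one that lands just above a power of two.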

I think dhardy#2, dhardy#68 and dhardy#69 do more to improve the performance of Range. The techniques are: never working on values smaller than 32 bits, using a widening multiply instead of a modulus, and a sample_single function using the bitshift technique you described.
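For reference, the widening-multiply idea maps a full random word onto the span via the high half of a double-width product. A rough sketch under that description (not the actual code in those PRs, and shown without the rejection step the real implementation adds to remove the small bias):

```rust
// Map a uniform 32-bit value onto [0, span) with a widening multiply:
// floor(x * span / 2^32), computed as the high 32 bits of a 64-bit
// product. This replaces the per-sample division/modulus entirely.
fn map_to_range(x: u32, span: u32) -> u32 {
    ((x as u64 * span as u64) >> 32) as u32
}

fn main() {
    assert_eq!(map_to_range(0, 10), 0);          // smallest input -> 0
    assert_eq!(map_to_range(u32::MAX, 10), 9);   // largest input -> span - 1
    assert_eq!(map_to_range(1 << 31, 10), 5);    // midpoint input -> mid output
    println!("ok");
}
```

Unlike mask-and-reject, the multiply consumes exactly one RNG word per sample (plus a rare rejection when de-biasing), which is why it behaves better for spans far from a power of two.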

@dhardy (Member) commented Dec 12, 2017

Yes, sounds like we can close this now (still need to get that code merged of course)!

@dhardy closed this Dec 12, 2017
@pitdicker (Contributor) commented:
Thanks for the effort on this though!

3 participants