Use SmallVec to optimize for small integer sizes. #210
This is a proof of concept. I'd like to discuss possible designs and pathways to getting something like this merged.
Motivation
Using `BigInt` with mostly small (<= word-sized) integers is prohibitively expensive. One numerically intense algorithm from rgeometry is 1000 times slower when using `BigRational` instead of a fixed-precision number.
Performance
`num-bigint` performs quite well for large input and is competitive with GMP. Detailed benchmarks can be found here: https://github.com/Lemmih/num-criterion
So why do small integers perform so poorly? Mostly because allocating new memory is significantly more expensive than doing a single addition or multiplication. There are also a few cases where we can use specialized algorithms. For example, there's a particularly fast `gcd` algorithm that only works on word-sized integers.
SmallVec with two inlined BigDigits
`BigUint` is a `Vec<BigDigit>`. As such, it has a size of 3 words in addition to the actual digits. Switching to a `SmallVec<[BigDigit; 2]>` will not increase the size of a `BigUint` but allows for two BigDigits to be inlined and thus not require a heap allocation.
Summary of benchmarks:
I've run benchmarks on an M1 Apple MacBook Air and a 3950X AMD desktop. The results I get from these platforms are quite different and I'd love it if other people could help out by running the benchmark on their machines. For instructions, see: https://github.com/Lemmih/num-criterion
`shr_assign`. The previous version interacted poorly with `SmallVec`.
Other optimizations
Specialize for integers that fit in 128 bits.
Ideally, addition, multiplication, and division for integers that fit in 128 bits would be as fast as cloning. Copying the bytes is significantly slower than doing the arithmetic.
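To make the 128-bit fast path concrete, here is a minimal sketch of what such a specialization could look like. The `Num` enum and the `add`, `to_digits`, and `add_digits` helpers are hypothetical names invented for this illustration. They are not `num-bigint`'s representation or API, and the sketch only covers non-negative values.

```rust
/// Hypothetical stand-in for a big unsigned integer (illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Num {
    /// The value fits in 128 bits and is stored inline, with no allocation.
    Small(u128),
    /// Little-endian 64-bit digits for values of 2^128 and above.
    Large(Vec<u64>),
}

/// Add two values, staying on the allocation-free path whenever possible.
fn add(a: &Num, b: &Num) -> Num {
    match (a, b) {
        (Num::Small(x), Num::Small(y)) => {
            let (sum, overflowed) = x.overflowing_add(*y);
            if !overflowed {
                // Fast path: a single 128-bit addition, no heap traffic.
                Num::Small(sum)
            } else {
                // The true result is `sum + 2^128`, so spill to three digits.
                Num::Large(vec![sum as u64, (sum >> 64) as u64, 1])
            }
        }
        // Slow path: at least one operand already lives on the heap.
        _ => add_digits(&to_digits(a), &to_digits(b)),
    }
}

/// View a value as little-endian 64-bit digits.
fn to_digits(n: &Num) -> Vec<u64> {
    match n {
        Num::Small(x) => vec![*x as u64, (*x >> 64) as u64],
        Num::Large(d) => d.clone(),
    }
}

/// Schoolbook addition with carry propagation over 64-bit digits.
fn add_digits(a: &[u64], b: &[u64]) -> Num {
    let len = a.len().max(b.len());
    let mut out = Vec::with_capacity(len + 1);
    let mut carry = 0u64;
    for i in 0..len {
        let x = a.get(i).copied().unwrap_or(0);
        let y = b.get(i).copied().unwrap_or(0);
        let (s1, c1) = x.overflowing_add(y);
        let (s2, c2) = s1.overflowing_add(carry);
        out.push(s2);
        // At most one of the two additions can overflow.
        carry = (c1 as u64) + (c2 as u64);
    }
    if carry != 0 {
        out.push(carry);
    }
    Num::Large(out)
}
```

With this shape, `add(&Num::Small(2), &Num::Small(3))` never touches the allocator, and only an overflow of the 128-bit addition pays for a `Vec`.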
Use better GCD implementation.
The GCD implementation in `num-integer` is fairly slow and can be significantly improved by changing a line or two. On my system, this leads to a 10x speed improvement for BigInts when the integer fits in 64 bits.
Other oddities
`num-bigint`? Aren't we all just calling 'memcpy'?
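Returning to the GCD point under "Other optimizations": the description does not say which word-sized algorithm it has in mind. As one well-known example of a GCD that works directly on machine words, here is a sketch of the binary (Stein's) algorithm on `u64`. The function name `gcd_u64` is made up for this illustration, and this is not claimed to be the change proposed for `num-integer`.

```rust
/// Binary (Stein's) GCD on 64-bit words: only shifts, subtractions,
/// and comparisons, with no division and no allocation.
fn gcd_u64(mut a: u64, mut b: u64) -> u64 {
    if a == 0 {
        return b;
    }
    if b == 0 {
        return a;
    }
    // Largest power of two dividing both inputs; restored at the end.
    let shift = (a | b).trailing_zeros();
    // Make `a` odd; it stays odd for the rest of the loop.
    a >>= a.trailing_zeros();
    loop {
        // Factors of two in `b` are no longer common, so drop them.
        b >>= b.trailing_zeros();
        // Keep `a <= b` so the subtraction cannot underflow.
        if a > b {
            std::mem::swap(&mut a, &mut b);
        }
        b -= a;
        if b == 0 {
            return a << shift;
        }
    }
}
```

A big-integer GCD could dispatch to a routine like this whenever both operands fit in a single 64-bit digit, and fall back to the general multi-digit algorithm otherwise.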