Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add small precision fast-path to Quo #98

Closed
aaronc opened this issue Nov 2, 2020 · 2 comments · Fixed by #114
Closed

Add small precision fast-path to Quo #98

aaronc opened this issue Nov 2, 2020 · 2 comments · Fixed by #114

Comments

@aaronc
Copy link

aaronc commented Nov 2, 2020

I've been doing some benchmarks to compare APD's performance vs other approaches we're considering (cosmos/cosmos-sdk#7772).

It seems that Quo operations perform pretty poorly compare to Add, Mul, etc.:

Benchmark/1_Ops:_apd.Decimal128.Add
Benchmark/1_Ops:_apd.Decimal128.Add-16         	 5285344	       218 ns/op
Benchmark/1_Ops:_apd.Decimal128.Sub
Benchmark/1_Ops:_apd.Decimal128.Sub-16         	 5665453	       208 ns/op
Benchmark/1_Ops:_apd.Decimal128.Mul
Benchmark/1_Ops:_apd.Decimal128.Mul-16         	 9138602	       138 ns/op
Benchmark/1_Ops:_apd.Decimal128.Quo
Benchmark/1_Ops:_apd.Decimal128.Quo-16         	  162537	      7790 ns/op

This benchmark is basically just perform a single operation on two random floating point numbers (different ones each round). I'm using a Context configured with decimal128 parameters (i.e. Precision = 34, etc.)

Is Quo inherently expected to be 40x slower than other operations or could there be a performance bottleneck?

@aaronc
Copy link
Author

aaronc commented Nov 3, 2020

So looking at the Quo code, there are a lot of for loops. It looks like it's trying to do things digit by digit. Is this algorithm designed around cases with very large precision? What if there were a fast path for precision <= 34 (decimal128, etc.)?

I benchmarked some decimal128 code based on https://software.intel.com/content/www/us/en/develop/articles/intel-decimal-floating-point-math-library.html by the way and division is comparable to multiplication there. So it must be possible for apd to have a more efficient Quo implementation for "low" precision.

@nvanbenschoten
Copy link
Member

Hi @aaronc, thanks for the report. Quo performance has improved recently (it's about 50% faster on your benchmark), but it still lags the performance of other operations by quite a margin. As you identified, this has to do with the digit-by-digit loop in Quo. The algorithm is designed around cases with very large precision and does not include any form of fast-path for low precision values. I'd like to keep this issue open to track the introduction of such a fast path.

@nvanbenschoten nvanbenschoten changed the title Poor Quo performance Add small precision fast-path to Quo Jan 29, 2022
nvanbenschoten added a commit to nvanbenschoten/apd that referenced this issue Jan 31, 2022
Fixes cockroachdb#98.

This commit speeds up `Context.Quo`, replacing its per-digit long
division with a call to `BigInt.QuoRem`. This avoids per-digit
iteration. It also allows `Context.Quo` to benefit further from the
inline fast-path of `BigInt.QuoRem` for small coefficient values that
fit in a uint64.

This is a partial revert of 1eddda3, at least in spirit. Before that
commit, `Context.Quo` did try to use `big.Int.Quo`. Unfortunately, it
was inaccurate for certain values because it was not scaling the
coefficients correctly before dividing. It tried to compensate for this
by using a very large precision (`c.Precision*2 + 8`), but this was
insufficient on its own because it was not taking the size of the values
into account. So the commit abandoned that approach.

However, that commit, which was based on the description given on the
GDA site, did demonstrate how to scale the coefficients in a way that
would permit this kind of division. Specifically, it began adjusting the
operands so that the coefficient of the dividend was greater than or
equal to the coefficient of the divisor and was also less than ten times
the coefficient of the divisor.

This commit uses the coefficient adjustment introduced in 1eddda3 to
revive the call into `BigInt.QuoRem`. With the two operands adjusted, it
now is possible to scale the coefficient of the dividend by the desired
precision and then perform a direct division on the operands.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants