Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing loop in two-digit optimization? #85

Open
cespare opened this issue Jan 13, 2019 · 3 comments
Open

Missing loop in two-digit optimization? #85

cespare opened this issue Jan 13, 2019 · 3 comments

Comments

@cespare
Copy link
Contributor

cespare commented Jan 13, 2019

Hey @ulfjack, thanks for the great algorithm!

Quick question: should this be a loop rather than an if, or was that intentional?

ryu/ryu/d2s.c

Line 414 in b2ae7a7

if (vpDiv100 > vmDiv100) { // Optimization: remove two digits at a time (~86.2%).

I haven't tried building your benchmarks yet, but in my own implementation based on your C code changing this to a loop gives a noticeable speedup for formatting numbers like 0.3.

cespare added a commit to cespare/ryu that referenced this issue Jan 13, 2019
This is a significant speedup in a few cases.

name                                     old time/op  new time/op  delta
AppendFloat64/0e+00-12                   3.31ns ± 1%  3.29ns ± 1%     ~     (p=0.141 n=7+7)
AppendFloat64/1e+00-12                   14.7ns ± 1%  14.6ns ± 1%     ~     (p=0.494 n=7+7)
AppendFloat64/3e-01-12                   58.7ns ± 1%  51.3ns ± 2%  -12.59%  (p=0.001 n=7+7)
AppendFloat64/1e+06-12                   21.2ns ± 0%  21.4ns ± 1%     ~     (p=0.182 n=7+7)
AppendFloat64/-1.2345e+02-12             58.0ns ± 1%  52.2ns ± 1%   -9.95%  (p=0.001 n=7+7)
AppendFloat64/6.226662346353213e-309-12  48.9ns ± 0%  48.7ns ± 1%     ~     (p=0.237 n=6+7)

See ulfjack/ryu#85 for discussion.
cespare added a commit to cespare/ryu that referenced this issue Jan 13, 2019
This is a significant speedup in a few cases.

name                                     old time/op  new time/op  delta
AppendFloat64/0e+00-12                   3.31ns ± 1%  3.29ns ± 1%     ~     (p=0.141 n=7+7)
AppendFloat64/1e+00-12                   14.7ns ± 1%  14.6ns ± 1%     ~     (p=0.494 n=7+7)
AppendFloat64/3e-01-12                   58.7ns ± 1%  51.3ns ± 2%  -12.59%  (p=0.001 n=7+7)
AppendFloat64/1e+06-12                   21.2ns ± 0%  21.4ns ± 1%     ~     (p=0.182 n=7+7)
AppendFloat64/-1.2345e+02-12             58.0ns ± 1%  52.2ns ± 1%   -9.95%  (p=0.001 n=7+7)
AppendFloat64/6.226662346353213e-309-12  48.9ns ± 0%  48.7ns ± 1%     ~     (p=0.237 n=6+7)

See ulfjack/ryu#85 for discussion.
@StephanTLavavej
Copy link
Contributor

This was my optimization and it was intentionally written as a single test instead of a loop. For random doubles, typically the “remove two digits” optimization can be applied exactly once, and testing again would degrade performance. Only numbers with a very short round-trip format, like 0.3, benefit from repeated applications. It’s a judgment call, but I chose to avoid introducing assumptions that the input is likely to be friendly to the optimization. The comments about loop iterations were profiled from random doubles.

This is also why the float codepath lacks the optimization - random floats don’t typically have 2 digits to trim.

@plokhotnyuk
Copy link
Contributor

plokhotnyuk commented May 25, 2019

@StephanTLavavej Please reconsider using a 2-digit loop with the 1-digit conditional step instead.

This solution has the same number of division operations as the currently implemented a 2-digit conditional step with the 1-digit loop for most fast cases, but it greatly reduces them for numbers with small mantissas.

Currently this weak performance comparing with the best C/C++ dtoa routines for small mantissas prevents wider adoption of the Ryu algorithm, see this issue for example.

Here is how that rounding branch was implemented in Scala for jsoniter-scala.

Also, here are proposed changes with benchmark results for the Rust implementation.

@StephanTLavavej
Copy link
Contributor

I can't recall if I tried the approach of looping for 2 and testing for 1. I can profile again. I know that it helps short outputs (many digits to trim), and it looks like it should help the case where there's only 1 digit to trim, but I'm worried that it'll slow down the most common case for random doubles (exactly 2 digits to trim). Perhaps the downside is minimal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants