-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Missing loop in two-digit optimization? #85
Comments
This is a significant speedup in a few cases. name old time/op new time/op delta AppendFloat64/0e+00-12 3.31ns ± 1% 3.29ns ± 1% ~ (p=0.141 n=7+7) AppendFloat64/1e+00-12 14.7ns ± 1% 14.6ns ± 1% ~ (p=0.494 n=7+7) AppendFloat64/3e-01-12 58.7ns ± 1% 51.3ns ± 2% -12.59% (p=0.001 n=7+7) AppendFloat64/1e+06-12 21.2ns ± 0% 21.4ns ± 1% ~ (p=0.182 n=7+7) AppendFloat64/-1.2345e+02-12 58.0ns ± 1% 52.2ns ± 1% -9.95% (p=0.001 n=7+7) AppendFloat64/6.226662346353213e-309-12 48.9ns ± 0% 48.7ns ± 1% ~ (p=0.237 n=6+7) See ulfjack/ryu#85 for discussion.
This is a significant speedup in a few cases. name old time/op new time/op delta AppendFloat64/0e+00-12 3.31ns ± 1% 3.29ns ± 1% ~ (p=0.141 n=7+7) AppendFloat64/1e+00-12 14.7ns ± 1% 14.6ns ± 1% ~ (p=0.494 n=7+7) AppendFloat64/3e-01-12 58.7ns ± 1% 51.3ns ± 2% -12.59% (p=0.001 n=7+7) AppendFloat64/1e+06-12 21.2ns ± 0% 21.4ns ± 1% ~ (p=0.182 n=7+7) AppendFloat64/-1.2345e+02-12 58.0ns ± 1% 52.2ns ± 1% -9.95% (p=0.001 n=7+7) AppendFloat64/6.226662346353213e-309-12 48.9ns ± 0% 48.7ns ± 1% ~ (p=0.237 n=6+7) See ulfjack/ryu#85 for discussion.
This was my optimization and it was intentionally written as a single test instead of a loop. For random doubles, typically the “remove two digits” optimization can be applied exactly once, and testing again would degrade performance. Only numbers with a very short round-trip format, like 0.3, benefit from repeated applications. It’s a judgment call, but I chose to avoid introducing assumptions that the input is likely to be friendly to the optimization. The comments about loop iterations were profiled from random doubles. This is also why the float codepath lacks the optimization - random floats don’t typically have 2 digits to trim. |
@StephanTLavavej Please reconsider using a 2-digit loop with the 1-digit conditional step instead. This solution has the same number of division operations as the currently implemented a 2-digit conditional step with the 1-digit loop for most fast cases, but it greatly reduces them for numbers with small mantissas. Currently this weak performance comparing with the best C/C++ dtoa routines for small mantissas prevents wider adoption of the Ryu algorithm, see this issue for example. Here is how that rounding branch was implemented in Scala for jsoniter-scala. Also, here are proposed changes with benchmark results for the Rust implementation. |
I can't recall if I tried the approach of looping for 2 and testing for 1. I can profile again. I know that it helps short outputs (many digits to trim), and it looks like it should help the case where there's only 1 digit to trim, but I'm worried that it'll slow down the most common case for random doubles (exactly 2 digits to trim). Perhaps the downside is minimal. |
Hey @ulfjack, thanks for the great algorithm!
Quick question: should this be a loop rather than an if, or was that intentional?
ryu/ryu/d2s.c
Line 414 in b2ae7a7
I haven't tried building your benchmarks yet, but in my own implementation based on your C code changing this to a loop gives a noticeable speedup for formatting numbers like 0.3.
The text was updated successfully, but these errors were encountered: