Missing loop in two-digit optimization? #85

cespare · 2019-01-13T02:49:18Z

Hey @ulfjack, thanks for the great algorithm!

Quick question: should this be a loop rather than an if, or was that intentional?

Line 414 in b2ae7a7

    
           if (vpDiv100 > vmDiv100) { // Optimization: remove two digits at a time (~86.2%).

I haven't tried building your benchmarks yet, but in my own implementation based on your C code changing this to a loop gives a noticeable speedup for formatting numbers like 0.3.

This is a significant speedup in a few cases. name old time/op new time/op delta AppendFloat64/0e+00-12 3.31ns ± 1% 3.29ns ± 1% ~ (p=0.141 n=7+7) AppendFloat64/1e+00-12 14.7ns ± 1% 14.6ns ± 1% ~ (p=0.494 n=7+7) AppendFloat64/3e-01-12 58.7ns ± 1% 51.3ns ± 2% -12.59% (p=0.001 n=7+7) AppendFloat64/1e+06-12 21.2ns ± 0% 21.4ns ± 1% ~ (p=0.182 n=7+7) AppendFloat64/-1.2345e+02-12 58.0ns ± 1% 52.2ns ± 1% -9.95% (p=0.001 n=7+7) AppendFloat64/6.226662346353213e-309-12 48.9ns ± 0% 48.7ns ± 1% ~ (p=0.237 n=6+7) See ulfjack/ryu#85 for discussion.

StephanTLavavej · 2019-01-13T09:55:05Z

This was my optimization and it was intentionally written as a single test instead of a loop. For random doubles, typically the “remove two digits” optimization can be applied exactly once, and testing again would degrade performance. Only numbers with a very short round-trip format, like 0.3, benefit from repeated applications. It’s a judgment call, but I chose to avoid introducing assumptions that the input is likely to be friendly to the optimization. The comments about loop iterations were profiled from random doubles.

This is also why the float codepath lacks the optimization - random floats don’t typically have 2 digits to trim.

plokhotnyuk · 2019-05-25T05:21:10Z

@StephanTLavavej Please reconsider using a 2-digit loop with the 1-digit conditional step instead.

This solution has the same number of division operations as the currently implemented a 2-digit conditional step with the 1-digit loop for most fast cases, but it greatly reduces them for numbers with small mantissas.

Currently this weak performance comparing with the best C/C++ dtoa routines for small mantissas prevents wider adoption of the Ryu algorithm, see this issue for example.

Here is how that rounding branch was implemented in Scala for jsoniter-scala.

Also, here are proposed changes with benchmark results for the Rust implementation.

StephanTLavavej · 2019-05-26T05:00:16Z

I can't recall if I tried the approach of looping for 2 and testing for 1. I can profile again. I know that it helps short outputs (many digits to trim), and it looks like it should help the case where there's only 1 digit to trim, but I'm worried that it'll slow down the most common case for random doubles (exactly 2 digits to trim). Perhaps the downside is minimal.

cespare mentioned this issue Jan 13, 2019

Add github.com/cespare/ryu to list of implementations #86

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Missing loop in two-digit optimization? #85

Missing loop in two-digit optimization? #85

cespare commented Jan 13, 2019

StephanTLavavej commented Jan 13, 2019

plokhotnyuk commented May 25, 2019 •

edited

Loading

StephanTLavavej commented May 26, 2019

Missing loop in two-digit optimization? #85

Missing loop in two-digit optimization? #85

Comments

cespare commented Jan 13, 2019

StephanTLavavej commented Jan 13, 2019

plokhotnyuk commented May 25, 2019 • edited Loading

StephanTLavavej commented May 26, 2019

plokhotnyuk commented May 25, 2019 •

edited

Loading