Render floating point numbers using faster method #5188

danpat · 2018-09-03T05:10:41Z

When generating a JSON response string, osrm-routed is currently using whichever stdlib it's linked against to output numbers using the standard ostream << some_number method.

For API responses with not too many numbers (e.g. /viaroute with geometries=polyline), this works just fine. For responses that contain large numbers of floating point numbers (e.g. very large /table requests), the total response time can include a significant amount of time just spent turning double numbers into strings.

This PR replaces the dependence of ostream operator << (double) with a custom floating pointer-to-string renderer lifted from https://github.com/miloyip/dtoa-benchmark/blob/c4020c62754950d38a1aaaed2975b05b441d1e7d/src/milo/dtoa_milo.h.

The github README for the dtoa-benchmark project claims that this code is about 20x faster than ostringstream operator << (double). Local benchmarking I did pretty much confirms this.

As far as I can tell, the output values are identical to the stdlib strings, so this is basically just a drop-in replacement with a free performance boost. For responses that contain lots of doubles, this speeds things along nicely.

Tasklist

CHANGELOG.md entry (How to write a changelog entry)
update relevant Wiki pages
add tests (see testing documentation
review
adjust for comments
cherry pick to release branch

Requirements / Relations

Link any requirements here. Other pull requests this PR is based on?

kevinkreiser · 2018-09-04T18:41:05Z

@danpat i'm glad to see my suggestion worked out! i knew we could count on the rapidjson authors implementation to be webscale :p

emiltin · 2018-09-05T09:12:55Z

As far as I understand, the code implements Grisu2, which produces a correct, but not always optimal result. E.g. you might get 0.21000000000000002 instead of 0.21 for certain values. From the paper at https://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf:

Grisu2 may be used where an optimal decimal representation
is desired but not required. Given a 64 bit unsigned integer Grisu2
computes the optimal decimal representation 99.8% of the times
for IEEE doubles.

Grisu3 should be used when the optimal representation is required.
In this case Grisu3 will efficiently produce the optimal result
for 99.5% of its input (with doubles and 64-bit integers), and
reject the remaining numbers. The rejected numbers must then be
printed by a more accurate (but slower) algorithm. Since Grisu3 is
about 4 times faster than these alternative algorithms the average
speed-up is still significant.

Errol3 is a newer method, but it seems their initial speed claims were wrong: https://github.com/marcandrysco/Errol
https://cseweb.ucsd.edu/~mandrysc/pub/dtoa.pdf

Another new method called Ryu claim to be faster than Grisu:
http://delivery.acm.org/10.1145/3200000/3192369/pldi18main-p10-p.pdf?ip=193.169.154.217&id=3192369&acc=OA&key=4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E66AE2C137E5E05CA&__acm__=1536137534_a1b1064519104e8c642475ebcfb36057
https://github.com/ulfjack/ryu

danpat · 2018-09-05T19:34:23Z

@emiltin haha, more than I ever wanted to know about decimal number formatting :-)

On macOS, out << some_doubler seems to fall through to snprintf() from the C standard library.

The C99 standard, section 5.2.4.2.2 Paragraph 6 doc here says:

The accuracy of the floating-point operations (+, -, *, /) and of the library functions in
<math.h> and <complex.h> that return floating-point results is implementationdefined,
as is the accuracy of the conversion between floating-point internal
representations and string representations performed by the library functions in
<stdio.h>, <stdlib.h>, and <wchar.h>. The implementation may state that the
accuracy is unknown.

So given that there is no strict requirement from the standard about how these numbers come out, and all our tests are passing, I'm inclined to say that any change in the number formatting that folks experience after this PR goes out was previously undefined behaviour anyway.

ghoshkaj · 2018-09-05T19:52:18Z

include/util/json_renderer.hpp

@@ -34,8 +35,31 @@ struct Renderer

    void operator()(const Number &number) const
    {
-        out.precision(10);
-        out << number.value;
+        char buffer[256] = {'\0'};


Why did you choose 256 as the total length of the number here?

…ad of stdlib to speed up JSON generation.

- Changes from 5.18.0: - Optimizations: - CHANGED: Use Grisu2 for serializing floating point numbers. [Project-OSRM#5188](Project-OSRM#5188) - ADDED: Node bindings can return pre-rendered JSON buffer. [Project-OSRM#5189](Project-OSRM#5189) - Profiles: - CHANGED: Bicycle profile now blacklists barriers instead of whitelisting them [Project-OSRM#5076 ](Project-OSRM#5076) - CHANGED: Foot profile now blacklists barriers instead of whitelisting them [Project-OSRM#5077 ](Project-OSRM#5077) - CHANGED: Support maxlength and maxweight in car profile [Project-OSRM#5101](Project-OSRM#5101] - Bugfixes: - FIXED: collapsing of ExitRoundabout instructions [Project-OSRM#5114](Project-OSRM#5114) - Misc: - CHANGED: Support up to 512 named shared memory regions [Project-OSRM#5185](Project-OSRM#5185)

danpat force-pushed the danpat_speedup_double_rendering branch 2 times, most recently from 3f5e961 to f894507 Compare September 5, 2018 19:26

ghoshkaj reviewed Sep 5, 2018

View reviewed changes

ghoshkaj approved these changes Sep 5, 2018

View reviewed changes

ghoshkaj added this to the 5.19.0 milestone Sep 5, 2018

Render floating point numbers to string using Grisu2 algorithmt inste…

e072dc6

…ad of stdlib to speed up JSON generation.

danpat force-pushed the danpat_speedup_double_rendering branch from 253afd5 to e072dc6 Compare September 5, 2018 20:40

danpat merged commit 85515f0 into master Sep 5, 2018

danpat mentioned this pull request Sep 5, 2018

Windows builds broken due to number formatting optimizations #5198

Closed

chaupow deleted the danpat_speedup_double_rendering branch September 28, 2018 21:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Render floating point numbers using faster method #5188

Render floating point numbers using faster method #5188

danpat commented Sep 3, 2018 •

edited

Loading

kevinkreiser commented Sep 4, 2018 •

edited

Loading

emiltin commented Sep 5, 2018

danpat commented Sep 5, 2018

ghoshkaj Sep 5, 2018

Render floating point numbers using faster method #5188

Render floating point numbers using faster method #5188

Conversation

danpat commented Sep 3, 2018 • edited Loading

Tasklist

Requirements / Relations

kevinkreiser commented Sep 4, 2018 • edited Loading

emiltin commented Sep 5, 2018

danpat commented Sep 5, 2018

ghoshkaj Sep 5, 2018

Choose a reason for hiding this comment

danpat commented Sep 3, 2018 •

edited

Loading

kevinkreiser commented Sep 4, 2018 •

edited

Loading