Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Render floating point numbers using faster method #5188

Merged
merged 1 commit into from
Sep 5, 2018

Conversation

danpat
Copy link
Member

@danpat danpat commented Sep 3, 2018

When generating a JSON response string, osrm-routed is currently using whichever stdlib it's linked against to output numbers using the standard ostream << some_number method.

For API responses with not too many numbers (e.g. /viaroute with geometries=polyline), this works just fine. For responses that contain large numbers of floating point numbers (e.g. very large /table requests), the total response time can include a significant amount of time just spent turning double numbers into strings.

This PR replaces the dependence of ostream operator << (double) with a custom floating pointer-to-string renderer lifted from https://github.com/miloyip/dtoa-benchmark/blob/c4020c62754950d38a1aaaed2975b05b441d1e7d/src/milo/dtoa_milo.h.

The github README for the dtoa-benchmark project claims that this code is about 20x faster than ostringstream operator << (double). Local benchmarking I did pretty much confirms this.

As far as I can tell, the output values are identical to the stdlib strings, so this is basically just a drop-in replacement with a free performance boost. For responses that contain lots of doubles, this speeds things along nicely.

Tasklist

Requirements / Relations

Link any requirements here. Other pull requests this PR is based on?

@kevinkreiser
Copy link
Member

kevinkreiser commented Sep 4, 2018

@danpat i'm glad to see my suggestion worked out! i knew we could count on the rapidjson authors implementation to be webscale :p

@emiltin
Copy link
Contributor

emiltin commented Sep 5, 2018

As far as I understand, the code implements Grisu2, which produces a correct, but not always optimal result. E.g. you might get 0.21000000000000002 instead of 0.21 for certain values. From the paper at https://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf:

Grisu2 may be used where an optimal decimal representation
is desired but not required. Given a 64 bit unsigned integer Grisu2
computes the optimal decimal representation 99.8% of the times
for IEEE doubles.

Grisu3 should be used when the optimal representation is required.
In this case Grisu3 will efficiently produce the optimal result
for 99.5% of its input (with doubles and 64-bit integers), and
reject the remaining numbers. The rejected numbers must then be
printed by a more accurate (but slower) algorithm. Since Grisu3 is
about 4 times faster than these alternative algorithms the average
speed-up is still significant.

Errol3 is a newer method, but it seems their initial speed claims were wrong: https://github.com/marcandrysco/Errol
https://cseweb.ucsd.edu/~mandrysc/pub/dtoa.pdf

Another new method called Ryu claim to be faster than Grisu:
http://delivery.acm.org/10.1145/3200000/3192369/pldi18main-p10-p.pdf?ip=193.169.154.217&id=3192369&acc=OA&key=4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E66AE2C137E5E05CA&__acm__=1536137534_a1b1064519104e8c642475ebcfb36057
https://github.com/ulfjack/ryu

@danpat danpat force-pushed the danpat_speedup_double_rendering branch 2 times, most recently from 3f5e961 to f894507 Compare September 5, 2018 19:26
@danpat
Copy link
Member Author

danpat commented Sep 5, 2018

@emiltin haha, more than I ever wanted to know about decimal number formatting :-)

On macOS, out << some_doubler seems to fall through to snprintf() from the C standard library.

The C99 standard, section 5.2.4.2.2 Paragraph 6 doc here says:

The accuracy of the floating-point operations (+, -, *, /) and of the library functions in
<math.h> and <complex.h> that return floating-point results is implementationdefined,
as is the accuracy of the conversion between floating-point internal
representations and string representations performed by the library functions in
<stdio.h>, <stdlib.h>, and <wchar.h>. The implementation may state that the
accuracy is unknown.

So given that there is no strict requirement from the standard about how these numbers come out, and all our tests are passing, I'm inclined to say that any change in the number formatting that folks experience after this PR goes out was previously undefined behaviour anyway.

@@ -34,8 +35,31 @@ struct Renderer

void operator()(const Number &number) const
{
out.precision(10);
out << number.value;
char buffer[256] = {'\0'};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you choose 256 as the total length of the number here?

@ghoshkaj ghoshkaj added this to the 5.19.0 milestone Sep 5, 2018
@danpat danpat force-pushed the danpat_speedup_double_rendering branch from 253afd5 to e072dc6 Compare September 5, 2018 20:40
@danpat danpat merged commit 85515f0 into master Sep 5, 2018
@chaupow chaupow deleted the danpat_speedup_double_rendering branch September 28, 2018 21:41
datendelphin added a commit to fossgis-routing-server/osrm-backend that referenced this pull request Nov 19, 2020
  - Changes from 5.18.0:
    - Optimizations:
      - CHANGED: Use Grisu2 for serializing floating point numbers. [Project-OSRM#5188](Project-OSRM#5188)
      - ADDED: Node bindings can return pre-rendered JSON buffer. [Project-OSRM#5189](Project-OSRM#5189)
    - Profiles:
      - CHANGED: Bicycle profile now blacklists barriers instead of whitelisting them [Project-OSRM#5076
](Project-OSRM#5076)
      - CHANGED: Foot profile now blacklists barriers instead of whitelisting them [Project-OSRM#5077
](Project-OSRM#5077)
      - CHANGED: Support maxlength and maxweight in car profile [Project-OSRM#5101](Project-OSRM#5101]
    - Bugfixes:
      - FIXED: collapsing of ExitRoundabout instructions [Project-OSRM#5114](Project-OSRM#5114)
    - Misc:
      - CHANGED: Support up to 512 named shared memory regions [Project-OSRM#5185](Project-OSRM#5185)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants