Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue72customdtoa #97

Merged
merged 5 commits into from
Aug 11, 2014
Merged

Issue72customdtoa #97

merged 5 commits into from
Aug 11, 2014

Conversation

miloyip
Copy link
Collaborator

@miloyip miloyip commented Aug 9, 2014

Custom dtoa()

After two weeks of research and implementation, a custom dtoa() was implemented to replace sprintf() in Writer::Double(). A performance benchmark can be checked in another repository dtoa-benchmark.

Simply put, the custom dtoa() is an optimized header-only C++ implementation of Grisu2 algorithm (~400 lines of code). It always generate correct-rounding output. And it can generate shortest representation for >99.9% of input. sprintf() cannot output shortest representation.

Also, it fixes the local issue in #72, always generate valid JSON number text.

Removal of precision settings

Originally, Writer::SetPrecision() et al. APIs are added because sprintf("%g) has default precision of 6 decimal digits, that may lose precision in source values (thank to @pah 's effort in #19). The new dtoa() implementation makes this not necessary as it always generate output that can be convertible back to the source values, and it will try to make the output as short as possible.

So I think those APIs can be removed. A drawback is that user cannot reduce the precision in output as before.

Better number parsing

During the implementation, it is found that Reader::ParseNumber() cannot parse double correctly. After some research, the current "naive" implementation can be slightly modified to implement the fast path conversion. It may increase a little bit overhead by using division in half of time, but at the same time it cut down the lookup table in internal::Pow10() by half. By this modification, normal ranges of numbers can be converted roundtrip exactly.

Performance

During the benchmark, it is found that gcc (glib)'s sprintf(..., "%.17g") is very very slow, compared to VS2013 (Check this and this). I have not investigated the reasons behind at the moment.

Anyway, new dtoa() implementation is much faster.

Before:

VC2013 x64 
[       OK ] RapidJson.ReaderParseIterativeInsitu_DummyHandler_SSE42 (663 ms)
[       OK ] RapidJson.Writer_NullStream (325 ms)
[       OK ] RapidJson.Writer_StringBuffer (1020 ms)
[       OK ] RapidJson.PrettyWriter_StringBuffer (1420 ms)

Cygwin GCC x64
[       OK ] RapidJson.ReaderParseIterativeInsitu_DummyHandler_SSE42 (875 ms)
[       OK ] RapidJson.Writer_NullStream (5201 ms)
[       OK ] RapidJson.Writer_StringBuffer (5673 ms)
[       OK ] RapidJson.PrettyWriter_StringBuffer (6169 ms)

After:

VC2013 x64 After
[       OK ] RapidJson.ReaderParseIterativeInsitu_DummyHandler_SSE42 (661 ms)
[       OK ] RapidJson.Writer_NullStream (257 ms)
[       OK ] RapidJson.Writer_StringBuffer (697 ms)
[       OK ] RapidJson.PrettyWriter_StringBuffer (1037 ms)

Cygwin GCC x64 After
[       OK ] RapidJson.ReaderParseIterativeInsitu_DummyHandler_SSE42 (656 ms)
[       OK ] RapidJson.Writer_NullStream (250 ms)
[       OK ] RapidJson.Writer_StringBuffer (686 ms)
[       OK ] RapidJson.PrettyWriter_StringBuffer (1144 ms)

The JSON in these tests contains some JSON numbers, not extreme cases that most values are floating-point numbers. So these results only show part of improvement. Even this is not an extreme case, in NullStream tests VC2013 shows ~1.25x speedup, while gcc shows ~20x speedup.

Accurate rounding in normal numerical ranges, also reduce lookup table
size.
Modified from Milo's Grisu2 implementation. 99.9% cases return shortest
decimal format.
@pah
Copy link
Contributor

pah commented Aug 10, 2014

Nice work! And I agree on the removal of the precision APIs. 👍

uint32_t t = (i + 1) * 1233 >> 12;
#elif __GNUC__
uint32_t t = (32 - __builtin_clz(n | 1)) * 1233 >> 12;
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we have an #else case here as well? At least with an #error?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.
I will add a standard C++ implementation.

It is simple and pure C++. And it is found in performance test that it
is even faster than the original version, due to distribution of n. But
the performance gain is not obvious in RapidJSON.
miloyip added a commit that referenced this pull request Aug 11, 2014
@miloyip miloyip merged commit adb3974 into master Aug 11, 2014
pah added a commit to pah/rapidjson that referenced this pull request Aug 11, 2014
This drops #3 and #4, as their functionality has been superseded
upstream, see Tencent/rapidjson#97 and Tencent/rapidjson#101.

Conflicts:
	include/rapidjson/prettywriter.h
	include/rapidjson/reader.h
	include/rapidjson/writer.h
@pah
Copy link
Contributor

pah commented Sep 23, 2014

See in the description above:

The new dtoa() implementation makes this not necessary as it always generate output that can be convertible back to the source values, and it will try to make the output as short as possible.

Do you really need explicitly lossy output?
Maybe you can round the values accordingly before writing them?

@gidantribal
Copy link

Yes, indeed I understood the reason behind the functionality on your pull request has been obsoleted. Actually the precision of our outputted doubles depends on the currency, thus should be possible to control it. I will push to remove this formatting limitation from the REST APIs, otherwise I'll use a custom Writer or a pre-rounding of the values for presenting them "nicely", as you suggested. Thanks a lot!

@pah
Copy link
Contributor

pah commented Sep 25, 2014

It might be fairly simple to limit the number of "fraction digits" printed in Prettify (defined in include/rapidjson/internal/dtoa.h:352).

Forcing trailing zeroes up to the requested number of fractional digits could be done as well.

Limiting the total number of printed significant digits would be less useful, I guess.
Thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants