Fair play with JSON #335
9il commented on May 9, 2021:
- DAW JSON Link now uses checked parsing like the other libraries (it is also the default mode in DAW JSON Link).
- RapidJSON now uses IEEE-accurate number parsing, like almost all other libraries (otherwise we would need to add a note that its results are inaccurate).
- Added a DOM suffix to the Mir library entries.
- Updated the README with notes about the top libraries.
- Use numbers with exponents. This is important: numbers between 0 and 1 are very easy to parse precisely, whereas numbers with significant exponents are considerably harder, and libraries take quite different approaches to them. For example, RapidJSON and Mir have their own implementations, while simdjson falls back to a C standard library call.
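To make the easy/hard split concrete, here is a minimal Python sketch of Clinger's fast path, a well-known technique and not necessarily what RapidJSON, Mir, or simdjson do. If the decimal mantissa fits in 53 bits and the power of ten is at most 10^22 (both exactly representable as doubles), a single IEEE multiply or divide gives the correctly rounded result. Anything else needs a slower algorithm; here CPython's correctly rounded `float()` stands in for that slow path, loosely mirroring a fallback to the C standard library. The names `split_number`, `fast_path`, and `parse_number` are hypothetical.

```python
import re

# Exact powers of ten: 10**22 is the largest power of ten a double holds exactly.
POW10 = [float(10 ** i) for i in range(23)]
NUMBER_RE = re.compile(r"(-?)(\d+)(?:\.(\d+))?(?:[eE]([+-]?\d+))?")

def split_number(text):
    """Split a number into (negative, integer mantissa, decimal exponent)."""
    m = NUMBER_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a number: {text!r}")
    sign, int_part, frac_part, exp_part = m.groups()
    frac_part = frac_part or ""
    mantissa = int(int_part + frac_part)          # exact Python integer
    exponent = int(exp_part or 0) - len(frac_part)
    return sign == "-", mantissa, exponent

def fast_path(text):
    """Clinger's fast path: exact when mantissa <= 2**53 and |exponent| <= 22."""
    negative, mantissa, exponent = split_number(text)
    if mantissa > 2 ** 53 or not -22 <= exponent <= 22:
        return None  # hard case: needs a big-integer or table-based algorithm
    value = float(mantissa)  # exact, since the mantissa fits in 53 bits
    # One IEEE multiply or divide of two exact operands is correctly rounded.
    value = value * POW10[exponent] if exponent >= 0 else value / POW10[-exponent]
    return -value if negative else value

def parse_number(text):
    result = fast_path(text)
    # Slow path stand-in: CPython's float() is correctly rounded.
    return result if result is not None else float(text)

for s in ["0.1", "123.456", "6.02214076e23", "1e23", "2.2250738585072014e-308"]:
    print(s, "fast path" if fast_path(s) is not None else "slow path")
```

Short decimals with small exponents (such as `0.1`) take the fast path, while `1e23` (exponent too large) and a 17-digit mantissa near the bottom of the double range do not; those are exactly the inputs where libraries' strategies diverge.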
Branch updated from 2c651d3 to fde5bb3.
Frankly, I'm not sure it's worth testing JSON precision: JSON is definitely not the best format when precision matters (and it isn't fully compatible with IEEE 754 anyway; for example, it doesn't support special values such as negative zero). Still, it could be useful information, so thank you for your efforts.
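As one illustration of the IEEE 754 mismatch, using Python's standard `json` module (not one of the benchmarked libraries): RFC 8259 JSON has no tokens for NaN or infinities. By default Python emits non-standard `NaN`/`Infinity` tokens, and with `allow_nan=False` it refuses to serialize them at all.

```python
import json

for value in [float("nan"), float("inf"), float("-inf")]:
    # Default behaviour: non-standard tokens that strict JSON parsers reject.
    print(json.dumps(value))
    try:
        # Strict mode: refuse values that have no JSON representation.
        json.dumps(value, allow_nan=False)
    except ValueError as err:
        print("rejected:", err)
```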
The goal isn't to change Mir's position. That is unlikely to change, except perhaps that the gap with Rapid shrinks to zero. Also, over the last decade programming languages have improved number parsing and printing a lot; Ryu is a revolution. JSON is very frequently used for data transfer and serialization (there are "better" formats, but who cares). An adequate library can guarantee that it can write a non-special number to JSON, read it back, and get the same value. If it can't, then it isn't fair. Likely the winners will be simdjson and serde; after this MR it will be clear that they are the fastest correct JSON parsing libraries. Mir needs a test with file or stream input to show its low memory consumption, but at least I can add a special note in my local README and link to the benchmark. The benchmark should be fair, though. I assume you may want a clever C++ configuration; I am not sure I can do that well. If you wish to preserve the unfair configs, I can just add additional notes and leave them for the future.
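The round-trip guarantee described above can be checked mechanically. A minimal sketch with Python's standard `json` module, which formats floats with `float.__repr__` (the shortest string that reads back to the same double); it is not one of the benchmarked libraries, and `random_finite_double` is a hypothetical helper. The last lines show why the formatting side matters: 15 significant digits can lose the value, while 17 always suffice for a double.

```python
import json
import random
import struct

def random_finite_double(rng):
    """Reinterpret random 64-bit patterns as doubles, skipping NaN and infinities."""
    while True:
        bits = rng.getrandbits(64)
        value = struct.unpack("<d", struct.pack("<Q", bits))[0]
        if value == value and abs(value) != float("inf"):
            return value

rng = random.Random(2021)
for _ in range(20_000):
    x = random_finite_double(rng)
    # Write to JSON and read back: the exact same double must come out.
    assert json.loads(json.dumps(x)) == x

y = 0.1 + 0.2
print(repr(y))                  # → 0.30000000000000004
print(float("%.15g" % y) == y)  # → False: "%.15g" gives "0.3"
print(float("%.17g" % y) == y)  # → True: 17 significant digits always round-trip
```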
Isn't memory usage already measured twice, to separate the library's usage from the string data?
Yep. On the other hand, most high-performance libraries require a string as input. Mir parses data in chunks of a few KB: it can read the 108 MiB JSON file and build a complete Amazon Ion DOM using only 16 MiB (that small because of symbol tables, etc.). The total memory consumption would be 16 MiB, not 108 + 16 MiB.
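Mir's chunked reader is not shown in this thread; the following is only a hypothetical Python sketch of the chunking idea, restricted by assumption to a flat JSON array of numbers such as `[1.5e10,-2,0.25]`. The one subtle part is that a token can straddle a chunk boundary, so the unfinished tail of each chunk is carried into the next read. Peak input memory is roughly one chunk plus the longest token, independent of file size. `iter_numbers_chunked` is a made-up name.

```python
import io
import re

# Assumption: the input is a flat JSON array of numbers, e.g. b"[1.5e10,-2,0.25]".
NUMBER = re.compile(rb"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")
NUMBER_CHARS = frozenset(b"0123456789+-.eE")

def iter_numbers_chunked(stream, chunk_size=4096):
    """Yield numbers from a binary stream while holding only about one chunk."""
    carry = b""
    while True:
        chunk = stream.read(chunk_size)
        data = carry + chunk
        end = len(data)
        if chunk:
            # The trailing run of number characters may continue in the next
            # chunk, so hold it back instead of parsing a truncated token.
            while end > 0 and data[end - 1] in NUMBER_CHARS:
                end -= 1
        for match in NUMBER.finditer(data, 0, end):
            yield float(match.group().decode("ascii"))
        carry = data[end:]
        if not chunk:  # EOF: everything left has been parsed
            return

values = [1.5e10, -2.0, 0.25, 6.02214076e23, 5e-324]
payload = ("[" + ",".join(map(repr, values)) + "]").encode("ascii")
# A tiny chunk size forces numbers to straddle chunk boundaries.
assert list(iter_numbers_chunked(io.BytesIO(payload), chunk_size=3)) == values
```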
That's really cool, and I know a lot of people want a feature like that for things like web servers. I considered that path, but with mmap available on all POSIX systems and VirtualAlloc on Windows, it was less effort to let the OS handle paging the file in those cases. It's not for all use cases, but it does simplify a lot.
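The "let the OS page the file" approach can be sketched in Python with the standard `mmap` module, whose objects the `re` module can search directly, so the file is never `read()` into one bytes object. This is an illustration of the idea, not DAW JSON Link's code; it keeps the same hypothetical flat-array-of-numbers assumption, and `numbers_from_mapped_file` is a made-up name. Note that CPython's `mmap` refuses to map an empty file (it raises `ValueError`).

```python
import mmap
import os
import re
import tempfile

# Assumption: the file holds a flat JSON array of numbers.
NUMBER = re.compile(rb"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")

def numbers_from_mapped_file(path):
    """Map the file read-only and let the OS page it in as the regex scans it."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # re accepts mmap objects, so only touched pages need to be resident.
        return [float(m.group().decode("ascii")) for m in NUMBER.finditer(mm)]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "numbers.json")
    with open(path, "wb") as f:
        f.write(b"[1.5,2.25,-0.75e2]")
    print(numbers_from_mapped_file(path))  # → [1.5, 2.25, -75.0]
```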
Precision flags affect performance, so it would be better to use them in separate tests. I'm going to submit a PR into the original branch of this PR.
Most use cases do not care about a 0 to 2 ulp difference from strtod, but they do care about performance. Those that do care should have an option to pay for precision, though.
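An "ulp difference" can be measured directly by mapping doubles to integers so that neighbouring doubles differ by exactly 1. The sketch below does that and compares a deliberately cheap, hypothetical parser (`naive_parse`: float digit accumulation followed by repeated `* 10.0` or `/ 10.0`, not any benchmarked library's method) against a correctly rounded reference built from `fractions.Fraction`. Each float step rounds once, so the error can grow with the number of steps, which is why large exponents are where cheap parsers drift; the distances are printed rather than asserted here.

```python
import math
import re
import struct
from fractions import Fraction

def ordered_bits(x):
    """Map a double to an integer so that adjacent doubles differ by exactly 1."""
    (u,) = struct.unpack("<Q", struct.pack("<d", x))
    return -(u & 0x7FFF_FFFF_FFFF_FFFF) if u >> 63 else u

def ulp_distance(a, b):
    return abs(ordered_bits(a) - ordered_bits(b))

def naive_parse(text):
    """Hypothetical cheap parser: one rounding per digit and per power of ten."""
    m = re.fullmatch(r"(-?)(\d+)(?:\.(\d+))?(?:[eE]([+-]?\d+))?", text)
    sign, int_part, frac_part, exp_part = m.groups()
    frac_part = frac_part or ""
    value = 0.0
    for ch in int_part + frac_part:
        value = value * 10.0 + int(ch)  # starts rounding once past 2**53
    exponent = int(exp_part or 0) - len(frac_part)
    for _ in range(abs(exponent)):
        value = value * 10.0 if exponent > 0 else value / 10.0
    return -value if sign else value

assert ulp_distance(1.0, math.nextafter(1.0, 2.0)) == 1  # sanity check

for s in ["0.1", "3.14159", "6.02214076e23", "1.602176634e-19", "4.9e-300"]:
    reference = float(Fraction(s))  # exact rational, rounded once
    print(s, ulp_distance(naive_parse(s), reference), "ulp")
```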
Additional tests have been added.
Yes, I have a scheduled maintenance update for all tests; I just need to finish adding the Primes tests first. I think I'll publish the up-to-date numbers this week.
As a side note, IEEE 754 binary floating point is worse than just a 2 ulp difference. It's a nightmare; see e.g. vlang/v#5180 (comment).
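A classic illustration of why binary floating point surprises people, shown in Python (this is a generic example, not a summary of the linked issue): the double nearest to `0.1` is not 0.1, and `Decimal(float)` reveals its exact stored value.

```python
from decimal import Decimal

# Decimal(float) converts the stored binary value exactly, with no rounding.
print(Decimal(0.1))
# → 0.1000000000000000055511151231257827021181583404541015625
print(0.1 + 0.2 == 0.3)   # → False: both operands and the sum are rounded
print(repr(0.1 + 0.2))    # → 0.30000000000000004
```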