Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use float and possibly half in json::to_cbor #1719

Closed
misos1 opened this issue Aug 22, 2019 · 24 comments · Fixed by #2044
Closed

Use float and possibly half in json::to_cbor #1719

misos1 opened this issue Aug 22, 2019 · 24 comments · Fixed by #2044
Assignees
Labels
aspect: binary formats BSON, CBOR, MessagePack, UBJSON kind: enhancement/improvement solution: proposed fix a fix for the issue has been proposed and waits for confirmation
Milestone

Comments

@misos1
Copy link

misos1 commented Aug 22, 2019

  • Describe the feature in as much detail as possible.

Similarly as with integer types why not to encode floating point types as small as possible? For example now is 0.5 encoded as FB 3F E0 00 00 00 00 00 00 but it could be F9 38 00. So if half is enough to store value without loss then store it in half. If not then test single precision and if that fails store it as double precision. One way could be to convert double to lower precision and compare whether they are equal. Or maybe this can be done by simply testing least significant bits in mantissa and exponent. If half would be too expensive due to scarce native support in hardware then at least support single precision floats.

@stale

This comment has been minimized.

@stale stale bot added the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Sep 21, 2019
@nlohmann nlohmann removed the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Sep 26, 2019
@stale

This comment has been minimized.

@stale stale bot added the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Oct 26, 2019
@stale stale bot closed this as completed Nov 2, 2019
@nlohmann nlohmann added the aspect: binary formats BSON, CBOR, MessagePack, UBJSON label Nov 2, 2019
@nlohmann nlohmann reopened this Nov 2, 2019
@stale stale bot removed the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Nov 2, 2019
@nlohmann
Copy link
Owner

nlohmann commented Nov 5, 2019

Any idea how to realize this?

@nlohmann nlohmann added the state: help needed the issue needs help to proceed label Nov 5, 2019
@rvjr
Copy link

rvjr commented Nov 18, 2019

We actually noticed this too as we needed very small binaries; we fixed it internally as in the attached patch. So far we have not found any problems with this variant. As misos1 suggested we simply test if the value stays the same if we truncate it to float precision.

0001-SDK-341-Implemented-CBOR-serialization-with-32bit-fl.zip

@nlohmann
Copy link
Owner

Thanks for sharing! I'll have a look.

@nlohmann nlohmann added solution: proposed fix a fix for the issue has been proposed and waits for confirmation and removed state: help needed the issue needs help to proceed labels Nov 18, 2019
@misos1
Copy link
Author

misos1 commented Nov 18, 2019

It is done similarly also in other libraries:

https://github.com/pyfisch/cbor/blob/master/src/ser.rs#L311-L342
https://github.com/hildjj/node-cbor/blob/master/lib/encoder.js#L203-L229

There are edge cases. NaN does not equal to itself.

But is this really the best way? What about that other suggestion by testing bits?

Would be great to also have half.

@misos1
Copy link
Author

misos1 commented Nov 18, 2019

I suppose if hardware supports such conversion then will be cheaper to convert number to lower precision and compare with original. Testing bits would always require more instructions than that. If hardware does not support it then converting forth and back would be overhead and whether is conversion lossless or not can be decided during conversion.

I was probably wrong about scarce native support of half in hardware. Seems it is rather ubiquitous.

So after checking edge cases like infinities and NaN (which can be simply stored as half) if double value stays same after conversion to float then try to serialise it as float or half (else store it as double). If there is hardware support for half and float value stays same after conversion to half and back then store it as half (else store it as float). If there is no hardware support (and we are willing to do this) convert it manually and if it looks lossless then store it as half (else store it as float). So support half at least when there is hardware support. I also noticed that some libraries are storing whole floating point numbers as integers where possible but this would mean loss of type information.

@rvjr
Copy link

rvjr commented Nov 18, 2019

You're right, special handling of Inf and NaN would be nice, because these can we written as floats without any loss of information. But I don't think supporting half makes sense at this point, because at least in x86_64 there are not even scalar instructions for converting to float. Where do you see the benefit of half, and on which architecture?

@misos1
Copy link
Author

misos1 commented Nov 18, 2019

It could be good to produce even smaller cbor binaries. It is not about architecture but more about conciseness. In x86_64 the instruction converts 4 numbers at once but this does not mean that it is 4x slower than conversion from double to float. It is SIMD. Seems also ARM has support for half. In any case conversion to half can be done also in software like is now done conversion from half (there can be some switch whether it should encode floats to halfs if there is not hardware support).

@rvjr
Copy link

rvjr commented Nov 18, 2019

Yes of course this is not a question of conversion performance. For CBOR the resulting binary size is all that matters. I just meant it's maybe a lot of work to do this correctly including detection of supported instruction set, fall back to a correct manual conversion, etc., for some very rare cases in which you gain significant size reduction...
So, feel free to implement it :-)

@nlohmann
Copy link
Owner

Let me try to summarize before I misunderstand anything:

  • We do have a patch that tries to serialize floating-point numbers to float (0xFA) rather than double (0xFB) if possible.
  • We could use a similar approach to also try to use half-precision floats (0xF9), but this would only have advantages in the size of the generated binary value, but have negative impact on the performance, as generating and parsing half-precision floats means some overhead.
  • Both approaches are similar to the way we treat integers in the sense that we try to use the most compact representation. However, for integers this (a) does not involve round-tripping, and (b) all integer types enjoy good hardware support.

Is this right?

If so, maybe adding parameters to the to_cbor function could control such conversions just like the use_size and use_type parameters in to_ubjson. What do you think?

@misos1
Copy link
Author

misos1 commented Nov 19, 2019

  1. As mentioned above that patch would probably need little modifications to treat edge cases as for example a == a is false when a is float or double NaN.

  2. It could have impact without hardware support for half. Hardware half conversion can be also used in cbor deserialisation instead of current software approach. Why not use third-party C++11 header-only library for half manipulation like for example this http://half.sourceforge.net/. It uses hardware half for conversions when possible or software fallback. Seems to enable it on clang and gcc on x86-64 it is needed to pass option -mf16c or similar.

  3. Yes integers are easier to check.

Yes, parameters to to_cbor would be good idea for better control about half behaviour. But they are not needed for double to float conversions. Also other formats like messagepack could store numbers to 32-bit float where possible.

@nlohmann
Copy link
Owner

nlohmann commented Dec 2, 2019

Thanks! I like the idea of using float where possible, but I am very hesitant to pull in a 200 KB header for half support...

@misos1
Copy link
Author

misos1 commented Dec 2, 2019

Maybe can be extracted only part of it which is doing conversions or use some another.

@stale
Copy link

stale bot commented Jan 1, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Jan 1, 2020
@stale stale bot closed this as completed Jan 8, 2020
@dota17
Copy link
Contributor

dota17 commented Mar 25, 2020

Should we discuss this issue again to prevent it from being ignored?

@rvjr
Copy link

rvjr commented Mar 25, 2020

Well I have a strong preference for the simple cast-and-compare approach. Maybe with some special handling of NaN and Inf, which could be stored as float and would otherwise be stored as double. Including another library just for this tiny feature really bloats the code footprint. And I think keeping the json library compact should be preferred over hardware support of half conversions, which is still rare and the benefit is probably negligible on almost all platforms.

@dota17
Copy link
Contributor

dota17 commented Mar 26, 2020

According to the NAN and infinity you mentioned, I looked at the code and the corresponding documentation, and got the following information:
NaN and Inf are half-precision floats, the from_cbor function can hadle these kind of type, but to_cbor function doesn't support half and single-precision floats type, that's to say, we can deserialize 0xf97c00(means Infinity) to null, but we can't serialize Infinity to 0xf97c00 directly.
So, how about adding NAN/Inf support to the to_cbor function first?
Hope I didn't miss something important.

@rvjr
Copy link

rvjr commented Mar 26, 2020

I would simply test the input value with std::isnan() and std::isinf() and serialize std::numeric_limits::quietNaN() or std::numeric_limits::infinity() into the stream. All other values can be handled as in my proposed patch, this worked fine for us for quite some time.

@dota17
Copy link
Contributor

dota17 commented Mar 26, 2020

Yes, that's good. Do you mind if I make some changes based on your patch?

@rvjr
Copy link

rvjr commented Mar 26, 2020

Feel free, that's why I attached it to this thread...

@nlohmann nlohmann reopened this Apr 10, 2020
@stale stale bot removed the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Apr 10, 2020
@dota17 dota17 mentioned this issue Apr 16, 2020
4 tasks
@nlohmann nlohmann linked a pull request Apr 18, 2020 that will close this issue
4 tasks
@stale
Copy link

stale bot commented May 10, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label May 10, 2020
@dota17
Copy link
Contributor

dota17 commented May 12, 2020

@rvjr we can't simply treat the type conversion like in your patch, because some float numbers in testcase cannot handled correctly, you can see them in this comment.
Could you please take a look at this?

@stale stale bot removed the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label May 12, 2020
@nlohmann
Copy link
Owner

I think the patch is fine - the failing tests just assume that the library always returns doubles.

@nlohmann nlohmann added this to the Release 3.8.0 milestone May 13, 2020
@nlohmann nlohmann self-assigned this May 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aspect: binary formats BSON, CBOR, MessagePack, UBJSON kind: enhancement/improvement solution: proposed fix a fix for the issue has been proposed and waits for confirmation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants