-
-
Notifications
You must be signed in to change notification settings - Fork 6.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use float and possibly half in json::to_cbor #1719
Comments
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Any idea how to realize this? |
We actually noticed this too as we needed very small binaries; we fixed it internally as in the attached patch. So far we have not found any problems with this variant. As misos1 suggested we simply test if the value stays the same if we truncate it to float precision. 0001-SDK-341-Implemented-CBOR-serialization-with-32bit-fl.zip |
Thanks for sharing! I'll have a look. |
It is done similarly also in other libraries: https://github.com/pyfisch/cbor/blob/master/src/ser.rs#L311-L342 There are edge cases. NaN does not equal to itself. But is this really the best way? What about that other suggestion by testing bits? Would be great to also have half. |
I suppose if hardware supports such conversion then will be cheaper to convert number to lower precision and compare with original. Testing bits would always require more instructions than that. If hardware does not support it then converting forth and back would be overhead and whether is conversion lossless or not can be decided during conversion. I was probably wrong about scarce native support of half in hardware. Seems it is rather ubiquitous. So after checking edge cases like infinities and NaN (which can be simply stored as half) if double value stays same after conversion to float then try to serialise it as float or half (else store it as double). If there is hardware support for half and float value stays same after conversion to half and back then store it as half (else store it as float). If there is no hardware support (and we are willing to do this) convert it manually and if it looks lossless then store it as half (else store it as float). So support half at least when there is hardware support. I also noticed that some libraries are storing whole floating point numbers as integers where possible but this would mean loss of type information. |
You're right, special handling of Inf and NaN would be nice, because these can we written as floats without any loss of information. But I don't think supporting half makes sense at this point, because at least in x86_64 there are not even scalar instructions for converting to float. Where do you see the benefit of half, and on which architecture? |
It could be good to produce even smaller cbor binaries. It is not about architecture but more about conciseness. In x86_64 the instruction converts 4 numbers at once but this does not mean that it is 4x slower than conversion from double to float. It is SIMD. Seems also ARM has support for half. In any case conversion to half can be done also in software like is now done conversion from half (there can be some switch whether it should encode floats to halfs if there is not hardware support). |
Yes of course this is not a question of conversion performance. For CBOR the resulting binary size is all that matters. I just meant it's maybe a lot of work to do this correctly including detection of supported instruction set, fall back to a correct manual conversion, etc., for some very rare cases in which you gain significant size reduction... |
Let me try to summarize before I misunderstand anything:
Is this right? If so, maybe adding parameters to the |
Yes, parameters to |
Thanks! I like the idea of using |
Maybe can be extracted only part of it which is doing conversions or use some another. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Should we discuss this issue again to prevent it from being ignored? |
Well I have a strong preference for the simple cast-and-compare approach. Maybe with some special handling of NaN and Inf, which could be stored as float and would otherwise be stored as double. Including another library just for this tiny feature really bloats the code footprint. And I think keeping the json library compact should be preferred over hardware support of half conversions, which is still rare and the benefit is probably negligible on almost all platforms. |
According to the NAN and infinity you mentioned, I looked at the code and the corresponding documentation, and got the following information: |
I would simply test the input value with std::isnan() and std::isinf() and serialize std::numeric_limits::quietNaN() or std::numeric_limits::infinity() into the stream. All other values can be handled as in my proposed patch, this worked fine for us for quite some time. |
Yes, that's good. Do you mind if I make some changes based on your patch? |
Feel free, that's why I attached it to this thread... |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
I think the patch is fine - the failing tests just assume that the library always returns doubles. |
Similarly as with integer types why not to encode floating point types as small as possible? For example now is 0.5 encoded as
FB 3F E0 00 00 00 00 00 00
but it could beF9 38 00
. So if half is enough to store value without loss then store it in half. If not then test single precision and if that fails store it as double precision. One way could be to convert double to lower precision and compare whether they are equal. Or maybe this can be done by simply testing least significant bits in mantissa and exponent. If half would be too expensive due to scarce native support in hardware then at least support single precision floats.The text was updated successfully, but these errors were encountered: