CBOR byte string support #1129
How would you store the data coming from CBOR?
My thought was the creation of some byte string type, leaving the existing treatment of strings alone. I'll have time this week to show this in more detail.
So you mean the byte string's bytes should just be parsed and stored in a JSON string? What about encodings? Please also have a look at the issues I referenced above.
@nlohmann How about adding the suffix _BIN or _BINARY to the key string? The string would then contain the byte string as a base64-encoded string.
What about an array of binary strings? What about an array of values of which some are binary strings? How would roundtripping work?
Any news on this?
@lusitanian @nifker Any input on this?
My approach would still be using a prefix/suffix for the key, and base32768 or base64 for the data itself - but I don't know what performance base32768 has or how usable it is.
The prefix/suffix is trimmed from the key string, and the resulting string is used as the key in CBOR.
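For illustration, the suffix convention could be sketched as below. This is a standalone sketch of the suggestion, not an actual API; the `_BIN` suffix and the helper names are illustrative:

```cpp
#include <string>

// Hypothetical sketch: a key ending in "_BIN" marks its string value as
// base64-encoded binary; the suffix is trimmed before the key is written
// to CBOR.
bool has_binary_suffix(const std::string& key) {
    static const std::string suffix = "_BIN";
    return key.size() >= suffix.size() &&
           key.compare(key.size() - suffix.size(), suffix.size(), suffix) == 0;
}

std::string trim_binary_suffix(const std::string& key) {
    return has_binary_suffix(key)
               ? key.substr(0, key.size() - 4)  // drop "_BIN"
               : key;
}
```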
I do not like this approach, as it relies on the convention to use and respect such prefixes/suffixes.
I couldn't imagine how else you would detect that.
There are two aspects:
It would be nice to hear more alternatives to these two aspects. But since the library needs to be maintained and support is very time-consuming, I am against an approach that relies on non-standard assumptions or conventions that need documentation and may often lead to surprises. I'm setting this issue to discussion, and I hope to see some ideas. I am currently not planning to work on this topic, so pull requests are, as always, more than welcome! As an additional challenge, we should not forget the library also implements MessagePack and UBJSON, so ideas that also have those binary formats in mind are appreciated.
Any opinion on this?
How about adding a data type for binary blobs?

    std::vector<uint8_t> myData{0, 1, 2, 3, 4, 5, 6};
    json j = {
        {"binary", json::bin(myData.begin(), myData.end())}
    };

    auto v_msgpack = json::to_msgpack(j);
    json j2 = json::from_msgpack(v_msgpack);
    assert(j2["binary"].is_binary() == true);

    // OK: returns a string that holds the bytes
    // (json::bin can just be a tagged struct with a std::string member)
    auto dat = j2["binary"].get<std::string>();

    auto str = j.dump(4); // base64-encodes binary fields
    auto j3 = json::parse(str);

    // Binary data serialized to JSON will not survive a round trip;
    // instead it's now a base64-encoded string.
    assert(j3["binary"].is_binary() == false);
Hm. Nice idea. Any opinions on this?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
As someone who wants to use MessagePack with binary blobs, I quite like @alexkarpenko's suggestion of a separate type. It's nice and explicit and doesn't rely on developers sticking to conventions.

One point I'm not completely sold on is automatically converting it to a base64 string when serialising to JSON. I personally consider adding binary to an object and then trying to serialise it into JSON a developer error that should probably be an exception (if you want to use binary, you should serialise to a format that supports it), but I don't hold that opinion too strongly either. Another potential reason against base64 encoding is... well... does this library already have a base64 encoder? We probably don't want to add one just for that! :)

I don't have the time or expertise to complete this myself, but I'm open to the possibility of contributing financially to see this done.
I don't feel strongly about what to do when serializing binary blobs to JSON. Throwing an exception seems reasonable to me.
We would also really like to have CBOR byte string support, to be able to import large binary BLOBs without additional overhead.
Is it possible to reopen this ticket, please?
Hi, any update on this topic? @OmnipotentEntity, how is your implementation going? I'm afraid I don't have time at the moment to understand the internals of nlohmann/json, but I could maybe give a hand if there are some very specific parts to implement.
Sorry, most of the last few weeks have been eaten by moving house. Once I get my computer set up again, I can upload what I have to my GitHub, if you want to take a crack at it. I'm still intending to finish this; I'm just slow due to real-life reasons.
Great, thanks for the feedback! No hurry on my side; when you have something, I will be pleased to be a beta-tester. :)
Hi, I'm back; I finally got my office set up. Please be patient with me while I begin a new job. I had a question about design: how do you want to handle serialization of a binary value to text JSON? There are several possibilities to my mind, with the following pros and cons:

- Refuse to serialize. Pros: Logically makes sense; there is no valid JSON serialization that is reversible. Cons: Now we have a new throwing path that wasn't there before. It may seem surprising that we can serialize to binary formats but not to text, and this library was originally about serializing JSON.
- Serialize to an array of numbers. Pros: This is what some binary formats will require anyway; a good compromise. Cons: Not reversible.
- Serialize to a base64 string. Pros: Same as the previous option, and we can devise a tag of some kind to automatically change the format back to binary. Cons: Small chance that normal data matches the tagging system, with surprising consequences.
- Extend JSON. Pros: Reversible and unambiguous; per the RFC, we are allowed to accept data that does not conform to the JSON spec. Cons: This extension would need to be essentially invented (I cannot find any literature on this topic), no one else will support it in the short term, and very few at best will support it in the medium to long term. Also, per the RFC, we are not allowed to generate data that does not conform to the JSON spec.
- All of the above. Pros: Obviously, if we do everything the user desires, nothing can possibly be bad, right? Cons: Lots of work to implement, a default behavior needs to be decided, and the documentation might become confusing, raising the knowledge required to use the library.

I'm reaching the point where I have to make a decision on this, and while discussion on this topic has occurred in this thread, no real consensus was reached. So I'm looking for direction.
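As a concrete illustration of the "serialize to an array of numbers" option above, a standalone sketch (not library code; the function name is illustrative):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Dump binary bytes as a plain JSON array of integers. The result is
// valid JSON but not reversible: a parser cannot distinguish it from an
// ordinary array of numbers.
std::string bytes_to_json_array(const std::vector<std::uint8_t>& bytes) {
    std::string out = "[";
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (i > 0) out += ',';
        out += std::to_string(bytes[i]);
    }
    out += ']';
    return out;
}
```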
Refuse to serialize (throw an exception?) by default, but add a hook where you can plug in your own custom adapter which produces JSON-compatible datatypes out of the binary datatype.

Since not all binary data blobs are the same, it could be useful if the binary data container could itself carry some meta information, such as a pointer to a specific callback function for converting that specific data blob to a JSON-compatible type, or a type hint as a string (relayed to the hook callback function as a parameter), or something to that effect. Also, if the data is to be serialized as a named field in a JSON object, maybe the callback could be allowed to modify the name of the field in such cases. (This could be used, e.g., for adding a suffix denoting the type, as suggested by someone else above, and would allow for many different schemes.)

If you can think of a method of adding similar custom hooks for deserializing from plain JSON data to binary blobs (allowing round trips when serializing and deserializing binary data into a custom JSON-compatible representation), that would be most excellent as well, but maybe not as important at this stage.
More food for thought: maybe you could use
Hi OE, welcome back! |
We absolutely cannot store a raw pointer in serialized data. The pointer to that function isn't guaranteed to be the same from invocation to invocation on the same computer, let alone on different computers with different operating systems and different processor architectures. A function identifier might be used, but as this is a library, not an application, this would need to be something provided by the application anyway, and I fail to see what we would gain exactly.

Python's builtin json module does have this functionality (callback hooks); however, their json type is also read-only, and Python is a dynamically typed language, so a polymorphic return type does not cause extra boilerplate. Putting the ability to store arbitrary types into JSON values in this library would be quite a bit of work and require answering a bunch of difficult questions:
@mdenna-nviso my only real reservation about this approach is that it's a new throwing path, meaning that current users of the library may run into an issue where code that had been working perfectly now throws. Currently the serializer only throws in two cases, both related to invalid UTF-8 strings, so it's safe to use without a catch if you either do not have strings, or you know the strings are valid UTF-8 (such as if you generated them yourself). It's probably not a huge deal, but I didn't see you address it.
For parsing, the first step could be to extend the SAX interface with an event for binary values. A user who wants to use binary data would then need to implement the SAX interface, including that new event.

For writing binary data, the actual JSON value would need to be extended. It would never be filled by the standard JSON parser, and we may decide to throw if one tries to dump binary data to text JSON.

But extending the JSON value is a lot of work, because it is used everywhere. It would not be impossible, just a long way... What do you think?
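A minimal sketch of what such a SAX extension might look like. The event name `binary` and the handler types are illustrative assumptions, not the library's actual interface:

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical SAX handler: alongside the existing events (null, boolean,
// number, string, ...), it gains a binary event that receives the raw
// bytes of a CBOR/MessagePack byte string.
struct sax_handler {
    virtual ~sax_handler() = default;
    virtual bool string(const std::string& /*val*/) { return true; }
    // New event: called once per byte string encountered by the parser.
    virtual bool binary(std::vector<std::uint8_t>&& /*bytes*/) { return true; }
};

// Example handler that collects every byte string it sees.
struct collect_binary : sax_handler {
    std::vector<std::vector<std::uint8_t>> blobs;
    bool binary(std::vector<std::uint8_t>&& bytes) override {
        blobs.push_back(std::move(bytes));
        return true;  // keep parsing
    }
};
```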
I was in the middle of extending the JSON value itself to have a binary type. I'd say I'm about 40-50% complete or so. I'd prefer to do it this way rather than use an intermediate representation, because typically an application would want to parse binary data into a format external to JSON. (My use case in particular is neural network weights, for instance.)
Cool! What does the extension look like? Do you use a templated type?
Yeah, default type is |
Throwing when serializing binary to JSON seems like a painful choice; it means losing the ability to easily switch formats if you want to write binary data in formats that support it. As an alternate suggestion, how about picking a serialization format (e.g. base64) for binary-to-JSON conversion, doing nothing special on load (so it doesn't round-trip), but making it easy to parse base64 to binary in user-defined serializers? More specifically,
So far I think this is the same as agreed above and what @OmnipotentEntity is implementing. Then:
So
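For concreteness, the base64 step in the "encode on dump" proposal could look like this standalone sketch (standard RFC 4648 alphabet; this is not code from the library):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Encode raw bytes as base64 text, e.g. for embedding binary values in a
// text JSON dump. Pads with '=' as usual.
std::string base64_encode(const std::vector<std::uint8_t>& in) {
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    std::size_t i = 0;
    // Full 3-byte groups become 4 output characters.
    while (i + 3 <= in.size()) {
        std::uint32_t n = (in[i] << 16) | (in[i + 1] << 8) | in[i + 2];
        out += tbl[(n >> 18) & 63];
        out += tbl[(n >> 12) & 63];
        out += tbl[(n >> 6) & 63];
        out += tbl[n & 63];
        i += 3;
    }
    // Handle the 1- or 2-byte tail with padding.
    if (in.size() - i == 1) {
        std::uint32_t n = in[i] << 16;
        out += tbl[(n >> 18) & 63];
        out += tbl[(n >> 12) & 63];
        out += "==";
    } else if (in.size() - i == 2) {
        std::uint32_t n = (in[i] << 16) | (in[i + 1] << 8);
        out += tbl[(n >> 18) & 63];
        out += tbl[(n >> 12) & 63];
        out += tbl[(n >> 6) & 63];
        out += '=';
    }
    return out;
}
```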
This would be a nice and useful feature. This library can be used for creating tests against a C target that parses CBOR, and it allows for writing easy-to-understand code when creating the data the target should handle. For testing, the following changes have been made:
The problem with this is the case of sending an actual string that starts with "0x" and is followed by a valid hex representation. For the above-mentioned use case I am fine with this corner case, but I understand that this might not be acceptable for the library in general. It is interesting how the CBOR playground (http://cbor.me/) handles/presents this:
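The "0x" convention described above could be sketched as follows. This is a standalone illustration with made-up helper names; the ambiguity it cannot resolve is exactly the corner case mentioned: a genuine string like "0xFF" is indistinguishable from encoded binary.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Encode bytes as a hex string with a "0x" prefix.
std::string bytes_to_hex(const std::vector<std::uint8_t>& bytes) {
    static const char* digits = "0123456789ABCDEF";
    std::string out = "0x";
    for (std::uint8_t b : bytes) {
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}

// Heuristic used on the way back: "0x" followed by an even number of
// valid hex digits is treated as binary.
bool looks_like_hex_binary(const std::string& s) {
    if (s.size() < 2 || s.compare(0, 2, "0x") != 0 || s.size() % 2 != 0)
        return false;
    return s.find_first_not_of("0123456789ABCDEFabcdef", 2) ==
           std::string::npos;
}
```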
Can this issue be reopened?
Check out the WAMP protocol for ideas on JSON-encoding a binary blob: https://wamp-proto.org/_static/gen/wamp_latest.html#binary-conversion-of-json-strings
The basic idea is to prepend a Base64-encoded string with a nul character.
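The WAMP scheme can be sketched as below. The function names are illustrative, and the base64 encoding of the payload is assumed to happen elsewhere; the point is only the NUL tag, which cannot appear in ordinary JSON-originated text, so the receiver can tell binary-carrying strings apart:

```cpp
#include <string>

// Tag an already-base64-encoded payload as binary by prepending '\0'
// (the WAMP convention for binary conversion of JSON strings).
std::string tag_as_binary(const std::string& base64_text) {
    return std::string(1, '\0') + base64_text;
}

// A string is binary-carrying iff it starts with the NUL tag.
bool is_tagged_binary(const std::string& s) {
    return !s.empty() && s.front() == '\0';
}
```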
For those still following this issue: #1662 addresses it, is finally in a complete state, and is ready for review and merge.
The functionality was added by merging #1662. |
Currently, the CBOR feature of the library only supports writing strings out as UTF-8 and doesn't support byte arrays as described in the CBOR standard. I'd like to implement COSE on top of this library, but in order to do that, byte string support is needed.
I'm happy to submit a pull request for this, but I wanted to check in advance if this is something you'd be open to.
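For reference, a CBOR byte string is major type 2 (RFC 7049): the initial byte is 0x40 plus the length for lengths up to 23, or 0x58/0x59/0x5A followed by a 1-, 2-, or 4-byte big-endian length. A standalone sketch of producing that framing (not the library's code):

```cpp
#include <cstdint>
#include <vector>

// Frame raw bytes as a CBOR byte string (major type 2).
std::vector<std::uint8_t> cbor_byte_string(
    const std::vector<std::uint8_t>& bytes) {
    std::vector<std::uint8_t> out;
    const std::size_t n = bytes.size();
    if (n < 24) {
        out.push_back(static_cast<std::uint8_t>(0x40 | n));
    } else if (n <= 0xFF) {
        out.push_back(0x58);  // 1-byte length follows
        out.push_back(static_cast<std::uint8_t>(n));
    } else if (n <= 0xFFFF) {
        out.push_back(0x59);  // 2-byte length follows
        out.push_back(static_cast<std::uint8_t>(n >> 8));
        out.push_back(static_cast<std::uint8_t>(n & 0xFF));
    } else {
        out.push_back(0x5A);  // 4-byte length follows
        for (int s = 24; s >= 0; s -= 8)
            out.push_back(static_cast<std::uint8_t>((n >> s) & 0xFF));
    }
    out.insert(out.end(), bytes.begin(), bytes.end());
    return out;
}
```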