[Question] Support for non-UTF-8 values? #1162

Closed
DUOLabs333 opened this issue Jul 17, 2024 · 3 comments
Labels
question Further information is requested

Comments

@DUOLabs333

If I have a string made up of raw chars, with no effort made to escape them, will glaze be able to serialize/parse them?

@stephenberry
Owner

If you're writing your strings in C/C++ or almost any text editor, then you probably have valid UTF-8, unless you are adding invisible control characters.

JSON strictly requires UTF-8, so Glaze will reject illegal strings, such as strings that contain null or control characters in the middle of them. These characters can be written as Unicode escapes (\u), but this is typically dangerous and error-prone in C++ and other languages, because it can result in hidden null characters in types like std::string and will break many C string functions such as strnlen.
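To illustrate the hidden-null problem mentioned above, here is a minimal sketch using only the standard library. The helper names `make_string_with_embedded_null` and `c_length` are hypothetical and exist only for this example; they are not part of Glaze.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: builds a std::string that contains a null byte in the
// middle. The explicit length (5) is required; without it the constructor
// would stop at the first '\0'.
std::string make_string_with_embedded_null() {
    return std::string("ab\0cd", 5);
}

// Hypothetical helper: the length as seen by C string functions, which stop
// at the first '\0' they encounter.
std::size_t c_length(const std::string& s) {
    return std::strlen(s.c_str());
}
```

With these helpers, `make_string_with_embedded_null().size()` reports 5 while `c_length` on the same string reports 2, so any code path that falls back to C string functions silently loses the bytes after the null.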

We are planning to add a compile-time option to automatically Unicode-escape invalid UTF-8. The open issue is #812, but this option is not recommended for general use.

What is your use case for non-UTF-8 strings? Are you expecting invisible control characters in your strings?

In summary, to preserve performance, Glaze does not Unicode-escape invalid UTF-8 when writing, but any such string it writes will trigger a read error in any conforming JSON parser. If any JSON library is able to parse what you are writing, then you know you're good to go.
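If you want to catch problem strings before writing them, one option is a small pre-check on your side. The sketch below only looks for raw control characters (bytes below 0x20), which JSON requires to be escaped inside strings; it does not validate UTF-8 in general. The function name `has_raw_control_char` is hypothetical and is not a Glaze API.

```cpp
#include <string_view>

// Hypothetical pre-check: returns true if the string contains a raw control
// character (0x00 through 0x1F), including an embedded null byte.
// Note: this does NOT check whether multi-byte sequences are valid UTF-8.
bool has_raw_control_char(std::string_view s) {
    for (unsigned char c : s) {
        if (c < 0x20) {
            return true;
        }
    }
    return false;
}
```

A string that trips this check is a sign that the field holds binary data rather than text.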

@stephenberry stephenberry added the question Further information is requested label Jul 17, 2024
@DUOLabs333
Author

I'm writing a Vulkan driver in C++, and some commands/structs allow using a void pointer to hold arbitrary data. Since I'm sending the data over a network, I need to be able to serialize it.

However, now that I think about it, I probably should use std::vector<uint8_t> for those fields over std::string, right?

@stephenberry
Owner

Absolutely, arbitrary data like this is best in a std::vector<uint8_t> or std::vector<std::byte>.

I'll note that the same applies if you use the binary format BEVE with Glaze.
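As a rough sketch of the std::vector<uint8_t> approach, the function below copies the bytes behind a void pointer into a byte vector that can then be stored in a struct field and serialized as an ordinary container. The name `to_bytes` is hypothetical, and the sketch assumes the caller already knows the size of the data, since a void pointer carries no length of its own.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: copies `size` bytes starting at `data` into a vector.
// Assumes `data` points to at least `size` readable bytes.
std::vector<std::uint8_t> to_bytes(const void* data, std::size_t size) {
    if (data == nullptr || size == 0) {
        return {};  // nothing to copy
    }
    const auto* first = static_cast<const std::uint8_t*>(data);
    return std::vector<std::uint8_t>(first, first + size);
}
```

Unlike std::string, the resulting vector makes no promise about text encoding, so null bytes and values above 0x7F are just data.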


2 participants