char8_t and std::u8string support #1914
Comments
You are right, |
So, just out of curiosity, I replaced most of the char occurrences with char8_t and set StringType to std::u8string in the single json.hpp header. It builds successfully but produces garbage at runtime. Here is the commit for reference, but please do not try this at home. I am very sure that I introduced undefined behaviour, which is no wonder with such a straightforward type replacement (maybe I also replaced way too much; I am also not very familiar with the codebase). I guess it fails somewhere in the dump(..), dump_escaped(), decode() area, because calls to output_adapter_t<char8_t>::write_characters(...) sometimes overwrite the beginning of the output adapter's underlying string, even though the function calls StringType::append or StringType::push_back.

So it is somehow possible to build with char8_t enabled, and with a lot of tweaking and tinkering the internal functionality could be kept as it is without any UB. StringType::CharT template parameters would be needed so that it is possible to switch between char and char8_t. Some detection mechanism for whether char8_t is available, using compiler-dependent defines, could also be useful.

Some mechanism to allow backwards compatibility is needed. Therefore this paper exists. There are some recommendations on how to be compatible in both directions; for example, a user-defined literal "U8(...)" that can switch between char and char8_t would be possible. But I am not quite sure whether that would break something for end users; those decisions have to be taken very carefully.

I think a lot of the unit tests would have to be rewritten. For example a And I guess there is way more to think about. I just wanted to write down my thoughts; maybe they help someone else to develop a strategy.
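The detection mechanism mentioned above could look roughly like the following. This is only a sketch: it assumes the compiler defines the standard feature-test macro `__cpp_char8_t` and that the standard library can instantiate `std::basic_string<char8_t>`. The names `JSON_HAS_CHAR8_T`, `json_u8char_t`, `json_u8string` and `u8_code_units` are hypothetical and not part of nlohmann::json.

```cpp
#include <cstddef>
#include <string>

// Hypothetical names for illustration only; nothing here exists in json.hpp.
#if defined(__cpp_char8_t) && __cpp_char8_t >= 201811L
    #define JSON_HAS_CHAR8_T 1
    using json_u8char_t = char8_t;   // u8"" literals are const char8_t[N]
#else
    #define JSON_HAS_CHAR8_T 0
    using json_u8char_t = char;      // u8"" literals are const char[N]
#endif

// String type that accepts a u8"" literal in both language modes.
using json_u8string = std::basic_string<json_u8char_t>;

// Both candidate character types are one byte wide, i.e. one UTF-8 code unit.
static_assert(sizeof(json_u8char_t) == 1, "UTF-8 code unit must be one byte");

// Number of UTF-8 code units (not code points) stored in the string.
inline std::size_t u8_code_units(const json_u8string& s) {
    return s.size();
}
```

With such an alias, `json_u8string s = u8"...";` compiles with or without char8_t, so code outside the library would not need its own conditional compilation.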
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hello, Line 540 in 55281e0
I think this issue should be renamed to The problem will still be there and will continue to arise, since C++20 was released some time ago and use of its features is stabilizing. Since JSON files are by specification encoded in UTF-8, and it is illegal to include BOM bytes at the beginning of files, I would say that On the other hand, I think there is a C++ specification hell regarding distinguishing the traits of a written human language in digital form:
Where https://en.wikipedia.org/wiki/UTF-8#Invalid_sequences_and_error_handling So I think As I can read in FAQ: json/doc/mkdocs/docs/home/faq.md Line 66 in 55281e0
I think that after passing this step, every key/value JSON string should/could be represented by std::u8string and char8_t strings.
What is being tested by json/test/src/unit-unicode1.cpp Line 104 in f42a74b
But it has some sections turned off: json/test/src/unit-unicode1.cpp Line 140 in f42a74b
I am not deep enough into Unicode and the strict details of the C++ specification, so maybe we can ask @tahonermann :P
Hello,
When porting our codebase to -std=c++2a, compatibility with nlohmann::json breaks, since we use u8"" string literals for assigning strings to json objects.
I tried changing ObjectType's StringType to std::u8string, but that does not work because some types in json.hpp are hardcoded to char, for example the serializer's
output_adapter_t<char>
(line 14587) and others.
So my questions are:
Is there any native support for char8_t planned in the future?
Are there any known workarounds to add char8_t support by hand right now?
Greetings
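One possible by-hand workaround, as long as StringType stays std::string, is to copy the code units of the u8"" literal into a std::string before assigning it. The sketch below is not part of nlohmann::json. `from_u8` is a hypothetical helper name, and it assumes the library keeps treating the contents of std::string as UTF-8.

```cpp
#include <string>

// Hypothetical helper, not part of nlohmann::json: turns UTF-8 literals or
// strings into std::string without changing any bytes.

// Pre-C++20 (or -fno-char8_t): u8"" literals are already const char[N].
inline std::string from_u8(const char* s) {
    return std::string(s);
}

#if defined(__cpp_char8_t)
// C++20: u8"" literals are const char8_t[N]. Reading them through a
// const char* is allowed, because char may alias any object type.
inline std::string from_u8(const char8_t* s) {
    return std::string(reinterpret_cast<const char*>(s));
}

// Same conversion for an owning UTF-8 string; keeps embedded null bytes.
inline std::string from_u8(const std::basic_string<char8_t>& s) {
    return std::string(reinterpret_cast<const char*>(s.data()), s.size());
}
#endif
```

An assignment would then be written as `j["key"] = from_u8(u8"value");`, which compiles in both C++17 and C++20 mode. Compiler switches such as `-fno-char8_t` (GCC, Clang) or `/Zc:char8_t-` (MSVC) restore the old literal type instead, at the cost of deviating from standard C++20.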