Specify that char16_t and char32_t literals are UTF-16 and UTF-32 respectively #6

Closed
tahonermann opened this issue Apr 23, 2018 · 6 comments
Labels: enhancement (New feature or request)


@tahonermann
Member

The C and C++ standards do not currently specify that char16_t and char32_t literals are encoded as UTF-16 and UTF-32 respectively. C states that they are only if the corresponding __STDC_UTF_16__ or __STDC_UTF_32__ macro is defined to 1 (6.10.8.2, "Environment macros"). Various parts of the C++ standard (codecvt and char_traits) refer to UTF-16/UTF-32, thereby admitting a bias towards these encodings despite the lack of strict specification.

It may be that, in practice, all C and C++ compilers that are being updated to conform to new standards use only UTF-16 and UTF-32 for these literals. If so, the standards can be updated to mandate the use of these encodings.
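
As a rough illustration of what "using UTF-16 and UTF-32 for these literals" means in practice, here is a minimal compile-time probe. The function names are made up for this sketch, and a passing check only shows how one particular compiler encodes two sample characters; it does not demonstrate conformance.

```cpp
// Hypothetical helper names; this is a spot check, not a conformance test.

// U+20AC EURO SIGN is in the BMP, so UTF-16 encodes it as the single
// code unit 0x20AC. The string literal then holds that unit plus the
// terminating null.
constexpr bool char16_literals_look_like_utf16() {
    return u'\u20AC' == 0x20AC &&
           sizeof(u"\u20AC") == 2 * sizeof(char16_t);
}

// U+1F600 is outside the BMP. UTF-32 encodes every code point as exactly
// one code unit whose value is the code point itself.
constexpr bool char32_literals_look_like_utf32() {
    return U'\U0001F600' == 0x1F600 &&
           sizeof(U"\U0001F600") == 2 * sizeof(char32_t);
}

static_assert(char16_literals_look_like_utf16(),
              "char16_t literals do not appear to be UTF-16 here");
static_assert(char32_literals_look_like_utf32(),
              "char32_t literals do not appear to be UTF-32 here");
```

The characters are written as universal-character-names so the result does not depend on the source file encoding.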

@tahonermann
Member Author

Status of this issue: P1041R1 appeared in the post-Rapperswil mailing. With a little luck, it will be presented to EWG in San Diego.

@tahonermann added the "paper submitted" label (A paper proposing a specific solution has been submitted) on Aug 6, 2018
@ThePhD
Collaborator

ThePhD commented Nov 14, 2018

Luck achieved: this was presented in San Diego with a presentation from @tahonermann.

It is on track for C++20. We will need to bring it through EWG, but no resistance to this is anticipated.

@ThePhD
Collaborator

ThePhD commented Nov 14, 2018

Side note: it was decided that the __STD_C... macros related to UTF-16/32 could be handled as editorial changes or Defect Reports in Core, and did not necessarily have to be a part of this proposal.

@tahonermann
Member Author

@martinho, we should get an updated revision of this paper in the pre-Kona mailing (after November 26th, but before January 21st). Some things to update:

  • Re-base on the latest WD (No need to mention the non-conflict with P0482 any more).
  • Wording updates are needed to replace occurrences of char16_t character literal, char16_t string literal, char32_t character literal, and char32_t string literal with the new terms throughout the WD.
  • Perhaps mention the __STDC_UTF_16__ and __STDC_UTF_32__ C macros as an indication of why this paper is evolutionary and not just a core issue. The macros indicate that the encoding was originally intended to be implementation defined.

With regard to those macros, we actually had references to them in C++14 (in the <cuchar> synopsis), but lost them along the way to C++17 (we now just defer to C for the contents of the <cuchar> header). I don't think we need to add them back unless we're going to state that implementations must define them. That concern can be handled as a core issue after getting approval, as JeanHeyd already mentioned, but we might be able to shortcut that core issue processing by adding the macro requirements to the paper. I have a slight preference toward doing the latter.

For reference: gcc and clang both define __STDC_UTF_16__ and __STDC_UTF_32__ in both C and C++ compilation modes. MSVC never defines them. I asked Jonathan Caves about it and he stated it was probably just an oversight.
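
For anyone wanting to check their own compiler, a small sketch along these lines reports whether the macros are defined and, when they are, cross-checks them against the literal encoding. The variable names are invented for this example; only the two macro names come from the C standard.

```cpp
// Hypothetical variable names; only the macro names are standard.
#if defined(__STDC_UTF_16__)
constexpr bool has_stdc_utf_16 = true;
#else
constexpr bool has_stdc_utf_16 = false;  // e.g. MSVC, per the note above
#endif

#if defined(__STDC_UTF_32__)
constexpr bool has_stdc_utf_32 = true;
#else
constexpr bool has_stdc_utf_32 = false;
#endif

// If a macro is defined, the matching literals are expected to use that
// encoding. If it is not defined, this check makes no claim either way.
constexpr bool macros_agree_with_literals =
    (!has_stdc_utf_16 || u'\u20AC' == 0x20AC) &&
    (!has_stdc_utf_32 || U'\U0001F600' == 0x1F600);

static_assert(macros_agree_with_literals,
              "macro is defined but the literal encoding disagrees");
```

Note that the absence of a macro says nothing about the actual encoding, which is part of why requiring the macros would be useful.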

@tahonermann
Member Author

Oh, hey, there are a few relevant core issues:

  • CWG 1859 - UTF-16 in char16_t string literals
  • CWG 1802 - char16_t string literals and surrogate pairs

@tahonermann
Member Author

This issue was resolved by the adoption of P1041R4 in Kona. Closing.

@tahonermann removed the "paper submitted" label (A paper proposing a specific solution has been submitted) on Nov 17, 2019