Explicitly disallow unnamed Unicode codepoints in http://eel.is/c++draft/lex.charset#2 #8

Closed
tahonermann opened this issue Apr 23, 2018 · 9 comments

@tahonermann
Member

http://eel.is/c++draft/lex.charset#2
Says:

The character designated by the universal-character-name \UNNNNNNNN is that character whose character short name in ISO/IEC 10646 is NNNNNNNN; the character designated by the universal-character-name \uNNNN is that character whose character short name in ISO/IEC 10646 is 0000NNNN. If the hexadecimal value for a universal-character-name corresponds to a surrogate code point (in the range 0xD800–0xDFFF, inclusive), the program is ill-formed.

Should it also say:

If the hexadecimal value for a universal-character-name is not named by ISO/IEC 10646 then the program is ill-formed

Or is that implied?
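
To make the question concrete, here is a minimal sketch, assuming C++17, of the three cases in play. The enum and function names are hypothetical and appear in neither the standard nor ISO/IEC 10646. The sketch also takes "not named by ISO/IEC 10646" to mean a value above 0x10FFFF, which is only one possible reading of the sentence proposed above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification of the hexadecimal value of a
// universal-character-name; these names are illustrative only.
enum class UcnValueKind {
    CodePoint,   // within 0..0x10FFFF and not a surrogate
    Surrogate,   // 0xD800..0xDFFF: ill-formed per the quoted wording
    OutOfRange   // above 0x10FFFF: the case this issue asks about
};

constexpr UcnValueKind classify_ucn_value(std::uint32_t value) {
    if (value >= 0xD800 && value <= 0xDFFF) return UcnValueKind::Surrogate;
    if (value > 0x10FFFF) return UcnValueKind::OutOfRange;
    return UcnValueKind::CodePoint;
}

static_assert(classify_ucn_value(0x00E9) == UcnValueKind::CodePoint);
static_assert(classify_ucn_value(0xD800) == UcnValueKind::Surrogate);
static_assert(classify_ucn_value(0x110000) == UcnValueKind::OutOfRange);
```

The quoted wording covers only the second case explicitly; the third is the one the question is about.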

@tahonermann tahonermann added the clarification Something isn't clear label Apr 23, 2018
@rmartinho
Collaborator

rmartinho commented May 7, 2018

After some review, I have the feeling this stems from an outdated 10646 version. The current version doesn't have a "short name" concept (well, except for jamo short names, but that's clearly not the intention here). There's a "short identifier" concept (in 6.5) but that cannot be more than six digits (the familiar U+ syntax). I propose the following.

The character designated by the universal-character-name \U00NNNNNN is that character whose code point short identifier in ISO/IEC 10646 is U+NNNNNN; the character designated by the universal-character-name \uNNNN is that character whose code point short identifier in ISO/IEC 10646 is U+NNNN. If the hexadecimal value for a universal-character-name corresponds to a surrogate code point (in the range 0xD800–0xDFFF, inclusive), the program is ill-formed. If a universal-character-name has any other hexadecimal value, the program is ill-formed.

(Ab)use of the C-word notwithstanding.
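
As a rough illustration of the correspondence in the proposed wording, the sketch below formats a hexadecimal value in the familiar U+ notation. The helper name `short_identifier` is hypothetical, and padding to a minimum of four hexadecimal digits is an assumption about the notation rather than a quotation from ISO/IEC 10646.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: renders a value in U+ notation, padded to at
// least four hexadecimal digits (an assumption, see the text above).
// It performs no validity checks on the value.
inline std::string short_identifier(std::uint32_t code_point) {
    char buffer[16];
    std::snprintf(buffer, sizeof buffer, "U+%04lX",
                  static_cast<unsigned long>(code_point));
    return std::string(buffer);
}
```

Under that assumption, `\u00E9` pairs with `short_identifier(0x00E9)`, which yields `U+00E9`, and `\U0001F600` pairs with `short_identifier(0x1F600)`, which yields `U+1F600`.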

@steve-downey
Collaborator

Also, IIUC, all short identifiers are valid even if they refer to a reserved code point. So you may be indicating something meaningless, but it is well-defined meaninglessness.
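
A small sketch of that point, assuming C++11 or later: the universal-character-name below designates U+50000, a code point in plane 5 that has no assigned character at the time of writing (an assumption that a future revision of ISO/IEC 10646 could invalidate). The variable name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// U+50000 is assumed to be unassigned (reserved) at the time of writing.
// It is neither a surrogate nor above 0x10FFFF, so the literal still
// designates a definite code point.
constexpr char32_t unassigned_example = U'\U00050000';

static_assert(unassigned_example == 0x50000,
              "the literal carries the code point value unchanged");
```

The value is well defined even though no character is currently assigned to it.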

@rmartinho
Collaborator

rmartinho commented Jun 18, 2018

At Rapperswil, I discussed this issue with people from Core, and it was agreed that the current wording with "short name" will be fixed editorially. We didn't discuss whether we should explicitly make universal-character-names without such a short identifier ill-formed.

@tahonermann
Member Author

That's good. I think the right process is to submit a PR to https://github.com/cplusplus/draft. Anyone want to volunteer to do so?

@rmartinho
Collaborator

I can do that.

@tahonermann tahonermann assigned rmartinho and unassigned mzeren-vmw Jun 18, 2018
@rmartinho
Collaborator

rmartinho commented Jun 28, 2018

PR cplusplus/draft#2201 will be merged, fixing the "short name" issue editorially. As discussed with Jens, I will write a short paper to clean up the wording there and clarify the nonexistent code point case.

rmartinho added a commit to rmartinho/sg16 that referenced this issue Jun 28, 2018
@tahonermann
Member Author

@rmartinho I think this issue is complete, yes?

@tahonermann
Member Author

@rmartinho I think this issue is complete, yes?

Ah, no, this isn't complete yet. The accepted editorial PR only addressed the "short name" vs "short identifier" terminology issue. Martinho's draft D1139 (which I think is yet to be submitted to a mailing) addresses the concern tracked by this issue.

@tahonermann tahonermann added the paper needed A paper proposing a specific solution is needed label Aug 6, 2018
@tahonermann
Member Author

Closing this issue as resolved following the adoption of P1139R2 in Kona.

@tahonermann tahonermann removed the paper needed A paper proposing a specific solution is needed label Jun 23, 2019