Unicode paper L2/22-072R: Proposal for amendments to UAX#9 and UAX#31, adopted for the upcoming Unicode 15 release, demonstrates the utility of allowing U+200E LEFT-TO-RIGHT MARK (LRM) and U+200F RIGHT-TO-LEFT MARK (RLM) to appear in whitespace, but not to constitute whitespace on their own. The intent is to allow these marks to be inserted in whitespace to restore character directionality that might have been altered by characters in the preceding token.
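As a rough illustration of "may appear in whitespace, but does not constitute whitespace in isolation", here is a minimal sketch of one possible reading of that rule: a run counts as whitespace only if it contains at least one ordinary whitespace character, with LRM and RLM permitted anywhere in the run. The function names and the exact set of "ordinary" whitespace characters are assumptions made for the example; they come from neither the paper nor the C++ standard.

```cpp
#include <string_view>

// LRM (U+200E) and RLM (U+200F): the two directional marks named above.
constexpr bool is_directional_mark(char32_t c) {
    return c == 0x200E || c == 0x200F;
}

// Hypothetical stand-in for whitespace the language already accepts:
// space plus the controls U+0009..U+000D (an assumption for this sketch).
constexpr bool is_ordinary_whitespace(char32_t c) {
    return c == 0x0020 || (c >= 0x0009 && c <= 0x000D);
}

// True if `run` consists only of ordinary whitespace and directional marks
// AND contains at least one ordinary whitespace character. A run made up
// solely of marks (or an empty run) is rejected.
constexpr bool is_whitespace_run(std::u32string_view run) {
    bool saw_ordinary = false;
    for (char32_t c : run) {
        if (is_ordinary_whitespace(c)) {
            saw_ordinary = true;
        } else if (!is_directional_mark(c)) {
            return false;  // some non-whitespace character
        }
    }
    return saw_ordinary;
}
```

Under this reading, a space followed by LRM is an acceptable token separator, while LRM alone between two tokens is not.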
tahonermann changed the title from "Allow LEFT-TO-RIGHT MARK and RIGHT-TO-LEFT MARK in whitespace" to "Extend whitespace to include NEL, LS, PS, LRM, RLM, and maybe ALM." on May 28, 2022
I updated the issue title so that this issue now covers the inclusion of all of the following characters in whitespace. Including them would suffice for C++ to meet the Pattern_White_Space requirements of UAX31-R3.
U+0085 NEXT LINE (NEL)
U+200E LEFT-TO-RIGHT MARK (LRM)
U+200F RIGHT-TO-LEFT MARK (RLM)
U+2028 LINE SEPARATOR (LS)
U+2029 PARAGRAPH SEPARATOR (PS)
Additionally, inclusion of the ALM should be considered as it is conceptually similar to LRM and RLM, though it is not a member of the Pattern_White_Space property (and cannot be added because that property is immutable). Including this character in whitespace would require the specification of a profile in [uaxid.pattern] for conformance with UAX31-R3.
U+061C ARABIC LETTER MARK (ALM)
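The character sets discussed above can be sketched as two predicates. The first tests membership in the Pattern_White_Space property (U+0009..U+000D, U+0020, and the five characters listed above). The second is a hypothetical profile that additionally accepts ALM. Both function names are invented for illustration and are not proposed wording.

```cpp
// Membership test for the Unicode Pattern_White_Space property.
constexpr bool is_pattern_white_space(char32_t c) {
    return (c >= 0x0009 && c <= 0x000D)  // TAB, LF, VT, FF, CR
        || c == 0x0020                   // SPACE
        || c == 0x0085                   // NEXT LINE (NEL)
        || c == 0x200E                   // LEFT-TO-RIGHT MARK (LRM)
        || c == 0x200F                   // RIGHT-TO-LEFT MARK (RLM)
        || c == 0x2028                   // LINE SEPARATOR (LS)
        || c == 0x2029;                  // PARAGRAPH SEPARATOR (PS)
}

// Hypothetical profile: Pattern_White_Space plus ARABIC LETTER MARK (ALM).
// ALM is not a Pattern_White_Space character, so accepting it is a
// deviation that would have to be declared as a profile.
constexpr bool is_whitespace_with_alm_profile(char32_t c) {
    return is_pattern_white_space(c) || c == 0x061C;
}
```

Keeping ALM in a separate predicate mirrors the distinction drawn above: the first set follows the immutable property as is, while the second is an extension a conforming implementation would need to document.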
tahonermann changed the title from "Extend whitespace to include NEL, LS, PS, LRM, RLM, and maybe ALM." to "Extend whitespace to include NEL, LS, PS, LRM, RLM, and maybe ALM" on May 28, 2022