Zerowidth reported for some width=1 characters #167
Comments
Good catch. Right now we assign zero width to everything in category Sk (Symbol, modifier), which doesn't seem right. But it looks like some of the characters in that category are zero width (e.g. U+1F3FB), so it's unclear to me what rule we should use in general.
It just happened to be in my one-line test file.
Oddly, Unicode declares that character to be Wide (in the
and that's how my Kubuntu Konsole terminal displays them (double-width). So they aren't zero-width?
Hmm, according to http://unicode.org/reports/tr51/, whether the emoji skin-tone characters are zero-width combining characters depends on the font. So I guess in a fixed-width font we should assume they are wide, not combining? If this is true of everything in Sk, that makes life easy: we just remove them from the set of zero-width combining characters here.
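To make the proposed rule concrete, here is a rough sketch in Python using only the standard `unicodedata` module. It is not utf8proc's actual implementation: `sketch_width` is a hypothetical name, and the rule is deliberately simplified (it ignores control and format characters). It only illustrates the idea that Sk is not treated as zero width, while nonspacing and enclosing marks are.

```python
import unicodedata


def sketch_width(ch: str) -> int:
    """Hypothetical, simplified column-width rule for a single character.

    Assumption for illustration only: zero width is limited to the
    nonspacing (Mn) and enclosing (Me) mark categories; Sk is NOT included.
    """
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0
    # East Asian Width "W" (wide) and "F" (fullwidth) occupy two columns.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


if __name__ == "__main__":
    for cp in (0x00A8, 0x00B4, 0x0301, 0x0308, 0x1F3FB):
        print(f"U+{cp:04X} -> {sketch_width(chr(cp))}")
```

Under this rule the spacing diaeresis and acute accent get width 1, the combining marks get width 0, and the skin-tone modifier U+1F3FB gets width 2 because its East Asian Width property is Wide.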
This is for versions 2.3 and 2.4.
U+00A8 (diaeresis) and U+00B4 (acute accent) now both return 0 from a `utf8proc_charwidth()` call. At some point in the past they returned 1 (according to an old log output I have).
The result should be 1. These are not combining diacriticals but "spacing characters" (just as U+00AF, macron, and U+00B0, degree sign, are). They are what you use if you want a "bare" diaeresis or acute.
(The combining diacritics are U+0308 and U+0301 respectively).
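For anyone wanting to double-check the distinction, the Unicode properties of these code points can be inspected with Python's standard `unicodedata` module. This is independent of utf8proc. `describe` and `CODE_POINTS` are names made up for this sketch.

```python
import unicodedata

# Code points mentioned in the report above.
CODE_POINTS = [0x00A8, 0x00B4, 0x00AF, 0x00B0, 0x0308, 0x0301]


def describe(cp: int):
    """Return (general category, canonical combining class, name) for a code point."""
    ch = chr(cp)
    return unicodedata.category(ch), unicodedata.combining(ch), unicodedata.name(ch)


for cp in CODE_POINTS:
    category, combining_class, name = describe(cp)
    print(f"U+{cp:04X}  {category}  ccc={combining_class}  {name}")
```

The spacing diaeresis and acute accent come back as category Sk with combining class 0, whereas U+0308 and U+0301 are category Mn with combining class 230, which is consistent with only the latter pair being combining marks.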