Zerowidth reported for some width=1 characters #167

Closed
original-birdman opened this issue Dec 23, 2019 · 3 comments

@original-birdman

original-birdman commented Dec 23, 2019

This is for versions 2.3 and 2.4.

U+00A8 (diaeresis) and U+00B4 (acute accent) now both return 0 from a utf8proc_charwidth() call.
At some point in the past they returned 1 (according to an old log output I have).

The result should be 1. These are not combining diacritics but "spacing characters", just as U+00AF (macron) and U+00B0 (degree sign) are. They are what you use if you want a "bare" diaeresis or acute.

(The combining diacritics are U+0308 and U+0301 respectively).
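The distinction can be checked against the Unicode Character Database. The following is a minimal sketch using Python's standard `unicodedata` module rather than utf8proc itself; the helper name `describe` is made up for illustration. It prints the general category and canonical combining class of the spacing characters and of their combining counterparts.

```python
import unicodedata

# Spacing characters from the report and their combining counterparts.
SPACING = ["\u00a8", "\u00b4", "\u00af"]   # diaeresis, acute accent, macron
COMBINING = ["\u0308", "\u0301"]           # combining diaeresis, combining acute


def describe(ch):
    """Return (code point label, general category, canonical combining class)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),
        unicodedata.combining(ch),
    )


for ch in SPACING + COMBINING:
    print(describe(ch))
```

The spacing forms are in category Sk with combining class 0, while the combining forms are in category Mn with a non-zero combining class.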

@stevengj
Member

Good catch. Right now we assign zero width to everything in category Sk (Symbol, modifier), which doesn't seem right. But it looks like some of the characters in that category are zero width (e.g. U+1F3FB), so it's unclear to me what rule we should use in general.

@original-birdman
Author

original-birdman commented Jan 14, 2020

> Good catch.

It just happened to be in my one-line test file.

> But it looks like some of the characters in that category are zero width (e.g. U+1F3FB), so it's unclear to me what rule we should use in general.

Oddly, Unicode declares that character to be Wide (in the EastAsianWidth.txt defs)!

1F3FB..1F3FF;W   # Sk     [5] EMOJI MODIFIER FITZPATRICK TYPE-1-2..EMOJI MODIFIER FITZPATRICK TYPE-6

and that's how my Kubuntu Konsole terminal displays them (double-width).
And since my editor (which is what is using libutf8proc) works OK with them, the library must be currently reporting them as double-width.

So they aren't zero-width?
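The EastAsianWidth.txt entry quoted above can be cross-checked without utf8proc. This is a small sketch using Python's `unicodedata` module; it assumes the interpreter ships a Unicode database recent enough to list these code points as Wide, and the helper name `east_asian_widths` is hypothetical.

```python
import unicodedata


def east_asian_widths(first, last):
    """Map each code point in [first, last] to its East Asian Width property."""
    return {
        f"U+{cp:04X}": unicodedata.east_asian_width(chr(cp))
        for cp in range(first, last + 1)
    }


# The five emoji skin-tone modifiers, U+1F3FB..U+1F3FF.
widths = east_asian_widths(0x1F3FB, 0x1F3FF)
print(widths)
```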

@stevengj
Member

stevengj commented Jan 17, 2020

Hmm, according to http://unicode.org/reports/tr51/, whether the emoji skin-tone characters are zero-width combining characters depends on the font. So I guess in a fixed-width font we should assume they are wide, not combining?

If this is true of everything in Sk, that makes life easy — we just remove them from the set of zero-width combining characters here.
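To make the proposed rule concrete, here is a rough sketch in Python of a width function that no longer treats category Sk as zero-width. It is not utf8proc's implementation: `sketch_charwidth` is a hypothetical name, the choice of Mn, Me and Cf as the zero-width categories is an assumption made for illustration, and control characters and other special cases are ignored.

```python
import unicodedata

# Assumption for this sketch: only these categories count as zero-width.
# Sk (Symbol, modifier) is deliberately left out.
ZERO_WIDTH_CATEGORIES = {"Mn", "Me", "Cf"}


def sketch_charwidth(ch):
    """Estimate the column width of one character in a fixed-width font."""
    if unicodedata.category(ch) in ZERO_WIDTH_CATEGORIES:
        return 0
    # Wide and Fullwidth characters occupy two columns.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


for ch in ["\u00a8", "\u00b4", "\u0308", "\u0301", "\U0001F3FB"]:
    print(f"U+{ord(ch):04X}", sketch_charwidth(ch))
```

Under these assumptions the spacing diaeresis and acute get width 1, their combining forms get width 0, and the skin-tone modifiers get width 2, which matches the terminal behavior described earlier in the thread.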
