Are Combining characters handled correctly? #2

thomasballinger · 2014-10-27T17:55:28Z

I'd also be excited about this functionality - is anyone else already working on it? I'm going to start work on a first pass implementation.

jquast · 2014-10-27T21:59:38Z

Great. I just reviewed unicode specs for it a few weeks ago, and it seems possible given the unicode reference data tables.

jquast · 2014-10-27T22:09:14Z

If you want to make a PR just for the ./setup.py update results that'd be great.

I'll reference some of the Unicode specs when I get a moment; I'm travelling for the week. From memory, there are several distinct groupings for where the combining character attaches, such as "above-right", "below-center", etc. For all of these groups, we can simply assume that the character does not add to the width and only modifies the previous character. wcwidth() should return -1 as it already does, but wcswidth() should account for it, pretty much as your test commit already demonstrates.
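To make the string-level rule concrete, here is a minimal sketch using only Python's standard `unicodedata` module. It is not the wcwidth implementation: the names `is_combining` and `sketch_width` are made up for illustration, treating general categories `Mn` and `Me` as zero-width is an assumption, and control characters and other special cases are ignored entirely.

```python
import unicodedata


def is_combining(ch):
    """Assumption: general categories Mn (nonspacing mark) and Me
    (enclosing mark) are treated as zero-width combining characters."""
    return unicodedata.category(ch) in ("Mn", "Me")


def sketch_width(text):
    """Rough cell width of text: combining marks add nothing, East Asian
    Wide/Fullwidth characters add 2, everything else adds 1.
    Control characters are not handled in this sketch."""
    width = 0
    for ch in text:
        if is_combining(ch):
            # Modifies the previous character; occupies no cell of its own.
            continue
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width


if __name__ == "__main__":
    # 'o' followed by U+0301 COMBINING ACUTE ACCENT
    print(sketch_width("o\u0301"))
```

The point of the sketch is only that the combining mark is skipped when summing a string's width, so the base letter plus its mark still measures as one cell.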

The only place it gets difficult is one of the Hindi-like scripts, where a combining character modifies more than one cell on a terminal. That is a wild edge case, and we can open a separate bug for it and let it hang until somebody gets interested in it.

For testing, the script bin/wcwidth-browser.py can be modified to also programmatically generate combined character sets. It's a little thick, but the class WcWideCharacterGenerator would generate a letter 'o' (oh) + ([1-cell combining characters]) for a final width of 1, and the output can be viewed in a terminal emulator to ensure the '|' (pipe) characters still align. I can certainly help with that part if you have difficulty; wcwidth-browser.py isn't exactly easy to maintain.
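The alignment check described above could be sketched independently of wcwidth-browser.py roughly as follows. This does not use or modify WcWideCharacterGenerator; the helper names `combining_samples` and `render_rows` are hypothetical, and the U+0300..U+036F range (Combining Diacritical Marks) is just a convenient sample set, not the full list of combining characters.

```python
import unicodedata


def combining_samples(base="o", limit=8):
    """Return up to `limit` strings made of `base` plus one nonspacing
    mark taken from the Combining Diacritical Marks block."""
    samples = []
    for codepoint in range(0x0300, 0x0370):
        ch = chr(codepoint)
        if unicodedata.category(ch) == "Mn":
            samples.append(base + ch)
        if len(samples) >= limit:
            break
    return samples


def render_rows(samples):
    """Wrap each sample in pipes so misalignment is visible by eye."""
    return ["|" + sample + "|" for sample in samples]


if __name__ == "__main__":
    # Reference row with a plain base character, then the combined rows.
    print("|o|")
    for row in render_rows(combining_samples()):
        print(row)
```

If the terminal emulator treats each mark as zero-width, the right-hand pipes of every row should line up with the reference row; a row that drifts points at a character worth checking.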

jquast · 2014-10-28T20:58:58Z

I tried to track down the technical docs; here are a few:

http://www.unicode.org/versions/Unicode7.0.0/ch03.pdf

section 3.6, subsection D52, "Combining character" page 105 (pdf pg. 34)
section 3.6, section "Application of Combining Marks" page 109 (pdf pg. 38)

http://www.unicode.org/faq/char_combmark.html

Q: How are characters counted when measuring the length or position of a character in a string?

There's another document for font designers somewhere that talks about how one would write a rendering engine for combining characters. I found it the most helpful, but I can't seem to find it at the moment. Anyway, you can see there are a lot of considerations. Hopefully we can ignore or omit most of them!

thomasballinger · 2014-10-28T23:02:27Z

Thanks for the links! I'll take a look on the subway tonight.

jquast · 2014-11-20T09:47:55Z

Leaving this open until I'm absolutely confident all combining characters are handled correctly.

jquast · 2015-08-27T17:13:59Z

Issue #10 believes this implementation to be incorrect.

jquast · 2015-09-14T06:28:29Z

Closed by PR #11 and release 0.1.5 available on pypi.

jquast added a commit that referenced this issue Oct 29, 2014

try #2 on href fix

01b4b45

jquast added the bug label Nov 20, 2014

jquast added question and removed bug labels Dec 10, 2014

jquast changed the title ~~Combining characters~~ Are Combining characters handled correctly? Dec 10, 2014

jquast added needs-feedback and removed question labels Mar 11, 2015

jquast mentioned this issue Aug 27, 2015

Using DerivedCombiningClass.txt to determine width is inappropriate #10

Closed

jquast closed this as completed Sep 14, 2015

jquast mentioned this issue Dec 14, 2023

Drop UNICODE_VERSION ? #104

Open
