Are Combining characters handled correctly? #2

Closed
thomasballinger opened this issue Oct 27, 2014 · 7 comments

Comments

@thomasballinger
Contributor

I'd also be excited about this functionality - is anyone else already working on it? I'm going to start work on a first pass implementation.

@jquast
Owner

jquast commented Oct 27, 2014

Great. I just reviewed unicode specs for it a few weeks ago, and it seems possible given the unicode reference data tables.

@jquast
Owner

jquast commented Oct 27, 2014

If you want to make a PR just for the ./setup.py update results that'd be great.

I'll reference some of the Unicode specs when I get a moment, I'm travelling for the week -- but from memory I recall there are several distinct groupings describing where the combining character attaches, such as "above-right", "below-center", etc. For all of these groups, we can simply assume that the combining character does not add to the width and only modifies the previous character. For wcwidth(), it should return -1 as it already does, but wcswidth() should account for it. Pretty much as your test commit already demonstrates.
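
As a rough illustration of the string-level rule described above (a combining mark modifies the previous character and adds no cells), here is a minimal sketch. It is not this project's implementation: `sketch_wcswidth` is a hypothetical name, it uses only the standard library's `unicodedata` tables, it treats the `Mn` and `Me` general categories as "combining", and it ignores control characters entirely.

```python
import unicodedata


def sketch_wcswidth(text):
    """Hypothetical string width: combining marks contribute zero cells."""
    width = 0
    for ch in text:
        # Assumption: non-spacing (Mn) and enclosing (Me) marks only
        # modify the previous character, so they add nothing.
        if unicodedata.category(ch) in ("Mn", "Me"):
            continue
        # East Asian Wide / Fullwidth characters occupy two cells.
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width


print(sketch_wcswidth("cafe\u0301"))  # 'e' + COMBINING ACUTE ACCENT -> 4
```

What the per-character function should return for a combining mark is a separate question and is left out of this sketch.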

The only place it gets difficult is one of the Hindi-like scripts, where a combining character modifies more than one cell on a terminal, but this is a wild edge case and we can open a separate bug for that and let it hang until somebody gets interested in it.

For testing, the script bin/wcwidth-browser.py can be modified to also programmatically generate combined character sets. It's a little thick, but the class WcWideCharacterGenerator would generate a letter 'o' (oh) + ([1-cell combining characters]) for a final width of 1, and the output can be viewed in a terminal emulator to ensure the '|' (pipe) characters still align. I can certainly help with that part if you have difficulty; wcwidth-browser.py isn't exactly easy to maintain.
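
The visual check described above could be sketched roughly as follows. This is a standalone illustration, not the code in bin/wcwidth-browser.py: `combining_rows` is a hypothetical helper, and the default range U+0300 to U+036F (the Combining Diacritical Marks block) is an assumption chosen only to keep the output short.

```python
import unicodedata


def combining_rows(first=0x0300, last=0x036F, base="o"):
    """Yield '|o<mark>|' rows; the pipes should align in a terminal."""
    for cp in range(first, last + 1):
        ch = chr(cp)
        # Keep only code points with a non-zero canonical combining class.
        if unicodedata.combining(ch):
            name = unicodedata.name(ch, "?")
            yield "|{}{}| U+{:04X} {}".format(base, ch, cp, name)


if __name__ == "__main__":
    for row in combining_rows():
        print(row)
```

If the terminal treats each mark as zero-width, the closing pipe on every row lands in the same column; a row whose pipe is shifted points at a character worth investigating.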

@jquast
Owner

jquast commented Oct 28, 2014

Tried to find the technical docs, here are a few:

http://www.unicode.org/versions/Unicode7.0.0/ch03.pdf

  • section 3.6, subsection D52, "Combining character" page 105 (pdf pg. 34)
  • section 3.6, section "Application of Combining Marks" page 109 (pdf pg. 38)

http://www.unicode.org/faq/char_combmark.html

  • Q: How are characters counted when measuring the length or position of a character in a string?
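
To make the counting question concrete, the short sketch below shows how one user-perceived character can yield different counts depending on what is measured. `count_views` is a hypothetical helper written for this illustration; it relies only on Python's built-in `str` methods and `unicodedata.normalize`.

```python
import unicodedata


def count_views(text):
    """Return several ways of 'counting' the same string."""
    return {
        # Number of Unicode code points in the string as given.
        "code_points": len(text),
        # Number of bytes once encoded as UTF-8.
        "utf8_bytes": len(text.encode("utf-8")),
        # Code points after canonical composition (NFC).
        "nfc_code_points": len(unicodedata.normalize("NFC", text)),
    }


# 'e' followed by COMBINING ACUTE ACCENT: displayed as a single character.
print(count_views("e\u0301"))
```

None of these three numbers is the terminal width; they only show why code point count alone is not a reliable measure of displayed length.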

There's another document for font designers somewhere that talks about how one would write a rendering engine for combining characters. I found it the most helpful, but I can't seem to find it at the moment. Anyway, you can see there are a lot of considerations -- hopefully we can ignore or omit most of them!

@thomasballinger
Contributor Author

Thanks for the links! I'll take a look on the subway tonight.


jquast added a commit that referenced this issue Oct 29, 2014
@jquast
Owner

jquast commented Nov 20, 2014

Leaving this open until I'm absolutely confident all combining characters are handled correctly.

jquast added the bug label Nov 20, 2014
jquast added the question label and removed the bug label Dec 10, 2014
jquast changed the title from "Combining characters" to "Are Combining characters handled correctly?" Dec 10, 2014
@jquast
Owner

jquast commented Aug 27, 2015

Issue #10 believes this implementation to be incorrect.

@jquast
Owner

jquast commented Sep 14, 2015

Closed by PR #11 and release 0.1.5, available on PyPI.
