Are Combining characters handled correctly? #2
Comments
Great. I just reviewed the Unicode specs for it a few weeks ago, and it seems possible given the Unicode reference data tables. |
If you want to make a PR just for the ./setup.py update results, that'd be great. I'll reference some of the Unicode specs when I get a moment; I'm travelling for the week. From memory, though, there are several distinct groupings for where a combining character modifies its base, such as "above-right", "below-center", etc. For all of these groups, we can simply assume that the combining character does not add to the width and only modifies the previous character. wcwidth() should return -1 as it already does, but wcswidth() should account for it, pretty much as your test commit already demonstrates.
The only place it gets difficult is one of the Hindi-like scripts, where a combining character modifies more than one cell on a terminal, but this is a wild edge case; we can open a separate bug for that and let it hang until somebody gets interested in it.
For testing, the script bin/wcwidth-browser.py can be modified to also programmatically generate combined character sets. It's a little thick, but the class WcWideCharacterGenerator would generate a letter 'o' (oh) + ([1-cell combining characters]) for a final width of 1, and the output can be viewed in a terminal emulator to ensure the '|' (pipe)s still align. I can certainly help with that part if you have difficulty; wcwidth-browser.py isn't exactly easy to maintain. |
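To make the idea above concrete, here is a minimal sketch of a string-width function that treats combining characters as adding no width, plus a small generator of 'o' + combining-mark samples framed by pipes. It is not the wcwidth package's code: `sketch_wcswidth` and `combining_samples` are hypothetical names, and the sketch assumes that the standard library's `unicodedata.combining()` (canonical combining class) is a good enough test for "combining". Marks whose combining class is 0, such as enclosing marks, and control characters are not handled.

```python
import unicodedata


def sketch_wcswidth(text):
    """Hypothetical helper: sum terminal cell widths of `text`.

    Assumption: a character with a non-zero canonical combining class
    only modifies the previous cell, so it contributes no width.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining mark: drawn over/under the previous character.
            continue
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            # Wide and fullwidth characters occupy two cells.
            width += 2
        else:
            width += 1
    return width


def combining_samples(base="o", start=0x0300, stop=0x0310):
    """Hypothetical generator: yield (sequence, framed line) pairs.

    Each sequence is `base` followed by one combining mark from the
    code point range [start, stop). The framed line can be printed in a
    terminal to check by eye that the closing pipes stay aligned.
    """
    for codepoint in range(start, stop):
        mark = chr(codepoint)
        if unicodedata.combining(mark):
            sequence = base + mark
            yield sequence, "|" + sequence + "|"


if __name__ == "__main__":
    for sequence, line in combining_samples():
        print(line, sketch_wcswidth(sequence))
```

Printing the framed lines in a terminal emulator gives the visual check described above, while the computed width of each sample should stay at 1. |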
I tried to track down the technical docs; here are a few: http://www.unicode.org/versions/Unicode7.0.0/ch03.pdf
http://www.unicode.org/faq/char_combmark.html
There's another document for font designers somewhere that talks about how one would write a rendering engine for combining characters. I found it the most helpful, but I can't seem to find it at the moment. Anyway, you can see there are a lot of considerations -- hopefully we can ignore or omit most of them! |
Thanks for the links! I'll take a look on the subway tonight. |
Leaving this open until I'm absolutely confident all combining characters are handled correctly. |
Issue #10 reports that this implementation is incorrect. |
Closed by PR #11 and release 0.1.5, available on PyPI. |
I'd also be excited about this functionality - is anyone else already working on it? I'm going to start work on a first pass implementation.