CJK characters are not treated as double width chars #62

Tyriar · 2016-06-02T08:08:07Z

vs gnome-terminal:

Notice that the characters are exactly 2 ASCII characters wide.

parisk · 2016-06-03T05:55:55Z

Thanks for reporting. Will take a look at this along with the rest of the international character issues.

parisk · 2016-06-07T09:32:36Z

Seems like the weird line break is actually the div overflowing. This happens because these are double-width characters, but the terminal ignores this fact and counts them as single-width ones.

Working on a fix.

Screenshot

jerch · 2016-06-13T14:25:11Z

@parisk To support this you are going to have to include the wcwidth calculation for any Unicode code point (see man wcwidth) and adjust the space taken by fullwidth characters to 2 terminal cells (terminals are based on the idea that a single cell can hold one halfwidth character, since all Western characters are halfwidth).
The problem with JavaScript and Unicode is that the UTF-16 encoding (surrogates) makes it a total mess and very expensive to calculate. This gets even more complicated if you plan to support stackable combining characters (wcwidth will report a width of zero for those). This might be easier to implement once the Unicode functions with real code points are widely adopted by the JS engines.
I tried to implement this according to the Unicode spec, but it ended up as a total code mess in my own terminal emulator.
Original source of wcwidth: https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
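
As a rough illustration of the idea above, here is a minimal sketch of a wcwidth-style lookup. It assumes an ES2015 engine (for `for...of` over code points and `codePointAt`). The names `WIDE_RANGES`, `charWidth` and `stringWidth` are made up for this example, and the range table is a heavily abbreviated subset loosely based on the ranges in wcwidth.c, not the full tables and not what xterm uses.

```javascript
// Abbreviated table of double-width ranges (assumption: a small subset for
// illustration only; the real wcwidth.c tables cover more cases).
const WIDE_RANGES = [
  [0x1100, 0x115f],   // Hangul Jamo initial consonants
  [0x2e80, 0x303e],   // CJK radicals, symbols and punctuation
  [0x3040, 0xa4cf],   // Hiragana through Yi, including CJK ideographs
  [0xac00, 0xd7a3],   // Hangul syllables
  [0xf900, 0xfaff],   // CJK compatibility ideographs
  [0xff00, 0xff60],   // fullwidth forms
  [0xffe0, 0xffe6],   // fullwidth signs
  [0x20000, 0x2fffd], // supplementary ideographic planes
  [0x30000, 0x3fffd],
];

// Returns the number of terminal cells a single code point occupies.
function charWidth(codePoint) {
  // Combining diacritical marks take no cell of their own (partial check:
  // only the U+0300..U+036F block is covered here).
  if (codePoint >= 0x0300 && codePoint <= 0x036f) return 0;
  for (const [lo, hi] of WIDE_RANGES) {
    if (codePoint >= lo && codePoint <= hi) return 2;
  }
  return 1;
}

// Sums the cell widths of a string, iterating by code point so that
// surrogate pairs are counted once instead of twice.
function stringWidth(text) {
  let width = 0;
  for (const ch of text) {
    width += charWidth(ch.codePointAt(0));
  }
  return width;
}

console.log(stringWidth('日本語'));     // 6
console.log(stringWidth('cafe\u0301')); // 4
```

A linear scan over the ranges keeps the sketch short; a real implementation would more likely use a binary search or a precomputed lookup table, since this runs for every character written to the terminal.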

jerch · 2016-06-16T09:23:33Z

I had a look at your line and cell abstraction - this is pretty much the same as what I did in my emulator. If you like, I can extend your isWide and write functions with the wcwidth stuff in a PR.
Another problem is the rendering with Unicode. For higher BMP or non-BMP code points, the glyph width in the font might differ from what wcwidth reports (if the font has that glyph at all). You are likely to need a CSS class that fixes misaligned glyphs into place to keep the output a well-formed cell grid.
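
One possible shape for this, sketched under assumptions: each cell is emitted as its own span whose class encodes the wcwidth result, so a stylesheet can pin the box to one or two cell widths regardless of the actual glyph width. The function names, the cell object shape `{ chars, width }` and the class names `cell-w1` / `cell-w2` are hypothetical and not taken from xterm.js.

```javascript
// Escapes the characters that would otherwise be parsed as HTML markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// cells: array of { chars: string, width: 1 | 2 } (hypothetical shape).
// Returns the HTML for one terminal row, one span per cell.
function renderRow(cells) {
  return cells
    .map(cell => `<span class="cell-w${cell.width}">${escapeHtml(cell.chars)}</span>`)
    .join('');
}

console.log(renderRow([
  { chars: 'a', width: 1 },
  { chars: '日', width: 2 },
]));
```

The matching CSS would then have to give `cell-w1` and `cell-w2` a fixed box (for example an inline-block with a width of one or two cells and hidden overflow). Whether that looks acceptable for glyphs the font draws too wide is something that would need testing.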

parisk · 2016-06-16T10:15:24Z

Interesting. Do you believe that this could be handy? http://code.woong.org/wcwidth.js

jerch · 2016-06-16T11:03:09Z

Yes, it should do the trick, though it is not 100% xterm compatible (xterm uses a slightly different version of it). If xterm compatibility is a major concern for xterm.js, we'd have to extract the lookup tables from the xterm sources.

Tyriar · 2016-06-16T17:13:31Z

You can look at a solution to this here: chjj/term.js#97

jerch · 2016-06-17T08:56:20Z

That seems to work for BMP fullwidth characters. Characters encoded as surrogate pairs still fail in width and cursor positioning, as you can test with these characters: 𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡
No clue if any character set of the other planes is important enough to implement the dirty surrogate handling (see the polyfill here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/codePointAt). At least Apple uses some code points of the private planes for their emoji.
Also, combining characters are not handled by this, as you can test with 'cafe\u0301'. This would be a small fix to the existing code, since the combining char has no width and would simply end up in the last active terminal cell --> ['c', 'a', 'f', 'e\u0301']. It is still a mess at the last cell in a row though.
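
To make both points concrete, here is a small sketch of splitting a string into cell contents with manual surrogate handling (no `codePointAt` required) and combining characters appended to the previous cell. `splitIntoCells` and `isCombining` are hypothetical names, the combining check covers only the U+0300..U+036F block, and the last-cell-in-row problem mentioned above is not addressed.

```javascript
// Partial check (assumption): only the Combining Diacritical Marks block.
function isCombining(codePoint) {
  return codePoint >= 0x0300 && codePoint <= 0x036f;
}

// Splits text into one string per terminal cell. Surrogate pairs are joined
// into a single cell, and combining marks are attached to the cell before them.
function splitIntoCells(text) {
  const cells = [];
  for (let i = 0; i < text.length; i++) {
    let code = text.charCodeAt(i);
    let chars = text[i];
    // High surrogate followed by a low surrogate: decode the real code point.
    if (code >= 0xd800 && code <= 0xdbff && i + 1 < text.length) {
      const low = text.charCodeAt(i + 1);
      if (low >= 0xdc00 && low <= 0xdfff) {
        code = (code - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000;
        chars += text[i + 1];
        i++; // skip the low surrogate, it is already consumed
      }
    }
    if (isCombining(code) && cells.length > 0) {
      cells[cells.length - 1] += chars; // zero width: stays in the previous cell
    } else {
      cells.push(chars);
    }
  }
  return cells;
}

console.log(splitIntoCells('cafe\u0301'));   // [ 'c', 'a', 'f', 'é' ] where the last cell is 'e' + U+0301
console.log(splitIntoCells('𝟙𝟚𝟛').length); // 3
```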

parisk · 2016-06-27T09:03:12Z

Closed in #144.

parisk added the type/bug label Jun 2, 2016

This was referenced Jun 4, 2016

Pasting in 日本語 adds an unexpected line break after 本 which puts the cursor off screen #61 (Closed)

Investigate term.js alternatives microsoft/vscode#6838 (Closed)

This was referenced Jun 9, 2016

Line breaks in terminal window when changed editor-fontsize in settings.json . microsoft/vscode#7448 (Closed)

Integrated Terminal CJK characters display error microsoft/vscode#7530 (Closed)

This was referenced Jun 15, 2016

vim in terminal can not display first line microsoft/vscode#7695 (Closed)

Powerline fonts in terminal incorrect microsoft/vscode#7116 (Closed)

This was referenced Jun 20, 2016

Strange characters show up when moving a cursor over hindi #72 (Closed)

wcwidth calculation #144 (Merged)

parisk closed this as completed Jun 27, 2016
