Feature Request: Finish full unicode support (M:N cell rendering, ZWJ?) #1472

Closed
Tracked by #190
giosal opened this issue Jun 23, 2019 · 10 comments
Assignees
Labels
Area-Rendering Text rendering, emoji, complex glyph & font-fallback issues Issue-Feature Complex enough to require an in depth planning process and actual budgeted, scheduled work. Needs-Tag-Fix Doesn't match tag requirements Product-Conhost For issues in the Console codebase Product-Terminal The new Windows Terminal.
Milestone

Comments

@giosal

giosal commented Jun 23, 2019

Summary of the new feature/enhancement

When the Terminal came out, there was a mention of Unicode support, but I can see that it's still not there. There is no support for Georgian script yet.

@giosal giosal added the Issue-Feature Complex enough to require an in depth planning process and actual budgeted, scheduled work. label Jun 23, 2019
@ghost ghost added Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting Needs-Tag-Fix Doesn't match tag requirements labels Jun 23, 2019
@DHowett-MSFT DHowett-MSFT changed the title Feature Request - add full Unicode support, still no Georgian scripts Feature Request: Finish full unicode support (M:N cell rendering, ZWJ?) Jun 27, 2019
@DHowett-MSFT DHowett-MSFT added Area-Rendering Text rendering, emoji, complex glyph & font-fallback issues Product-Conhost For issues in the Console codebase Product-Terminal The new Windows Terminal. labels Jun 27, 2019
@ghost ghost removed the Needs-Tag-Fix Doesn't match tag requirements label Jun 27, 2019
@DHowett-MSFT
Contributor

This is now the master issue for all good rendering efforts.

@DHowett-MSFT DHowett-MSFT removed the Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting label Jun 27, 2019
@giosal
Author

giosal commented Jun 27, 2019

I apologize for not providing this detail previously, but I have just checked all three modes available on my laptop (PowerShell, CMD and Ubuntu WSL1), and it's not available in any of them.

@miniksa
Member

miniksa commented Jun 27, 2019

It's alright. We know that we're not done in this space, and when we sat in triage, we couldn't believe we hadn't filed the issue yet. Congrats, yours is now the master issue tracking the work we still have to do to live up to our Unicode promise.

@zadjii-msft zadjii-msft added this to the Terminal v1.0 milestone Jul 2, 2019
DHowett-MSFT referenced this issue May 8, 2020
The table that we refer to in `CodepointWidthDetector.cpp` to determine
whether a codepoint should be rendered as Wide or Narrow was
based on EastAsianWidth[1]. If a codepoint isn't included in this
table, it's considered Narrow. Many emoji aren't specified in the
EAW list, so this PR supplements our table with emoji codepoints from
emoji-data[2] in order to render most, if not all, emoji as full-width.
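
The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code from `CodepointWidthDetector.cpp`: the names `WidthRange`, `kWideRanges` and `LookupWidth` are hypothetical, and the table holds only a handful of sample ranges. It assumes the table is a sorted list of inclusive ranges and that anything outside them falls back to Narrow.

```cpp
#include <algorithm>
#include <array>

// Hypothetical names; the real table is much larger and is not reproduced here.
enum class CodepointWidth { Narrow, Wide };

struct WidthRange {
    char32_t first;
    char32_t last; // inclusive
};

// A tiny sample table, sorted by codepoint and non-overlapping.
constexpr std::array<WidthRange, 4> kWideRanges{ {
    { 0x1100, 0x115F },   // Hangul Jamo leading consonants
    { 0x2764, 0x2764 },   // HEAVY BLACK HEART, standing in for an emoji-data supplement
    { 0x4E00, 0x9FFF },   // CJK Unified Ideographs
    { 0x1F600, 0x1F64F }, // Emoticons block
} };

// Binary search for the first range that ends at or after cp.
// A codepoint that falls in no range is treated as Narrow.
inline CodepointWidth LookupWidth(char32_t cp) {
    const auto it = std::lower_bound(
        kWideRanges.begin(), kWideRanges.end(), cp,
        [](const WidthRange& range, char32_t value) { return range.last < value; });
    if (it != kWideRanges.end() && it->first <= cp) {
        return CodepointWidth::Wide;
    }
    return CodepointWidth::Narrow;
}
```

With a table of this shape, supplementing it with emoji amounts to merging more ranges into the sorted list; the lookup itself does not change.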

There are certain codepoints I've added to the comments (in case we want
to add them officially to the table in the future) that Microsoft
decided to give an emoji presentation even though they're specified as
Narrow/Ambiguous in the EAW list and are _not_ specified in the Unicode
emoji list. These include all of the Mahjong Tiles block, pencils
pointing in different directions (✎✐), and pointing index fingers (☜, ☞),
among others. I have no idea if I've captured all of them, as I don't know
of an easy way to detect which emoji are Microsoft-specific.

## Validation Steps Performed
I have looked at so many emojis that I dream emoji.

These screenshots don't cover _all_ emoji, but I've tried to grab
a couple from all across the codepoint ranges:

Before:
![before](https://user-images.githubusercontent.com/57155886/81445092-2051a980-912d-11ea-9739-c9f588da407d.png)

After:
![after](https://user-images.githubusercontent.com/57155886/81445107-2778b780-912d-11ea-9615-676c2150e798.png)

[1] http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt
[2] https://www.unicode.org/Public/13.0.0/ucd/emoji/emoji-data.txt

Closes #900

(cherry picked from commit 7ae3433)
@francogp

francogp commented Jun 20, 2020

Maybe it's related to issue #6615.

@ofek
Contributor

ofek commented Mar 4, 2023

Has there been any progress on this?

@JustinGrote

Trying to use Unicode still results in weird spaces that don't occur in other terminals.
image

@DHowett
Member

DHowett commented Mar 24, 2023

> still

That would be why this workitem is still open 😄

@JustinGrote

JustinGrote commented Mar 24, 2023

@DHowett whoops, I didn't fully flesh out that info. I should have made it a sub-issue. Sorry for the bump notification, it wasn't my intent :)

github-merge-queue bot pushed a commit that referenced this issue Jun 26, 2024
First, this adds `GraphemeTableGen` which
* parses `ucd.nounihan.grouped.xml`
* computes the cluster break property for each codepoint
* computes the East Asian Width property for each codepoint
* compresses everything into a 4-stage trie
* computes a LUT of cluster break rules between 2 codepoints
* and serializes everything to C++ tables and helper functions
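
To illustrate the trie compression step, here is a simplified two-stage version of the idea (the commit describes four stages, and its generated tables are not shown in this thread). Everything here is an assumption made for illustration: `TwoStageTrie`, `BuildTrie`, the block size of 256, and the one-byte property value. The sketch splits the codepoint space into fixed-size blocks, stores each distinct block once, and keeps an index from block number to stored block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical two-stage table; a real generator would emit static arrays.
struct TwoStageTrie {
    static constexpr std::size_t kBlockBits = 8;
    static constexpr std::size_t kBlockSize = std::size_t{ 1 } << kBlockBits;

    std::vector<std::uint16_t> stage1; // block number -> index of a unique block
    std::vector<std::uint8_t> stage2;  // unique blocks of property values, concatenated

    std::uint8_t Lookup(char32_t cp) const {
        assert(cp <= 0x10FFFF);
        const std::size_t block = stage1[cp >> kBlockBits];
        return stage2[block * kBlockSize + (cp & (kBlockSize - 1))];
    }
};

// Builds the trie from any callable that maps a codepoint to a property byte.
template <typename PropertyFn>
TwoStageTrie BuildTrie(PropertyFn property) {
    TwoStageTrie trie;
    std::map<std::vector<std::uint8_t>, std::uint16_t> seen; // block contents -> block id
    for (std::size_t base = 0; base <= 0x10FFFF; base += TwoStageTrie::kBlockSize) {
        std::vector<std::uint8_t> block(TwoStageTrie::kBlockSize);
        for (std::size_t i = 0; i < block.size(); ++i) {
            block[i] = property(static_cast<char32_t>(base + i));
        }
        // Identical blocks share one entry in stage2; only new blocks are appended.
        const auto [it, inserted] =
            seen.try_emplace(block, static_cast<std::uint16_t>(seen.size()));
        if (inserted) {
            trie.stage2.insert(trie.stage2.end(), block.begin(), block.end());
        }
        trie.stage1.push_back(it->second);
    }
    return trie;
}
```

Because most of the codepoint space shares the same property values, the deduplicated `stage2` stays small. Adding further stages applies the same trick to the index table itself.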

Next, this adds `GraphemeTestTableGen` which
* parses `GraphemeBreakTest.txt`
* splits each test into graphemes and break opportunities
* and serializes everything to a C++ table for use as unit tests
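
As a rough picture of the splitting step, the sketch below parses one line in the `GraphemeBreakTest.txt` format, where `÷` marks a break opportunity, `×` marks "no break", codepoints are written in hex, and everything after `#` is a comment. The function name `ParseBreakTestLine` and the `Grapheme` alias are hypothetical. The actual generator is not shown in this thread and may work differently.

```cpp
#include <sstream>
#include <string>
#include <vector>

using Grapheme = std::vector<char32_t>;

// Splits one test line into the grapheme clusters it describes.
// Example input: "÷ 0061 × 0308 ÷ 0062 ÷ # comment" (UTF-8 encoded).
std::vector<Grapheme> ParseBreakTestLine(const std::string& line) {
    // Drop the trailing comment, if any.
    const std::string data = line.substr(0, line.find('#'));

    std::istringstream tokens(data);
    std::vector<Grapheme> graphemes;
    Grapheme current;
    std::string token;
    while (tokens >> token) {
        if (token == "\xC3\xB7") { // U+00F7 DIVISION SIGN: break opportunity
            if (!current.empty()) {
                graphemes.push_back(current);
                current.clear();
            }
        } else if (token == "\xC3\x97") { // U+00D7 MULTIPLICATION SIGN: no break
            continue;
        } else { // a hexadecimal codepoint
            current.push_back(static_cast<char32_t>(std::stoul(token, nullptr, 16)));
        }
    }
    if (!current.empty()) {
        graphemes.push_back(current);
    }
    return graphemes;
}
```

A unit test can then feed the concatenated codepoints to the segmenter under test and compare its clusters against this expected list.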

`CodepointWidthDetector.cpp` was rewritten from scratch to
* use an iterator struct (`GraphemeState`) to maintain state
* accumulate codepoints until a break opportunity arises
* accumulate the total width of a grapheme
* support 3 different measurement modes: Grapheme clusters,
  `wcswidth`-style, and a mode identical to the old conhost
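
A heavily simplified sketch of that iterator pattern follows. It is not the rewritten `CodepointWidthDetector.cpp`: `GraphemeStateSketch`, `Classify`, `BaseWidth`, `kJoin` and `NextGrapheme` are hypothetical names, the four break classes and their codepoint ranges are a toy subset of the real Unicode properties, and clamping the cluster width to 2 is an assumption of this sketch. It only shows the grapheme-cluster mode, not the other two measurement modes.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>

// Toy break classes; the real property set is much richer.
enum BreakClass { Other, Extend, ZWJ, ExtPict, ClassCount };

// kJoin[prev][next] is true when there is no break between the two classes.
constexpr bool kJoin[ClassCount][ClassCount] = {
    //              Other  Extend ZWJ    ExtPict
    /* Other   */ { false, true,  true,  false },
    /* Extend  */ { false, true,  true,  false },
    /* ZWJ     */ { false, true,  true,  true  },
    /* ExtPict */ { false, true,  true,  false },
};

inline BreakClass Classify(char32_t cp) {
    if (cp == 0x200D) return ZWJ;
    if ((cp >= 0x0300 && cp <= 0x036F) || cp == 0xFE0F) return Extend;
    if (cp >= 0x1F300 && cp <= 0x1FAFF) return ExtPict;
    return Other;
}

inline int BaseWidth(char32_t cp) {
    switch (Classify(cp)) {
    case Extend:
    case ZWJ:
        return 0;
    case ExtPict:
        return 2;
    default:
        return (cp >= 0x4E00 && cp <= 0x9FFF) ? 2 : 1;
    }
}

// Iterator state: the most recent cluster is text[begin, end).
struct GraphemeStateSketch {
    std::size_t begin = 0;
    std::size_t end = 0;
    int width = 0;
};

// Advances to the next cluster; returns false once the text is exhausted.
inline bool NextGrapheme(std::u32string_view text, GraphemeStateSketch& state) {
    if (state.end >= text.size()) return false;
    state.begin = state.end;
    std::size_t i = state.begin;
    BreakClass prev = Classify(text[i]);
    int width = BaseWidth(text[i]);
    // Accumulate codepoints until the table reports a break opportunity.
    for (++i; i < text.size(); ++i) {
        const BreakClass next = Classify(text[i]);
        if (!kJoin[prev][next]) break;
        width += BaseWidth(text[i]);
        prev = next;
    }
    state.end = i;
    state.width = std::min(width, 2); // assumption: a cluster spans at most two cells
    return true;
}
```

Calling `NextGrapheme` in a loop over `U"a\u0301\U0001F468\u200D\U0001F469"` yields two clusters in this sketch: the accented letter with width 1 and the ZWJ sequence with width 2.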

With this in place the following changes were made:
* `ROW::WriteHelper::_replaceTextUnicode` now uses the new
  grapheme cluster text iterators
* The same function was modified to join new text with existing
  contents of the current cell if they join to form a cluster
* Otherwise, a ton of places were modified to funnel the selection
  of the measurement mode over from WT's settings to ConPTY

This is part of #1472

## Validation Steps Performed
* So many tests ✅
* https://github.com/apparebit/demicode works fantastic ✅
* UTF8-torture-test.txt works fantastic ✅
@lhecker
Member

lhecker commented Jul 22, 2024

Similar to #190, this issue can now also be closed. #16916 added support for ZWJ and thus this work is now complete. There are some smaller issues left to clean up, but I expect them to be a rare encounter. As such I'll close this issue for now.

This will ship in Windows Terminal 1.22 this year. If you want to try it out right now, please feel free to download our Canary (nightly) build here: #16121

Please note that PowerShell does not have support for complex Unicode yet, but that's expected to change in the foreseeable future (no exact date yet).

@lhecker lhecker closed this as completed Jul 22, 2024
@microsoft-github-policy-service microsoft-github-policy-service bot added the Needs-Tag-Fix Doesn't match tag requirements label Jul 22, 2024