Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some Unicode characters(U+2714) displayed incorrectly #12537

Closed
brainos233 opened this issue Feb 20, 2022 · 10 comments · Fixed by #14959
Closed

Some Unicode characters(U+2714) displayed incorrectly #12537

brainos233 opened this issue Feb 20, 2022 · 10 comments · Fixed by #14959
Labels
In-PR This issue has a related PR Issue-Bug It either shouldn't be doing this or needs an investigation. Needs-Tag-Fix Doesn't match tag requirements Priority-3 A description (P3) Product-Terminal The new Windows Terminal.
Milestone

Comments

@brainos233
Copy link

Windows Terminal version

1.12.10393.0

Windows build number

10.0.19044.0

Other Software

PowerShell 7.2.1
wsl.exe

Steps to reproduce

Using PowerShell 7.2.1 to print some Unicode characters like "✔" and the size is too small

Expected Behavior

In Fluent Terminal
image

Actual Behavior

In WT
image

@brainos233 brainos233 added the Issue-Bug It either shouldn't be doing this or needs an investigation. label Feb 20, 2022
@ghost ghost added Needs-Tag-Fix Doesn't match tag requirements Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting labels Feb 20, 2022
@elsaco
Copy link

elsaco commented Feb 21, 2022

The experimental text rendering engine does a better job:

wt_experimental

and w/out:

wt_unicode_test

@j4james
Copy link
Collaborator

j4james commented Feb 21, 2022

Note that U+2714 is defined in Unicode as a neutral character, which is meant to occupy a single column, while U+274C is defined as a wide character, and is meant to occupy two columns.

As a point of reference, here's what those characters look like in a bunch of different terminals. My test case is:
printf "+-+--+\n|\u2714|\u274c|\n+-+--+\n"

image

From left to right, top to bottom: XTerm, Gnome Terminal, MLTerm, WezTerm, Contour, St, Alacritty, and Kitty.

Gnome Terminal might be letting U+2714 overlap into the following column to a certain extent, but it's clearly still a single column character. Everyone else seems to be quiet strictly constrained to the one column. So in my opinion, the rendering in Fluent Terminal is incorrect, but it's possible that's deliberate.

That said, the Atlas renderer clearly does a better job of dealing with fonts like this than the DX renderer. However, I don't think we should ever expect it to take up two columns.

@lhecker
Copy link
Member

lhecker commented Feb 23, 2022

If you look closely you can see how the glyph is cut off on both sides in the experimental rendering engine ("AtlasEngine").
I'll likely submit a PR in the future which will make the AtlasEngine look similar to our current standard renderer by downsizing glyphs larger than number of cells allocated for them (1 cell in case of ✔️).

Not because I want to make the size of Emojis/symbols inconsistent like in the screenshot, but rather because some scripts only have proportional fonts, which require to be downsized to still be readable (most noticeable with Hebrew on a standard Windows setup). Maybe I can come up with something clever to find a good middle ground...

@zadjii-msft zadjii-msft added Product-Terminal The new Windows Terminal. Priority-3 A description (P3) Area-AtlasEngine and removed Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting labels May 16, 2022
@ghost ghost removed the Needs-Tag-Fix Doesn't match tag requirements label May 16, 2022
@zadjii-msft zadjii-msft added this to the 22H2 milestone May 16, 2022
@ghost ghost added the Needs-Tag-Fix Doesn't match tag requirements label May 16, 2022
@Jaykul
Copy link
Contributor

Jaykul commented Aug 5, 2022

I believe there's a bug here in how terminal is handling emoji. Many emoji also exist as characters:

Emoji Text Emoji
Check ✔︎ ✔️
Email 📧︎ 📧️
Penguin 🐧︎ 🐧️
Stopwatch ⏱︎ ⏱️
Heart ❤︎ ❤️

Anyway ... there's a lot.

The point is: Unicode has variation selectors, and they were made part of the emoji spec in 2018. Bottom line: fe0e selects text mode, and fe0f selects emoji mode and when the selector is missing, it's up to the software to choose which way to render it.

Here's the current list of all the variation sequences for 14.0.0.

I believe that Windows Terminal is doing the wrong thing here.

  • First: it ignores the value of the selector -- it always renders emoji.
  • Second: when there is no selector, it renders the emoji smaller. Frequently, so tiny you can't see what they are.

It appears to me that terminal is trying to render the emoji in a single character space whenever there's no variation selector. I think this should not be done -- but if you were going to do it, surely you should only do it when it's the "text" variation, and not when the variation is unspecified?!

I, for one, can live with always rendering emoji --as far as I know, no fixed-width font has characters for these anyway-- but making them small when the selector is missing is absolutely a bug.

For what it's worth, a concrete example. Here's what happens in the terminal:
image

When I copy/paste that from Terminal into github, Github renders it properly. The first is rendered as a character. The second is an emoji. The third one is rendered as a character in code blocks, and an emoji elsewhere (except for the face, hmmmm).

❯ "`u{23f1}`u{fe0e} `u{23f1}`u{fe0f} `u{23f1}"
⏱︎ ⏱️ ⏱
❯ "`u{2714}`u{fe0e} `u{2714}`u{fe0f} `u{2714}"
✔︎ ✔️ ✔
❯ "`u{2764}`u{fe0e} `u{2764}`u{fe0f} `u{2764}"
❤︎ ❤️ ❤
❯ "`u{1F610}`u{fe0e} `u{1F610}`u{fe0f} `u{1F610}"
😐︎ 😐️ 😐

❯ "`u{23f1}`u{fe0e} `u{23f1}`u{fe0f} `u{23f1}"
⏱︎ ⏱️ ⏱
❯ "`u{2714}`u{fe0e} `u{2714}`u{fe0f} `u{2714}"
✔︎ ✔️ ✔
❯ "`u{2764}`u{fe0e} `u{2764}`u{fe0f} `u{2764}"
❤︎ ❤️ ❤
❯ "`u{1F610}`u{fe0e} `u{1F610}`u{fe0f} `u{1F610}"
😐︎ 😐️ 😐

@lhecker
Copy link
Member

lhecker commented Aug 5, 2022

If you look closely you can see how the glyph is cut off on both sides in the experimental rendering engine ("AtlasEngine").
I'll likely submit a PR in the future which will make the AtlasEngine look similar to our current standard renderer by downsizing glyphs larger than number of cells allocated for them (1 cell in case of ✔️).

BTW I did this just recently in #13549, which will be part of v1.16.


First: it ignores the value of the selector -- it always renders emoji.

Hmm we're feeding the U+FE0E straight to Direct2D to draw it and I was hoping it would just do the correct thing automatically. I'll check with them whether there's anything we can do differently to fix that.

Second: when there is no selector, it renders the emoji tiny (basically, text size).

I believe this is because of what @j4james said:

Note that U+2714 is defined in Unicode as a neutral character, which is meant to occupy a single column, while U+274C is defined as a wide character, and is meant to occupy two columns.

Additionally, if you notice any weird whitespace gaps around Emojis: That's because of #8000 and I'm currently working on fixing that. 🙂

@Jaykul
Copy link
Contributor

Jaykul commented Aug 5, 2022

First: it ignores the value of the selector -- it always renders emoji.

Hmm we're feeding the U+FE0E straight to Direct2D to draw it and I was hoping it would just do the correct thing automatically. I'll check with them whether there's anything we can do differently to fix that.

I suppose they are. As I noted earlier, the fonts we're all using don't have these characters in them, usually, because the characters don't fit in a single cell -- that is, they're totally indecipherable when scaled down.

I'm concerned about this:

I'll likely submit a PR in the future which will make the AtlasEngine look similar to our current standard renderer by downsizing glyphs larger than number of cells allocated for them (1 cell in case of ✔️).

BTW I did this just recently in #13549, which will be part of v1.16.

I'm not sure about that one ... downsizing emoji to fit in a single character cell is not the right move, in my opinion. isn't that going to result in the super-tiny examples above (the third character in the each row of the screenshot)? I'm not clear, guess I'll have to wait and see -- because if the only options are "shrink" it to one character or "stretch" it to two wide, I think emoji are going to end up uniformly hideous 😉

@lhecker
Copy link
Member

lhecker commented Aug 5, 2022

I'm not sure about that one ... downsizing emoji to fit in a single character cell is not the right move, in my opinion. isn't that going to result in the super-tiny examples above (the third character in the each row of the screenshot)? I'm not clear, guess I'll have to wait and see -- because if the only options are "shrink" it to one character or "stretch" it to two wide, I think emoji are going to end up uniformly hideous 😉

The downsizing I implemented is at least somewhat better at this than our current approach.

DxEngine (current):
image

AtlasEngine (future):
image

We don't want to let glyphs overlap rows for performance and correctness reasons (at the moment, because overlapping glyphs are anything but trivial) and so shrinking large glyphs is an okay-ish and simple solution. For instance this string is about 1.5 rows tall: ⟨င်္က္ကျြွှေို့်ာှီ့ၤဲံ့းႍ⟩

Without scaling:
image

With scaling:
image

(As explained before the extra whitespace is because we assign at least 1 cell to each code point we process. Since that string has 27 code points it's also 27 cells wide. As part of #8000 this situation will be improved though.)

With scaling in general being ultimately necessary, scaling Emojis is basically just a side effect and not actually intentional. Unrelated to the U+FE0E issue, Emojis like U+2714 (❤) without variant selector are fit into a single cell however, simply because that's exactly what other terminals do too (see #2066). So at least that part won't change any time soon. There's also things like #12512 (comment) which I'll probably try to fix as part of #9999.

@Jaykul
Copy link
Contributor

Jaykul commented Aug 8, 2022

Well, I'll have to see what I can do about making my emojis explicitly use the emoji variant. I have to say though -- it seems that if you're going to force the non-variant to a single cell, you really should force the text variant to a single cell too. That's the one where the author meant for it to be a single text character, right?

@DHowett
Copy link
Member

DHowett commented Aug 8, 2022

That the text variant takes up two cells hearkens back to an ages-old issue in the console: FE0F is allocated its own column, because we've never supported zero-width glyphs, and so the grapheme cluster HEART + VARIATION SELECTOR 15 is allocated two columns. :)

Combine that with DirectWrite's propensity for emoji-fying it even when the variation selector is present, and you get a double-width full-color heart. Bah.

@j4james
Copy link
Collaborator

j4james commented Aug 9, 2022

FYI, the issue of variant selectors was discussed at length in #8970. It's a more complicated problem than just supporting zero-width glyphs, because you're changing the width of the character after it has already been output, and original width can affect the layout of the page in ways that are irreversible. Last I checked, most terminals just ignored them (at least regarding the width change), which seems the most sensible thing to do.

@microsoft-github-policy-service microsoft-github-policy-service bot added the In-PR This issue has a related PR label Apr 11, 2023
microsoft-github-policy-service bot pushed a commit that referenced this issue Apr 26, 2023
This is practically a from scratch rewrite of AtlasEngine.

The initial approach used a very classic monospace text renderer, where
the viewport is subdivided into cells and each cell is assigned one
glyph texture, just like how real terminals used to work.
While we knew that it would have problems with overly large glyphs,
like those found in less often used languages, we didn't expect the
absolutely massive number of fonts that this approach would break.
For one, the assumption that monospace fonts are actually mostly
monospace has turned out to be a complete lie and we can't force users
to use better designed fonts. But more importantly, we can't just
design an entire Unicode fallback font collection from scratch where
every major glyph is monospace either. This is especially problematic
for vertical overhangs which are extremely difficult to handle in a
way that outperforms the much simpler alternative approach:
Just implementing a bog-standard, modern, quad-based text renderer.

Such an approach is both, less code and runs faster due to a less
complex CPU-side. The text shaping engine (in our case DirectWrite)
has to resolve text into glyph indices anyways, so using them directly
for text rendering allows reduces the effort of turning it back into
text ranges and hashing those. It's memory overhead is also reduced,
because we can now break up long ligatures into their individual glyphs.
Especially on AMD APUs I found this approach to run much faster.

A list of issues I think are either obsolete (and could be closed)
or resolved with this PR in combination with #14255:

Closes #6864
Closes #6974
Closes #8993
Closes #9940
Closes #10128
Closes #12537
Closes #13064
Closes #13527
Closes #13662
Closes #13700
Closes #13989
Closes #14022
Closes #14057
Closes #14094
Closes #14098
Closes #14117
Closes #14533
Closes #14877

## PR Checklist
* Enabling software rendering enables D2D mode ✅
* Both D2D and D3D:
  * Background appears correctly ✅✅
  * Text appears correctly
    * Cascadia Code Regular ✅✅
    * Cascadia Code Bold ✅✅
    * Cascadia Code Italic ✅✅
    * Cascadia Code block chars leave (almost) no gaps ✅✅
    * Terminus TTF at 13.5pt leaves no gaps between block chars ✅✅
    * ``"`u{e0b2}`u{e0b0}"`` in Fira Code Nerd Font forms a square ✅✅
  * Cursor appears correctly
    * Legacy small/medium/large ✅✅
    * Vertical bar ✅✅
    * Underscore ✅✅
    * Empty box ✅✅
    * Full box ✅✅
    * Double underscore ✅✅
  * Changing the cursor color works ✅✅
  * Selection appears correctly ✅✅
  * Scrolling in various manners always renders correctly ✅✅
  * Changing the text antialising mode works ✅✅
  * Semi-transparent backgrounds work ✅✅
  * Scroll-zooming the font size works ✅✅
  * Double-size characters work ✅✅
  * Resizing while text is printing works ✅✅
  * DWM `+Heatmap_ShowDirtyRegions` shows that only the cursor
    region is dirty when it's blinking ✅✅
* D2D
  * Margins are filled with background color ❌
    They're filled with the neighboring's cell background color for
    convenience, as D2D doesn't support `D3D11_TEXTURE_ADDRESS_BORDER`
* D3D
  * Margins are filled with background color ✅
  * Soft fonts work ✅
  * Custom shaders enable continous redraw if time constant is used ✅
  * Retro shader appears correctly ✅
  * Resizing while a custom shader is running works ✅
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
In-PR This issue has a related PR Issue-Bug It either shouldn't be doing this or needs an investigation. Needs-Tag-Fix Doesn't match tag requirements Priority-3 A description (P3) Product-Terminal The new Windows Terminal.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants