Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SetConsoleActiveScreenBuffer breaks Unicode output #17580

Open
alabuzhev opened this issue Jul 18, 2024 · 7 comments
Open

SetConsoleActiveScreenBuffer breaks Unicode output #17580

alabuzhev opened this issue Jul 18, 2024 · 7 comments
Labels
Issue-Bug It either shouldn't be doing this or needs an investigation. Needs-Tag-Fix Doesn't match tag requirements Priority-3 A description (P3) Product-Conpty For console issues specifically related to conpty
Milestone

Comments

@alabuzhev
Copy link
Contributor

alabuzhev commented Jul 18, 2024

Windows Terminal version

1.22.1972 and newer

Windows build number

10.0.19045.4412

Other Software

No

Steps to reproduce

  1. Compile this simple program:
#include <windows.h>

int main()
{
    const auto Output = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD n;
    WriteConsole(Output, L"𠜎𠝹𠱓𠱸𠲖𠳏𠳕𠴕𠵼𠵿𠸎\n", 23, &n, {});
    WriteConsole(Output, L"💖💫🐶🐇🌎🏃\n", 13, &n, {});
    MessageBox({}, L"Press OK", {}, 0);
    const auto NewBuffer = CreateConsoleScreenBuffer(GENERIC_READ | GENERIC_WRITE, FILE_SHARE_READ | FILE_SHARE_WRITE, {}, CONSOLE_TEXTMODE_BUFFER, {});
    SetConsoleActiveScreenBuffer(NewBuffer);
    SetConsoleActiveScreenBuffer(Output);
    MessageBox({}, L"See?", {}, 0);
}
  1. Run it.
  2. Follow the instructions.

Expected Behavior

The text stays intact after pressing OK (and calling SetConsoleActiveScreenBuffer(Output);):

image

Actual Behavior

The text is broken, as if a bad conversion happens somewhere:

image

Observations

  • It stopped working somewhere between WT 1.22.1511.0 (1.22.240530001, 31 May) and 1.22.1972.0 (1.22.240715002, 17 Jul). Unfortunately I don't have any other versions in between.
  • OpenConsole is not affected, only WT.
@alabuzhev alabuzhev added Issue-Bug It either shouldn't be doing this or needs an investigation. Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting labels Jul 18, 2024
@lhecker
Copy link
Member

lhecker commented Jul 18, 2024

The recent canary builds include #17510 as an experiment, which is a complete rewrite of how ConPTY works. It now doesn't modify VT output in any way anymore and the translation from console APIs to VT is now direct and strictly "algorithmic" within the console API call handler. This fixes all those "VT output is broken" bugs, and also improves performance ~15x. (If you had performance issues before, you probably

I removed support for surrogate pairs for the console APIs intentionally, as this allows us to move to an architecture that's simpler, more robust, and more performant in the long term. This is because if we restrict console APIs (the non-VT ones) to UCS-2 we can use the CHAR_INFO based APIs as the basis with which we can abstract all other console calls. ScrollConsoleScreenBuffer for instance is basically nothing but a series of ReadConsoleOutput and WriteConsoleOutput calls.

Since VT doesn't have multiple buffers like the console API, I've written SetConsoleActiveScreenBuffer as a call to ReadConsoleOutput followed by a serialization of the CHAR_INFO structs to VT (which we need anyway to support WriteConsoleOutput). I've assumed that this is alright since historically the console APIs have only supported UCS-2 until very recently.

It's definitely possible to restore support for surrogate pairs, but I'd strongly prefer if we didn't have to. Otherwise, we would need to translate "CHAR_INFO_UTF32" structs to CHAR_INFO ones on input/output. Support for grapheme clusters would be even more difficult, but that's what Windows Terminal now supports.

How much does this change impact you? Does this break anything in Far? If you have any feedback, etc., please let me know!

@lhecker
Copy link
Member

lhecker commented Jul 18, 2024

I've just added an early-return to SetConsoleActiveScreenBuffer that will avoid doing anything if the given handle is identical to the currently active one: 21b13ca
I think your intention was to show the general issue with SetConsoleActiveScreenBuffer, but thank you so much anyway for showing me that I forgot such an early return!

This will unfortunately break your example code though and it would need to be updated by switching between two different buffers.

@alabuzhev
Copy link
Contributor Author

I've written SetConsoleActiveScreenBuffer as a call to ReadConsoleOutput calls.

Waitaminute... Aren't that officially legacy and "no longer a part of ecosystem roadmap" and "not recommended in new products", according to MSDN banners? 😄
I understand the "implementing legacy in terms of other legacy" logic, but, notably, there are no CHAR_INFOs anywhere in SetConsoleActiveScreenBuffer's signature, so from the outside POV it's rather unexpected that it ought to suffer from ReadConsoleOutput's limitations.

It's definitely possible to restore support for surrogate pairs, but I'd strongly prefer if we didn't have to

I'd argue that at least the case "a single non-BMP character occupying exactly two cells can be losslessly read into a pair" is worth supporting - it covers a huge range of CJK and something is better than nothing, but that's a different topic. Also, it won't help with extended colors and styles, which are also broken by switching buffers now.

Does this break anything in Far?

That's... why I'm here.
Technically yes. Say, a user could choose to have RGB colors and files with non-BMP names on panels. We also have a log sink that sinks logs into a new buffer that's available on demand (via SetConsoleActiveScreenBuffer). Switching there and back again would uglify everything until explicitly redrawn. Having said that, extra buffers in WT are borderline unusable, so I won't claim that the impact is huge.

I think your intention was to show the general issue with SetConsoleActiveScreenBuffer, but thank you so much anyway for showing me that I forgot such an early return!

Thanks, it's a totally reasonable optimization and I haven't suggested it myself exactly not to draw attention away from a bigger issue 😄
This also mostly fixes the issue I initially observed:

It turns out that we call SetConsoleActiveScreenBuffer after executing external apps - an app can switch to some other buffer for reasons and, say, crash before restoring the status quo, potentially leaving the shell in an unusable state, so we do it just in case. In other words, it is probably no-op 99.999% of the time.

Thinking of it, can't SetConsoleActiveScreenBuffer use something like DECCRA internally too instead of ReadConsoleOutput?

@mominshaikhdevs

This comment has been minimized.

@lhecker
Copy link
Member

lhecker commented Jul 24, 2024

Aren't that officially legacy and "no longer a part of ecosystem roadmap" and "not recommended in new products", according to MSDN banners? 😄

To be fair, the same banner exists on the SetConsoleActiveScreenBuffer page, so I feel like it's fine if using that API results in a worse (modern) Unicode correctness.

I understand the "implementing legacy in terms of other legacy" logic, but, notably, there are no CHAR_INFOs anywhere in SetConsoleActiveScreenBuffer's signature, so from the outside POV it's rather unexpected that it ought to suffer from ReadConsoleOutput's limitations.

That's true and we could make this work. However, I'd prefer if we could avoid doing that for a year or two, until after I completed some architectural changes for conhost.¹ Would this be alright with you, now that SetConsoleActiveScreenBuffer will be a no-op when the active buffer hasn't changed? I'd of course make it work earlier if this becomes a blocking/major issue for you or anyone else.

Thinking of it, can't SetConsoleActiveScreenBuffer use something like DECCRA internally too instead of ReadConsoleOutput?

I think we could try that long term, but not initially. ConPTY needs to also work with terminals that don't support paging operations (PPA, PPR, etc.) after all. I think it's important that we gain some experience with this "fallback code" first.


¹ The changes are explained in detail in #17387. But basically, my goal is to make Windows Terminal its own proper console server, just like conhost is now.

To do so I'm creating an API that can represent all console APIs without loss of any information. SetConsoleActiveScreenBuffer for instance would not be mapped to VT with this API. This would allow Windows Terminal to properly support all console APIs, including switching screen buffers and not losing any complex Unicode, setting the 16 base colors with SetConsoleScreenBufferInfoEx, and so on.

ConPTY would also be implemented on top of this API, but it would necessarily lose some of the information, because it's output will still be strictly VT based. However, since such a ConPTY would only use its text buffer to serve ReadConsoleOutput calls, its text buffer could theoretically be a simple CHAR_INFO[height][width] matrix (= well over 10k lines less code needed). I'm not entirely sure if we'd ever do that (= 2 buffer implementations = also bad), but I think it'd be nice to have that option. I figured the best way to discover if we have the option is by breaking it now and seeing if it results in any issues. It did in your case, but as it turns out we could solve it with the no-op early return. So, if possible, I'd like to wait again and see if there's another issue.

@alabuzhev
Copy link
Contributor Author

Would this be alright with you, now that SetConsoleActiveScreenBuffer will be a no-op when the active buffer hasn't changed?

Of course - as I mentioned, I noticed only because it wasn't a no-op and because I was intentionally checking the correctness of a piece, not supported by ReadConsoleOutput.
In normal circumstances it shouldn't matter that much.

@lhecker lhecker added Product-Conpty For console issues specifically related to conpty Priority-3 A description (P3) and removed Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting labels Jul 24, 2024
@lhecker lhecker added this to the Backlog milestone Jul 24, 2024
@microsoft-github-policy-service microsoft-github-policy-service bot added the Needs-Tag-Fix Doesn't match tag requirements label Jul 24, 2024
@alabuzhev
Copy link
Contributor Author

This will unfortunately break your example code though and it would need to be updated by switching between two different buffers

Thanks, updated it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Issue-Bug It either shouldn't be doing this or needs an investigation. Needs-Tag-Fix Doesn't match tag requirements Priority-3 A description (P3) Product-Conpty For console issues specifically related to conpty
Projects
None yet
Development

No branches or pull requests

3 participants