Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenConsole crash when pasting emoji #8663

Closed
amoldeshpande opened this issue Dec 26, 2020 · 18 comments · Fixed by #14745
Closed

OpenConsole crash when pasting emoji #8663

amoldeshpande opened this issue Dec 26, 2020 · 18 comments · Fixed by #14745
Assignees
Labels
Area-Input Related to input processing (key presses, mouse, etc.) In-PR This issue has a related PR Issue-Bug It either shouldn't be doing this or needs an investigation. Needs-Attention The core contributors need to come back around and look at this ASAP. Needs-Tag-Fix Doesn't match tag requirements Priority-1 A description (P1) Product-Conhost For issues in the Console codebase Severity-Crash Crashes are real bad news.

Comments

@amoldeshpande
Copy link

amoldeshpande commented Dec 26, 2020

Environment

Windows build number:  10.0.19041.630]
Windows Terminal version (if applicable):

Any other software?
tcsh (native Windows version)

# Steps to reproduce

download tcsh-x64.exe from https://github.com/amoldeshpande/tcsh/releases/tag/20210103.10 

Launch tcsh.  Doesn't have to be in terminal.  Paste any emoji into it.  It will crash conhost.exe on Windows 10 2004 and OpenConsole.exe if you launch within Windows Terminal

# Expected behavior

Should paste an emoji

# Actual behavior


Edited: repro steps with tcsh. Cannnot repro it with my test program anymore

@ghost ghost added Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting Needs-Tag-Fix Doesn't match tag requirements labels Dec 26, 2020
@j4james
Copy link
Collaborator

j4james commented Dec 27, 2020

Irrespective of whether emojis are expected to work in these conditions, it seems very wrong to me that an error condition here would cause the entire process to terminate. I'm convinced that half of the FAIL_FAST calls in the code are unintentional, and the original author was likely expecting an error to be returned, or an exception to be thrown (much like all the places we're mistakenly using gsl::narrow instead of gsl::narrow_cast).

@KalleOlaviNiemitalo
Copy link

I cannot reproduce the crash when right-click pasting 😀 (U+1F600) or 👨‍👩‍👧‍👦 (U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466) to OpenConsole.exe copied from Windows Terminal 1.4.3243.0, with the demo program built for x86, on Windows 10 version 20H2 (19042.685), where the setting "Beta: Use Unicode UTF-8 for worldwide language support" is off.

I don't see why SetFileApisToANSI would even matter here; doesn't it only affect the encoding of file names, which the demo program does not use?

@j4james
Copy link
Collaborator

j4james commented Dec 27, 2020

@KalleOlaviNiemitalo I think you need to paste more than 100 characters (the length of the INPUT_RECORD buffer that was passed to ReadConsoleInput) in order to trigger the error. I could reproduce it by pasting the following sequence:

😀💘💥👍🎅🦄🍄🌎🚖⚓🚀⌚💣😀💘💥👍🎅🦄🍄🌎🚖⚓🚀⌚💣😀💘💥👍🎅🦄🍄🌎🚖⚓🚀⌚💣😀💘💥👍🎅🦄🍄🌎🚖⚓🚀⌚💣😀💘💥👍🎅🦄🍄🌎🚖⚓🚀⌚💣

@amoldeshpande
Copy link
Author

I have been using a single 😁 to crash it.

@j4james
Copy link
Collaborator

j4james commented Dec 27, 2020

In that case I can't reproduce it either. What's the commit hash of the revision that you're testing with? And in case it makes any difference, what are the build details - x86 or x64, debug or release? Also, what keyboard locale are you using?

@j4james
Copy link
Collaborator

j4james commented Dec 27, 2020

I just realised that the crash that I was seeing was actually in the DirectReadData::Notify method rather than the _DoGetConsoleInput function, but the code is essentially identical.

This is in DirectReadData::Notify:

for (size_t i = 0; i < _eventReadCount; ++i)
{
if (readEvents.empty())
{
break;
}
_outEvents.push_back(std::move(readEvents.front()));
readEvents.pop_front();
}
// store partial event if necessary
if (!readEvents.empty())
{
_pInputBuffer->StoreReadPartialByteSequence(std::move(readEvents.front()));
readEvents.pop_front();
FAIL_FAST_IF(!(readEvents.empty()));
}

And this is in _DoGetConsoleInput:

for (size_t i = 0; i < eventReadCount; ++i)
{
if (readEvents.empty())
{
break;
}
outEvents.push_back(std::move(readEvents.front()));
readEvents.pop_front();
}
// store partial event if necessary
if (!readEvents.empty())
{
inputBuffer.StoreReadPartialByteSequence(std::move(readEvents.front()));
readEvents.pop_front();
FAIL_FAST_IF(!(readEvents.empty()));
}

Either of those will crash if the number of readEvents is more than 1 greater than the _eventReadCount, and that is easy to achieve with a multibyte codepage. I can't see how the crash could be triggered with a single character, though, unless ReadConsoleInput was called with a much smaller record buffer.

@KalleOlaviNiemitalo
Copy link

If I understand correctly, InputBuffer::_ReadBuffer tries to predict how many UTF-16 input events fit in the specified number of OEM input events, but it assumes the OEM encoding uses at most two bytes per character:

++virtualReadCount;
if (!unicode)
{
if (readEvents.back()->EventType() == InputEventType::KeyEvent)
{
const KeyEvent* const pKeyEvent = static_cast<const KeyEvent* const>(readEvents.back().get());
if (IsGlyphFullWidth(pKeyEvent->GetCharData()))
{
++virtualReadCount;
}
}
}

According to #7589 (comment), the estimated number of bytes per character even depends on the font.

Then, SplitToOem translates the UTF-16 input events to OEM input events. If the configured code page is UTF-8, then it uses CESU-8, which may need three bytes per UTF-16 code unit. _DoGetConsoleInput cannot return the excess events to its caller and does not know how to put them back to the input buffer, either.

If InputBuffer::_ReadBuffer supported UTF-8, then that would avoid the crash, I think. However, the resulting input events would still have CESU-8 rather than UTF-8, unless that is converted in some other layer that I cannot find now.

Also, I wonder if the StoreReadPartialByteSequence call should perhaps be skipped if IsPeek is true.

@amoldeshpande
Copy link
Author

dangit, I can't reproduce it any more with my simple piece of code.

The more involved repro consists of running code in this branch https://github.com/amoldeshpande/tcsh/tree/unicode under Windows Terminal or OpenConsole.exe and pasting the emoji. This is 100% still repro-able with font of "MesloLGM NF" . Have not tried any other fonts IIRC with this shell.

(Nerd Fonts from here https://github.com/ryanoasis/nerd-fonts/releases)

In answer to a previous question, I used an x64 debug build with standard US-EN keyboard layout.

I will keep trying to make my simple repro crash again meanwhile.

@j4james
Copy link
Collaborator

j4james commented Dec 27, 2020

My guess is that there was a mode change involved somewhere (i.e. SetConsoleMode) that persisted from another test case or app. Then when you restarted OpenConsole, the modes went back to the defaults, so the error couldn't be reproduced.

@amoldeshpande
Copy link
Author

amoldeshpande commented Dec 27, 2020


Editing to update repro steps:

download tcsh-x64.exe from https://github.com/amoldeshpande/tcsh/releases/tag/20210103.10 

Launch tcsh.  Doesn't have to be in terminal.  Paste any emoji into it.  It will crash conhost.exe on Windows 10 2004 and OpenConsole.exe if you launch within Windows Terminal

@amoldeshpande amoldeshpande changed the title OpenConsole crash when pasting emoji if SetFileApisToANSI() OpenConsole crash when pasting emoji Jan 15, 2021
@zadjii-msft zadjii-msft added Area-Input Related to input processing (key presses, mouse, etc.) Priority-1 A description (P1) Product-Conhost For issues in the Console codebase Severity-Crash Crashes are real bad news. labels Jan 21, 2021
@zadjii-msft zadjii-msft added this to the Terminal v1.7 milestone Jan 21, 2021
@zadjii-msft zadjii-msft added the Issue-Bug It either shouldn't be doing this or needs an investigation. label Jan 21, 2021
@ghost ghost removed the Needs-Tag-Fix Doesn't match tag requirements label Jan 21, 2021
@zadjii-msft
Copy link
Member

Okay since it looks like there's a repro and a pretty well identified piece of code responsible, I'm yanking triage. Looks like we didn't end up getting to this in 1.7, so I'm just gonna throw in with 2.0. Thanks everyone!

@zadjii-msft zadjii-msft removed the Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting label Feb 26, 2021
@zadjii-msft
Copy link
Member

Hey uh, is anyone still hitting this? I can't get this to repro anymore as of main, in Terminal or in conhost, in cmd or powershell, in codepage 437, 936 or 65001. Maybe this got fixed in the last 9 months?

@zadjii-msft zadjii-msft added the Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something label Aug 24, 2021
@amoldeshpande
Copy link
Author

I can't even compile main because of missing wil/nt_result_macros.h or something, so I have no idea. I have terminal 1.9.1942.0 on Windows 10 and it still crashes 🤷

@ghost ghost added Needs-Attention The core contributors need to come back around and look at this ASAP. and removed Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something labels Aug 24, 2021
@DHowett
Copy link
Member

DHowett commented Aug 24, 2021

You need to update your submodules. git submodule update. 😄

@j4james
Copy link
Collaborator

j4james commented Aug 24, 2021

I can still reproduce a crash on a recentish build of OpenConsole (90ff261), but as I mentioned above, my crash is in DirectReadData::Notify, which isn't exactly the same as the OP.

To reproduce, I'm using this little test app in a conhost cmd shell:

#include <Windows.h>

int main(int argc, char** argv)
{
    HANDLE ih = GetStdHandle(STD_INPUT_HANDLE);

    DWORD mode;
    GetConsoleMode(ih, &mode);
    SetConsoleMode(ih, mode | ENABLE_LINE_INPUT | ENABLE_ECHO_INPUT | ENABLE_PROCESSED_INPUT);

    SetConsoleCP(CP_UTF8);
    SetConsoleOutputCP(CP_UTF8);
    SetFileApisToANSI();

    while (true)
    {
        INPUT_RECORD inputRecord[100];
        DWORD inputCount;
        ReadConsoleInputA(ih, inputRecord, 100, &inputCount);
    }
}

And once it's running I paste a bunch of emojis like this:
😀💘💥👍🎅🦄🍄🌎🚖⚓🚀⌚💣😀💘💥👍🎅🦄🍄🌎🚖⚓🚀⌚💣😀💘💥👍🎅🦄🍄🌎🚖⚓🚀⌚💣😀💘💥👍🎅🦄🍄🌎🚖⚓🚀⌚💣😀💘💥👍🎅🦄🍄🌎🚖⚓🚀⌚💣

@amoldeshpande
Copy link
Author

after an hour of compiling it now claims I don't have a .NET SDK installed. I'm tired of this crap. You can close the bug if you like. The repro I gave is pretty simple (you can use any current release of tcsh) and if it doesn't crash, consider me satisfied.

@zadjii-msft zadjii-msft modified the milestones: Terminal v2.0, 22H1 Jan 4, 2022
@zadjii-msft zadjii-msft modified the milestones: 22H1, Terminal v1.14 Feb 2, 2022
@zadjii-msft
Copy link
Member

Hmmmmmm. So it sounds like we kinda need to do a SplitToOem in InputBuffer::Read, before we even move the records to the Notify call. Or something like a pre-emptive scan in Read, like checking how many events each event will be translated into. I wonder how easy this would be to write a test for...

@zadjii-msft
Copy link
Member

Okay so I think InputBuffer::Read is only called in two places:

  • directio.cpp:189
  • readDataDirect.cpp:135

Both do the same thing. If called with !unicode, then they will call SplitToOem on the events read from the buffer, to expand them out to chars. So I think it's okay to pre-emptively trim the buffer in the !unicode case inside of IB::Read, because we know the caller will be expanding them.

Wow those two functions are really basically the same code aren't they...

@zadjii-msft zadjii-msft self-assigned this Feb 2, 2022
@ghost ghost added the In-PR This issue has a related PR label Feb 3, 2022
@zadjii-msft zadjii-msft modified the milestones: Terminal v1.14, 22H1 Jun 7, 2022
@zadjii-msft zadjii-msft modified the milestones: 22H1, 22H2 Aug 3, 2022
@zadjii-msft zadjii-msft modified the milestones: 22H2, Terminal v1.18 Dec 5, 2022
@microsoft-github-policy-service microsoft-github-policy-service bot added the Needs-Tag-Fix Doesn't match tag requirements label Feb 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Input Related to input processing (key presses, mouse, etc.) In-PR This issue has a related PR Issue-Bug It either shouldn't be doing this or needs an investigation. Needs-Attention The core contributors need to come back around and look at this ASAP. Needs-Tag-Fix Doesn't match tag requirements Priority-1 A description (P1) Product-Conhost For issues in the Console codebase Severity-Crash Crashes are real bad news.
Projects
None yet
5 participants