-
Notifications
You must be signed in to change notification settings - Fork 8.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix a deadlock during ConPTY shutdown #14160
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[2:20 PM] Dustin Howett
did yougit blame
the line that introduced the shutdown mutex to see if it fixed a bug?
[2:20 PM] Dustin Howett
if so, can you test against that bug?
Requesting changes during due diligence :)
// ConsoleInputThreadProcWin32 calls VtIo::CreatePseudoWindow, | ||
// which calls CreateWindowExW, which causes a WM_SIZE message. | ||
// In short, this function might be called before _pVtRenderEngine exists. | ||
// See PtySignalInputThread::CreatePseudoWindow(). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since the window message loop happens on a different thread, what happens if we get a window message for visibility while we are shutting down? we will come in here, lock console, and stall... right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only function that blocks is LockConsole
. If it does, that's fine since RundownAndExit()
will hold the console lock during shutdown until the process exits. No code depends on the window thread to proceed during the shutdown so there's no "circular dependency" as is the case with VtEngine
.
When I suggested that lock, my concern was that the caller of But now that I've read the code a bit closer, I've realized that all we do is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't submit a single annoying nit comment from a week ago lol
// thread can't trigger a CloseOutput and release the | ||
// _pVtRenderEngine out from underneath us. | ||
std::lock_guard<std::mutex> lk(_shutdownLock); | ||
auto& gci = ::Microsoft::Console::Interactivity::ServiceLocator::LocateGlobals().getConsoleInformation(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, i am suddenly worried this is a deadlock factory. We call SetWindowVisibility
off the I/O thread when Terminal says "I got focus." The PseudoWindow itself may also take the console lock to send Terminal messages. Is there a condition that will result in Terminal receiving and processing a message while the console lock is held, and then trying to dump a focus message back into the VtIo pipe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous code already acquired the console lock here. All I did was move the if condition inside the locked area. If we have such a bug then we should have had it before this PR is/was merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's give it a shot then!
@@ -94,7 +93,7 @@ VtInputThread::VtInputThread(_In_ wil::unique_hfile hPipe, | |||
DWORD WINAPI VtInputThread::StaticVtInputThreadProc(_In_ LPVOID lpParameter) | |||
{ | |||
const auto pInstance = reinterpret_cast<VtInputThread*>(lpParameter); | |||
return pInstance->_InputThread(); | |||
pInstance->_InputThread(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI _InputThread
is now [[noreturn]]
because CloseInput
is [[noreturn]]
. Hence, no return
is needed, despite this function "returning" a DWORD
.
Test with WSL calling out to this to run a Windows executable (ensure you receive all output from the Windows binary back to the Linux half and that the process exits correctly). If both are "fine" then this should be "fine". |
Hate that I'm saying this, but you should probably check this change on a OneCore edition as well :| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks fine to me. It's just a matter of the effects. You will want to test the various integrations I listed if you didn't already (building this as conhost.exe and subbing into the various Windows OS images) before merging.
A good way would be to run, like, I'll help L with a onecore build. It can be.. let's say tedious. :D |
I've tested two scenarios now:
In both cases both the old code as well as the new code caused OpenConsole to exit immediately. However one time I had a leftover/hung conhost.exe in headless mode hanging around out of nowhere. The good news is that this is the OS' conhost and not the one I patched via |
I can't get sirep working, but it does tear down properly in onecore (not tested onecoreuap yet); I can't get WSL to exhibit an output truncation bug either. |
Hello @DHowett! Because this pull request has the p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (
|
Problem: * Calling `RundownAndExit` tries to flush out the last frame from `VtEngine` * `VtEngine` indirectly calls `RundownAndExit` if the pipe is gone via `VtIo` * `RundownAndExit` is called by other parts of OpenConsole * `RundownAndExit` must be `[[noreturn]]` for most parts of OpenConsole * `VtIo` itself has a mutex ensuring orderly shutdown * In short, doing a thread safe orderly shutdown requires us to hold both, a lock in `RundownAndExit` and `VtIo` at the same time, but while other parts need a blocking `RundownAndExit`, `VtIo` needs a non-blocking one * Refactoring the code to use optionally non-blocking `RundownAndExit` requires refactoring and might prove to be just as buggy Solution: * Simply don't call `RundownAndExit` in `VtEngine` at all * In case the write pipe breaks: * `VtEngine` will close the handle * The client should notice that their read pipe is broken and close their write pipe sooner or later * Once we notice that our read pipe is broken, we call `RundownAndExit` * `RundownAndExit` might call back into `VtEngine` but without a pipe it won't do anything * In case the read pipe breaks or any other part calls `RundownAndExit`: * We call `RundownAndExit` * `RundownAndExit` might call back into `VtEngine` and depending on whether the write pipe is broken or not it will simply write into it or ignore it Closes #14132 Pretty sure this also applies to #1810 ## Validation Steps Performed * Open 5 tabs and run MSYS2's `bash --login` in each of them * `Enter-VsDevShell` in another tab * Close window * 5 tab processes are killed instantly, 1 after ~3s ✅ * Replace conhost with OpenConsole via sfpcopy * Launch Dozens of Git Bash tabs in VS Code * Close them randomly * Remaining ones still work, processes are gone ✅ (cherry picked from commit 1774cfd) Service-Card-Id: 86174637 Service-Version: 1.16
Problem: * Calling `RundownAndExit` tries to flush out the last frame from `VtEngine` * `VtEngine` indirectly calls `RundownAndExit` if the pipe is gone via `VtIo` * `RundownAndExit` is called by other parts of OpenConsole * `RundownAndExit` must be `[[noreturn]]` for most parts of OpenConsole * `VtIo` itself has a mutex ensuring orderly shutdown * In short, doing a thread safe orderly shutdown requires us to hold both, a lock in `RundownAndExit` and `VtIo` at the same time, but while other parts need a blocking `RundownAndExit`, `VtIo` needs a non-blocking one * Refactoring the code to use optionally non-blocking `RundownAndExit` requires refactoring and might prove to be just as buggy Solution: * Simply don't call `RundownAndExit` in `VtEngine` at all * In case the write pipe breaks: * `VtEngine` will close the handle * The client should notice that their read pipe is broken and close their write pipe sooner or later * Once we notice that our read pipe is broken, we call `RundownAndExit` * `RundownAndExit` might call back into `VtEngine` but without a pipe it won't do anything * In case the read pipe breaks or any other part calls `RundownAndExit`: * We call `RundownAndExit` * `RundownAndExit` might call back into `VtEngine` and depending on whether the write pipe is broken or not it will simply write into it or ignore it Closes #14132 Pretty sure this also applies to #1810 ## Validation Steps Performed * Open 5 tabs and run MSYS2's `bash --login` in each of them * `Enter-VsDevShell` in another tab * Close window * 5 tab processes are killed instantly, 1 after ~3s ✅ * Replace conhost with OpenConsole via sfpcopy * Launch Dozens of Git Bash tabs in VS Code * Close them randomly * Remaining ones still work, processes are gone ✅ (cherry picked from commit 1774cfd) Service-Card-Id: 86178271 Service-Version: 1.15
🎉 Handy links: |
#14160 didn't fix #14132 entirely. There seems to be a race condition left where (on my system) 9 out of 10 times everything works correctly, but sometimes OpenConsole exits, while pwsh and bash keep running. My leading theory is that the new code is exiting OpenConsole faster than the old code. This prevents clients from calling console APIs, etc. causing them to get stuck. The old code (and new code) calls `ExitProcess` when the ConPTY pipes break and I think this is wrong: In conhost when you close the window we only call `CloseConsoleProcessState` via the `WM_CLOSE` event and that's it. Solution: Remove the call to `RundownAndExit` for ConPTY. During testing I found that continuously printing text inside msys2 will cause child processes to only exit slowly one by one every 5 seconds. This happens because `CloseConsoleProcessState` calls `HandleCtrlEvent` without holding the console lock. This creates a race condition where most of the time the console IO thread is the one picking up the `CTRL_CLOSE_EVENT`. But that's problematic because the `CTRL_CLOSE_EVENT` leads to a `ConsoleControl` call of type `ConsoleEndTask` which calls back into conhost's IO thread and so you got the IO thread waiting on itself to respond. Solution: Don't race conditions. ## Validation Steps Performed * `Enter-VsDevShell` and close the tab Everything exits after 5s ✅ * Run msys2 bash from within pwsh and close the tab Everything exits instantly ✅ * Run `cat bigfile.txt` and close the tab Everything exits instantly ✅ * Patch `conhost.exe` with `sfpcopy`, as well as `KernelBase.dll` with the recent changes to `winconpty`, then launch and exit shells and applications via VS Code's terminal ✅ * On the main branch without this modification remove the call to `TriggerTeardown` in `RundownAndExit` (this speeds up the shutdown). Run (msys2's) `bash.exe --login` and hold enter and then press Ctrl+Shift+W simultaneously. The tab should close and randomly OpenConsole should exit early while pwsh/bash keep running. Then retry this with this branch and observe how the child processes don't stick around forever anymore. ✅
Problem: * Calling `RundownAndExit` tries to flush out the last frame from `VtEngine` * `VtEngine` indirectly calls `RundownAndExit` if the pipe is gone via `VtIo` * `RundownAndExit` is called by other parts of OpenConsole * `RundownAndExit` must be `[[noreturn]]` for most parts of OpenConsole * `VtIo` itself has a mutex ensuring orderly shutdown * In short, doing a thread safe orderly shutdown requires us to hold both, a lock in `RundownAndExit` and `VtIo` at the same time, but while other parts need a blocking `RundownAndExit`, `VtIo` needs a non-blocking one * Refactoring the code to use optionally non-blocking `RundownAndExit` requires refactoring and might prove to be just as buggy Solution: * Simply don't call `RundownAndExit` in `VtEngine` at all * In case the write pipe breaks: * `VtEngine` will close the handle * The client should notice that their read pipe is broken and close their write pipe sooner or later * Once we notice that our read pipe is broken, we call `RundownAndExit` * `RundownAndExit` might call back into `VtEngine` but without a pipe it won't do anything * In case the read pipe breaks or any other part calls `RundownAndExit`: * We call `RundownAndExit` * `RundownAndExit` might call back into `VtEngine` and depending on whether the write pipe is broken or not it will simply write into it or ignore it Closes #14132 Pretty sure this also applies to #1810 ## Validation Steps Performed * Open 5 tabs and run MSYS2's `bash --login` in each of them * `Enter-VsDevShell` in another tab * Close window * 5 tab processes are killed instantly, 1 after ~3s ✅ * Replace conhost with OpenConsole via sfpcopy * Launch Dozens of Git Bash tabs in VS Code * Close them randomly * Remaining ones still work, processes are gone ✅
Problem: * Calling `RundownAndExit` tries to flush out the last frame from `VtEngine` * `VtEngine` indirectly calls `RundownAndExit` if the pipe is gone via `VtIo` * `RundownAndExit` is called by other parts of OpenConsole * `RundownAndExit` must be `[[noreturn]]` for most parts of OpenConsole * `VtIo` itself has a mutex ensuring orderly shutdown * In short, doing a thread safe orderly shutdown requires us to hold both, a lock in `RundownAndExit` and `VtIo` at the same time, but while other parts need a blocking `RundownAndExit`, `VtIo` needs a non-blocking one * Refactoring the code to use optionally non-blocking `RundownAndExit` requires refactoring and might prove to be just as buggy Solution: * Simply don't call `RundownAndExit` in `VtEngine` at all * In case the write pipe breaks: * `VtEngine` will close the handle * The client should notice that their read pipe is broken and close their write pipe sooner or later * Once we notice that our read pipe is broken, we call `RundownAndExit` * `RundownAndExit` might call back into `VtEngine` but without a pipe it won't do anything * In case the read pipe breaks or any other part calls `RundownAndExit`: * We call `RundownAndExit` * `RundownAndExit` might call back into `VtEngine` and depending on whether the write pipe is broken or not it will simply write into it or ignore it Closes #14132 Pretty sure this also applies to #1810 ## Validation Steps Performed * Open 5 tabs and run MSYS2's `bash --login` in each of them * `Enter-VsDevShell` in another tab * Close window * 5 tab processes are killed instantly, 1 after ~3s ✅ * Replace conhost with OpenConsole via sfpcopy * Launch Dozens of Git Bash tabs in VS Code * Close them randomly * Remaining ones still work, processes are gone ✅ (cherry picked from commit 1774cfd)
#14160 didn't fix #14132 entirely. There seems to be a race condition left where (on my system) 9 out of 10 times everything works correctly, but sometimes OpenConsole exits, while pwsh and bash keep running. My leading theory is that the new code is exiting OpenConsole faster than the old code. This prevents clients from calling console APIs, etc. causing them to get stuck. The old code (and new code) calls `ExitProcess` when the ConPTY pipes break and I think this is wrong: In conhost when you close the window we only call `CloseConsoleProcessState` via the `WM_CLOSE` event and that's it. Solution: Remove the call to `RundownAndExit` for ConPTY. During testing I found that continuously printing text inside msys2 will cause child processes to only exit slowly one by one every 5 seconds. This happens because `CloseConsoleProcessState` calls `HandleCtrlEvent` without holding the console lock. This creates a race condition where most of the time the console IO thread is the one picking up the `CTRL_CLOSE_EVENT`. But that's problematic because the `CTRL_CLOSE_EVENT` leads to a `ConsoleControl` call of type `ConsoleEndTask` which calls back into conhost's IO thread and so you got the IO thread waiting on itself to respond. Solution: Don't race conditions. ## Validation Steps Performed * `Enter-VsDevShell` and close the tab Everything exits after 5s ✅ * Run msys2 bash from within pwsh and close the tab Everything exits instantly ✅ * Run `cat bigfile.txt` and close the tab Everything exits instantly ✅ * Patch `conhost.exe` with `sfpcopy`, as well as `KernelBase.dll` with the recent changes to `winconpty`, then launch and exit shells and applications via VS Code's terminal ✅ * On the main branch without this modification remove the call to `TriggerTeardown` in `RundownAndExit` (this speeds up the shutdown). Run (msys2's) `bash.exe --login` and hold enter and then press Ctrl+Shift+W simultaneously. The tab should close and randomly OpenConsole should exit early while pwsh/bash keep running. Then retry this with this branch and observe how the child processes don't stick around forever anymore. ✅ (cherry picked from commit b6b1ff8) Service-Card-Id: 87207831 Service-Version: 1.15
#14160 didn't fix #14132 entirely. There seems to be a race condition left where (on my system) 9 out of 10 times everything works correctly, but sometimes OpenConsole exits, while pwsh and bash keep running. My leading theory is that the new code is exiting OpenConsole faster than the old code. This prevents clients from calling console APIs, etc. causing them to get stuck. The old code (and new code) calls `ExitProcess` when the ConPTY pipes break and I think this is wrong: In conhost when you close the window we only call `CloseConsoleProcessState` via the `WM_CLOSE` event and that's it. Solution: Remove the call to `RundownAndExit` for ConPTY. During testing I found that continuously printing text inside msys2 will cause child processes to only exit slowly one by one every 5 seconds. This happens because `CloseConsoleProcessState` calls `HandleCtrlEvent` without holding the console lock. This creates a race condition where most of the time the console IO thread is the one picking up the `CTRL_CLOSE_EVENT`. But that's problematic because the `CTRL_CLOSE_EVENT` leads to a `ConsoleControl` call of type `ConsoleEndTask` which calls back into conhost's IO thread and so you got the IO thread waiting on itself to respond. Solution: Don't race conditions. ## Validation Steps Performed * `Enter-VsDevShell` and close the tab Everything exits after 5s ✅ * Run msys2 bash from within pwsh and close the tab Everything exits instantly ✅ * Run `cat bigfile.txt` and close the tab Everything exits instantly ✅ * Patch `conhost.exe` with `sfpcopy`, as well as `KernelBase.dll` with the recent changes to `winconpty`, then launch and exit shells and applications via VS Code's terminal ✅ * On the main branch without this modification remove the call to `TriggerTeardown` in `RundownAndExit` (this speeds up the shutdown). Run (msys2's) `bash.exe --login` and hold enter and then press Ctrl+Shift+W simultaneously. The tab should close and randomly OpenConsole should exit early while pwsh/bash keep running. Then retry this with this branch and observe how the child processes don't stick around forever anymore. ✅ (cherry picked from commit b6b1ff8) Service-Card-Id: 87207832 Service-Version: 1.16
🎉 Handy links: |
2 new ConPTY APIs were added as part of this commit: * `ClosePseudoConsoleTimeout` Complements `ClosePseudoConsole`, allowing users to override the `INFINITE` wait for OpenConsole to exit with any arbitrary timeout, including 0. * `ConptyReleasePseudoConsole` This releases the `\Reference` handle held by the `HPCON`. While it makes launching further PTY clients via `PROC_THREAD_ATTRIBUTE_PSEUDOCONSOLE` impossible, it does allow conhost/OpenConsole to exit naturally once all console clients have disconnected. This makes it unnecessary to having to monitor the exit of the spawned shell/application, just to then call `ClosePseudoConsole`, while carefully continuing to read from the output pipe. Instead users can now just read from the output pipe until it is closed, knowing for sure that no more data will come or clients connect. This is especially useful in combination with `ClosePseudoConsoleTimeout` and a timeout of 0, to allow conhost/OpenConsole to exit asynchronously. These new APIs are used to fix an array of bugs around Windows Terminal exiting either too early or too late. They also make usage of the ConPTY API simpler in most situations (when spawning a single application and waiting for it to exit). Depends on #13882, #14041, #14160, #14282 Closes #4564 Closes #14416 Closes MSFT-42244182 ## Validation Steps Performed * Calling `FreeConsole` in a handoff'd application closes the tab ✅ * Create a .bat file containing only `start /B cmd.exe`. If WT stable is the default terminal the tab closes instantly ✅ With these changes included the tab stays open with a cmd.exe prompt ✅ * New ConPTY tests ✅
Problem:
RundownAndExit
tries to flush out the last frame fromVtEngine
VtEngine
indirectly callsRundownAndExit
if the pipe is gone viaVtIo
RundownAndExit
is called by other parts of OpenConsoleRundownAndExit
must be[[noreturn]]
for most parts of OpenConsoleVtIo
itself has a mutex ensuring orderly shutdowna lock in
RundownAndExit
andVtIo
at the same time, but while other partsneed a blocking
RundownAndExit
,VtIo
needs a non-blocking oneRundownAndExit
requires refactoring and might prove to be just as buggy
Solution:
RundownAndExit
inVtEngine
at allVtEngine
will close the handleclose their write pipe sooner or later
RundownAndExit
RundownAndExit
might call back intoVtEngine
butwithout a pipe it won't do anything
RundownAndExit
:RundownAndExit
RundownAndExit
might call back intoVtEngine
and depending on whetherthe write pipe is broken or not it will simply write into it or ignore it
Closes #14132
Pretty sure this also applies to #1810
Validation Steps Performed
bash --login
in each of themEnter-VsDevShell
in another tab