Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Additional instances of WT crash on exit #15410

Closed
alabuzhev opened this issue May 23, 2023 · 10 comments · Fixed by #15424 or #15451
Closed

Additional instances of WT crash on exit #15410

alabuzhev opened this issue May 23, 2023 · 10 comments · Fixed by #15424 or #15451
Labels
In-PR This issue has a related PR Issue-Bug It either shouldn't be doing this or needs an investigation. Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting

Comments

@alabuzhev
Copy link
Contributor

alabuzhev commented May 23, 2023

Windows Terminal version

1.18.1421.0

Windows build number

10.0.19045.2728

Other Software

N/A

Steps to reproduce

This is probably related to

All instances of Windows Terminal now run in the same process (#14843)

  1. Run WT.
  2. Run WT again.

Expected Behavior

Two WT windows

Actual Behavior

Two WT windows and an extra error dialog:

image

image

Stack trace:

>	twinapi.appcore.dll!Microsoft::WRL::ActivationFactory<struct Microsoft::WRL::Implements<struct Microsoft::WRL::RuntimeClassFlags<3>,struct Windows::ApplicationModel::Core::ICoreApplication,struct Windows::ApplicationModel::Core::ICoreApplication2,struct Windows::ApplicationModel::Core::ICoreApplication3,struct Windows::ApplicationModel::Core::ICoreApplicationExit,struct Windows::ApplicationModel::Core::ICoreImmersiveApplication,struct Windows::ApplicationModel::Core::ICoreImmersiveApplication2,struct Windows::ApplicationModel::Core::ICoreImmersiveApplication3,struct Windows::ApplicationModel::Core::ICoreApplicationUnhandledError,struct Microsoft::WRL::Implements<struct Microsoft::WRL::RuntimeClassFlags<3>,struct Microsoft::WRL::CloakedIid<struct Windows::ApplicationModel::Core::ICoreApplicationUseCount>,struct Microsoft::WRL::CloakedIid<struct IServiceProvider>,struct Microsoft::WRL::CloakedIid<struct IObjectWithSite>,struct Microsoft::WRL::CloakedIid<struct Windows::ApplicationModel::Infrastructure::ICoreApplicationInitializat�()	Unknown	Symbols loaded.
 	Windows.UI.Xaml.dll!ctl::ComPtr<Windows::Foundation::Collections::VectorChangedEventHandler<IInspectable *>>::InternalRelease() Line 346	C++	Symbols loaded.
 	[Inline Frame] Windows.UI.Xaml.dll!ctl::ComPtr<Windows::ApplicationModel::Core::ICoreApplication2>::{dtor}() Line 345	C++	Symbols loaded.
 	Windows.UI.Xaml.dll!DirectUI::FrameworkApplication::HookBackgroundActivationEvents(bool fRegister=false) Line 1233	C++	Symbols loaded.
 	Windows.UI.Xaml.dll!DirectUI::FrameworkApplication::~FrameworkApplication() Line 591	C++	Symbols loaded.
 	Windows.UI.Xaml.dll!ctl::ComObject<DirectUI::FrameworkApplication>::`scalar deleting destructor'(unsigned int)	C++	Symbols loaded.
 	Windows.UI.Xaml.dll!ctl::ComBase::ReleaseImpl() Line 307	C++	Symbols loaded.
 	[Inline Frame] TerminalApp.dll!winrt::Windows::Foundation::IUnknown::release_ref() Line 2161	C++	Symbols loaded.
 	[Inline Frame] TerminalApp.dll!winrt::Windows::Foundation::IUnknown::{dtor}() Line 2066	C++	Symbols loaded.
 	TerminalApp.dll!winrt::impl::root_implements<winrt::TerminalApp::implementation::PaletteItemTemplateSelector,winrt::TerminalApp::PaletteItemTemplateSelector,winrt::composing,winrt::Windows::UI::Xaml::Controls::IDataTemplateSelectorOverrides,winrt::Windows::UI::Xaml::Controls::IDataTemplateSelectorOverrides2>::~root_implements<winrt::TerminalApp::implementation::PaletteItemTemplateSelector,winrt::TerminalApp::PaletteItemTemplateSelector,winrt::composing,winrt::Windows::UI::Xaml::Controls::IDataTemplateSelectorOverrides,winrt::Windows::UI::Xaml::Controls::IDataTemplateSelectorOverrides2>() Line 7527	C++	Symbols loaded.
 	TerminalApp.dll!winrt::TerminalApp::implementation::App::`scalar deleting destructor'(unsigned int)	C++	Non-user code. Symbols loaded.
 	TerminalApp.dll!winrt::impl::produce_base<winrt::TerminalApp::implementation::App,winrt::TerminalApp::IApp,void>::Release() Line 7152	C++	Symbols loaded.
 	Windows.UI.Xaml.dll!ctl::release_interface<DirectUI::FrameworkApplication>(DirectUI::FrameworkApplication * & pInterface=0x000002c37f81c600) Line 137	C++	Non-user code. Symbols loaded.
 	[Inline Frame] Windows.UI.Xaml.dll!DirectUI::FrameworkApplication::GlobalDeinit() Line 668	C++	Non-user code. Symbols loaded.
 	Windows.UI.Xaml.dll!DeinitializeDll() Line 319	C++	Non-user code. Symbols loaded.
 	Windows.UI.Xaml.dll!DllMain(HINSTANCE__ * hinstDLL, unsigned int fdwReason, void * __formal) Line 394	C++	Non-user code. Symbols loaded.
 	Windows.UI.Xaml.dll!dllmain_dispatch(HINSTANCE__ * const instance=0x00007ff919460000, const unsigned long reason=0, void * const reserved=0x0000000000000001) Line 200	C++	Non-user code. Symbols loaded.
 	ntdll.dll!LdrpCallInitRoutine()	Unknown	Non-user code. Symbols loaded without source information.
 	ntdll.dll!LdrShutdownProcess�()	Unknown	Non-user code. Symbols loaded without source information.
 	ntdll.dll!RtlExitUserProcess()	Unknown	Non-user code. Symbols loaded without source information.
 	kernel32.dll!ExitProcessImplementation�()	Unknown	Non-user code. Symbols loaded without source information.
 	ucrtbase.dll!exit_or_terminate_process()	Unknown	Non-user code. Symbols loaded without source information.
 	ucrtbase.dll!common_exit()	Unknown	Non-user code. Symbols loaded without source information.
 	WindowsTerminal.exe!__scrt_common_main_seh() Line 295	C++	Non-user code. Symbols loaded.
 	kernel32.dll!BaseThreadInitThunk�()	Unknown	Non-user code. Symbols loaded without source information.
 	ntdll.dll!RtlUserThreadStart�()	Unknown	Non-user code. Symbols loaded without source information.

After closing the error dialog everything works normally.

@alabuzhev alabuzhev added Issue-Bug It either shouldn't be doing this or needs an investigation. Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting labels May 23, 2023
@zadjii-msft
Copy link
Member

Hmmmmmm. When this happens, are there two windowsterminal.exe processes running in Task Manager? Any ideas which the error is coming from?

@lhecker
Copy link
Member

lhecker commented May 23, 2023

I wonder if we need to pump the message loop on the main thread similarly to how we do it for each window thread. This does seem to be an after effect of us calling detach_abi on the _app though... 🤔

@alabuzhev
Copy link
Contributor Author

are there two windowsterminal.exe processes running in Task Manager?

Three actually.

After the first run:

image

After the second run:

image

Any ideas which the error is coming from?

The second parent (PID 7528 on the screenshot). At least it is the one VS attaches to.

@alabuzhev
Copy link
Contributor Author

P.S. I think Windows doesn't show the "app has stopped working" dialog by default these days, maybe this is why it got overlooked. The error should still be in the event log under "administrative events" and the the dialog can be re-enabled in group policy.

@zadjii-msft
Copy link
Member

Huh. So the wt that handed off to the main instance, crashed when it died (of natural causes). And it crashed... dtoring our App? And what the heck is that TerminalApp.dll!winrt::impl::root_implements<winrt::TerminalApp::implementation::PaletteItemTemplateSelector frame doing there?

Maybe this regressed in #15397. But I almost don't think that's right - we shouldn't have even made a window in that process which we could leak.

I think Windows doesn't show the "app has stopped working" dialog by default these days, maybe this is why it got overlooked

FWIW I pretty much always run with "post-mortem debugging" enabled for WinDbg. So usually we catch things like this.

@zadjii-msft
Copy link
Member

zadjii-msft commented May 24, 2023

Hmm. I can repro this in a VM - even on the build with the leak taken out. So it's not that.

er, no, this was useless

There's a stashed exception in there:

  Name Value Type
$exceptionstack [9 Frames, Microsoft.Terminal.Remoting.dll!winrt::impl::consume_Microsoft_Terminal_Remoting_IMonarchwinrt::Microsoft::Terminal::Remoting::IMonarch::ProposeCommandline(const winrt::Microsoft::Terminal::Remoting::CommandlineArgs & args) Line 122]  
  [0] [External Code] void*
  [1] Microsoft.Terminal.Remoting.dll!winrt::impl::consume_Microsoft_Terminal_Remoting_IMonarchwinrt::Microsoft::Terminal::Remoting::IMonarch::ProposeCommandline(const winrt::Microsoft::Terminal::Remoting::CommandlineArgs & args) Line 122 void*
  [2] Microsoft.Terminal.Remoting.dll!winrt::Microsoft::Terminal::Remoting::implementation::WindowManager::_proposeToMonarch(const winrt::Microsoft::Terminal::Remoting::CommandlineArgs & args) Line 272 void*
  [3] Microsoft.Terminal.Remoting.dll!winrt::Microsoft::Terminal::Remoting::implementation::WindowManager::ProposeCommandline(const winrt::Microsoft::Terminal::Remoting::CommandlineArgs & args, const bool isolatedMode) Line 135 void*
  [4] Microsoft.Terminal.Remoting.dll!winrt::impl::producewinrt::Microsoft::Terminal::Remoting::implementation::WindowManager,winrt::Microsoft::Terminal::Remoting::IWindowManager::ProposeCommandline(void * args, bool isolatedMode, void * * result) Line 1947 void*
  [5] WindowsTerminal.exe!winrt::impl::consume_Microsoft_Terminal_Remoting_IWindowManagerwinrt::Microsoft::Terminal::Remoting::IWindowManager::ProposeCommandline(const winrt::Microsoft::Terminal::Remoting::CommandlineArgs & args, bool isolatedMode) Line 752 void*
  [6] WindowsTerminal.exe!WindowEmperor::HandleCommandlineArgs() Line 117 void*
  [7] WindowsTerminal.exe!wWinMain(HINSTANCE__ * formal, HINSTANCE * __formal, wchar_t * __formal, int __formal) Line 118 void*
  [8] [External Code] void*

@zadjii-msft
Copy link
Member

I hate it so much

diff --git a/src/cascadia/WindowsTerminal/WindowEmperor.cpp b/src/cascadia/WindowsTerminal/WindowEmperor.cpp
index a268dfd59..1bb8b2711 100644
--- a/src/cascadia/WindowsTerminal/WindowEmperor.cpp
+++ b/src/cascadia/WindowsTerminal/WindowEmperor.cpp
@@ -41,10 +41,30 @@ WindowEmperor::WindowEmperor() noexcept :

 WindowEmperor::~WindowEmperor()
 {
+    if (_app)^M
     _app.Close();
     _app = nullptr;
 }

+static bool IsWindows11()^M
+{^M
+    static const bool isWindows11 = []() {^M
+        OSVERSIONINFOEXW osver{};^M
+        osver.dwOSVersionInfoSize = sizeof(osver);^M
+        osver.dwBuildNumber = 22000;^M
+^M
+        DWORDLONG dwlConditionMask = 0;^M
+        VER_SET_CONDITION(dwlConditionMask, VER_BUILDNUMBER, VER_GREATER_EQUAL);^M
+^M
+        if (VerifyVersionInfoW(&osver, VER_BUILDNUMBER, dwlConditionMask) != FALSE)^M
+        {^M
+            return true;^M
+        }^M
+        return false;^M
+    }();^M
+    return isWindows11;^M
+}^M
+^M
 void _buildArgsFromCommandline(std::vector<winrt::hstring>& args)
 {
     if (auto commandline{ GetCommandLineW() })
@@ -97,7 +117,8 @@ bool WindowEmperor::HandleCommandlineArgs()

     const auto result = _manager.ProposeCommandline(eventArgs, isolatedMode);

-    if (result.ShouldCreateWindow())
+    const bool makeWindow = result.ShouldCreateWindow();^M
+    if (makeWindow)^M
     {
         _createNewWindowThread(Remoting::WindowRequestedArgs{ result, eventArgs });

@@ -111,9 +132,15 @@ bool WindowEmperor::HandleCommandlineArgs()
             AppHost::s_DisplayMessageBox(res);
             ExitThread(res.ExitCode);
         }
+^M
+        if (!IsWindows11())^M
+        {^M
+            auto a{ _app };^M
+            winrt::detach_abi(_app);^M
+        }^M
     }

-    return result.ShouldCreateWindow();
+    return makeWindow;^M
 }

@zadjii-msft
Copy link
Member

That horrible diff will "leak" the App on Windows 10. "Leak" in quotes, because the whole process is exiting, so the entire thing is just gonna get released instantly after that when the process is exited.

And you wanna know what? We used to do that intentionally. See #5629.

We actually still do leak it intentionally. Problem is, we only leak it for the "monarch" process, the main one, with the windows. The other peasant process that just immediately fucks off - that one doesn't leak and GUESS WHAT. That crashes. Cause of course it does.

zadjii-msft added a commit that referenced this issue May 24, 2023
@alabuzhev
Copy link
Contributor Author

Wait

So you have a system where random peasants just delegate all the work to the "monarch" and call it a day? 😄

@zadjii-msft
Copy link
Member

So you have a system where random peasants just delegate all the work to the "monarch" and call it a day

SURE DO

Monarch and Peasant Processes

This document assumes the reader is already familiar with the "Monarch and
Peasant" architecture as detailed in the Windows Terminal Process Model 2.0
Spec
. As a quick summary:

  • Every Windows Terminal window is a "Peasant" process.
  • One of the Windows Terminal window processes is also the "Monarch" process.
    The Monarch is picked randomly from the Terminal windows, and there is only
    ever one Monarch process at a time.
  • Peasants can communicate with the monarch when certain state changes (such as
    their window being activated), and the monarch can send commands to any of the
    peasants.

Admittedly, Process Model v2 never shipped, because what we ended up shipping, "process model v3" was way easier to implement, and I never went back to update that doc. But the Monarch/Peasant stuff all stayed.

@microsoft-github-policy-service microsoft-github-policy-service bot added the In-PR This issue has a related PR label May 25, 2023
zadjii-msft added a commit that referenced this issue May 26, 2023
DHowett pushed a commit that referenced this issue May 26, 2023
…5451)

This is a resurrection of #5629. As it so happens, this crash-on-exit
was _not_ specific to my laptop. It's a bug in the XAML platform
somewhere, only on Windows 10.

In #14843, we moved this leak into `becomeMonarch`. Turns out, we don't
just need this leak for the monarch process, but for all of them.

It's not a real "leak", because ultimately, our `App` lives for the
entire lifetime of our process, and then gets cleaned up when we do. But
`dtor`ing the `App` - that's apparently a no-no.

Was originally in #15424, but I'm pulling it out for a super-hotfix
release.


Closes #15410

MSFT:35761869 looks like it was closed as no repro many moons ago. This
should close out our hits there (firmly **40% of the crashes we've
gotten on 1.18**)
DHowett pushed a commit that referenced this issue May 26, 2023
…5451)

This is a resurrection of #5629. As it so happens, this crash-on-exit
was _not_ specific to my laptop. It's a bug in the XAML platform
somewhere, only on Windows 10.

In #14843, we moved this leak into `becomeMonarch`. Turns out, we don't
just need this leak for the monarch process, but for all of them.

It's not a real "leak", because ultimately, our `App` lives for the
entire lifetime of our process, and then gets cleaned up when we do. But
`dtor`ing the `App` - that's apparently a no-no.

Was originally in #15424, but I'm pulling it out for a super-hotfix
release.

Closes #15410

MSFT:35761869 looks like it was closed as no repro many moons ago. This
should close out our hits there (firmly **40% of the crashes we've
gotten on 1.18**)

(cherry picked from commit aa8ed8c)
Service-Card-Id: 89332890
Service-Version: 1.18
zadjii-msft added a commit that referenced this issue Jul 19, 2023
This is my proposed solution to #15384.

Basically, the issue is that we cannot ever close a
`DesktopWindowXamlSource` ("DWXS"). If we do, then any other thread that
tries to access XAML metadata will explode, which happens frequently. A
DWXS is inextricably linked to an HWND. That means we have to not only
reuse DWXS's, but the HWNDs themselves. XAML also isn't agile, so we've
got to keep the `thread` that the DWXS was started on alive as well.

To do this, we're going to introduce the ability to "refrigerate" and
"reheat" window threads.
* A window thread is "**hot**" if it's actively got a window, and is
pumping window messages, and generally, is a normal thing.
* When a window is closed, we need to "**refrigerate**" it's
`WindowThread` and `IslandWindow`. `WindowEmperor` will take care of
tracking the threads that are refrigerated.
* When a new window is requested, the Emperor first try to
"**reheat**"/"**microwave**" a refrigerated thread. When a thread gets
reheated, we'll create a new AppHost (and `TerminalWindow`/`Page`), and
we'll use the _existing_ `IslandWindow` for that instance.

<sub>The metaphor is obviously ridiculous, but _you get it_ so who
cares.</sub>

In this way, we'll keep all the windows we've ever created around in
memory, for later reuse. This means that the leak goes from (~12MB x
number of windows closed) to (~12MB x maximum number of simultaneously
open Terminal windows). It's still not good.

We won't do this on Windows 11, because the bug that is the fundamental
premise of this issue is fixed already in the OS.




I'm not 100% confident in this yet. 
* [x] There's still a d3d leak of some sort on exit in debug builds.
(maybe #15306 related)
* havent seen this in a while. Must have been a issue in an earlier
revision.
* [x] I need to validate more on Windows 11
* [x] **BAD**: Closing the last tab on Windows 11 doesn't close the
window
* [x] **BAD**: Closing a window on Windows 11 doesn't close the window -
it just closes the one tab item and keeps on choochin'
* [x] **BAD**: Close last tab, open new one, attempt to close window -
ALL windows go \*poof\*. Cause of course. No break into post-mortem
either.
* [x] more comments
* [ ] maybe a diagram
* [x] Restoring windows is at the wrong place entirely? I once reopened
the Terminal with two persisted windows, and it created one at 0,0
* [x] Remaining code TODO!s: 0 (?)
* [ ] "warm-reloading" `useTabsInTitlebar` (change while terminal is
running after closing a window, open a new one) REALLY doesn't work.
Obviously restores the last kind of window. Yike.

is all about #15384

closes #15410 along the way. Might fork that fix off.
DHowett pushed a commit that referenced this issue Sep 22, 2023
This is my proposed solution to #15384.

Basically, the issue is that we cannot ever close a
`DesktopWindowXamlSource` ("DWXS"). If we do, then any other thread that
tries to access XAML metadata will explode, which happens frequently. A
DWXS is inextricably linked to an HWND. That means we have to not only
reuse DWXS's, but the HWNDs themselves. XAML also isn't agile, so we've
got to keep the `thread` that the DWXS was started on alive as well.

To do this, we're going to introduce the ability to "refrigerate" and
"reheat" window threads.
* A window thread is "**hot**" if it's actively got a window, and is
pumping window messages, and generally, is a normal thing.
* When a window is closed, we need to "**refrigerate**" it's
`WindowThread` and `IslandWindow`. `WindowEmperor` will take care of
tracking the threads that are refrigerated.
* When a new window is requested, the Emperor first try to
"**reheat**"/"**microwave**" a refrigerated thread. When a thread gets
reheated, we'll create a new AppHost (and `TerminalWindow`/`Page`), and
we'll use the _existing_ `IslandWindow` for that instance.

<sub>The metaphor is obviously ridiculous, but _you get it_ so who
cares.</sub>

In this way, we'll keep all the windows we've ever created around in
memory, for later reuse. This means that the leak goes from (~12MB x
number of windows closed) to (~12MB x maximum number of simultaneously
open Terminal windows). It's still not good.

We won't do this on Windows 11, because the bug that is the fundamental
premise of this issue is fixed already in the OS.

I'm not 100% confident in this yet.
* [x] There's still a d3d leak of some sort on exit in debug builds.
(maybe #15306 related)
* havent seen this in a while. Must have been a issue in an earlier
revision.
* [x] I need to validate more on Windows 11
* [x] **BAD**: Closing the last tab on Windows 11 doesn't close the
window
* [x] **BAD**: Closing a window on Windows 11 doesn't close the window -
it just closes the one tab item and keeps on choochin'
* [x] **BAD**: Close last tab, open new one, attempt to close window -
ALL windows go \*poof\*. Cause of course. No break into post-mortem
either.
* [x] more comments
* [ ] maybe a diagram
* [x] Restoring windows is at the wrong place entirely? I once reopened
the Terminal with two persisted windows, and it created one at 0,0
* [x] Remaining code TODO!s: 0 (?)
* [ ] "warm-reloading" `useTabsInTitlebar` (change while terminal is
running after closing a window, open a new one) REALLY doesn't work.
Obviously restores the last kind of window. Yike.

is all about #15384

closes #15410 along the way. Might fork that fix off.

(cherry picked from commit 7010626)
Service-Card-Id: 89927087
Service-Version: 1.18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
In-PR This issue has a related PR Issue-Bug It either shouldn't be doing this or needs an investigation. Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting
Projects
None yet
3 participants