Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some internal crash reports #12624

Closed
zadjii-msft opened this issue Mar 4, 2022 · 3 comments · Fixed by #12666
Closed

Some internal crash reports #12624

zadjii-msft opened this issue Mar 4, 2022 · 3 comments · Fixed by #12666
Assignees
Labels
Area-Remoting Communication layer between windows. Often for windowing behavior, quake mode, etc. Issue-Bug It either shouldn't be doing this or needs an investigation. Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting Priority-0 Bugs that we consider release-blocking/recall-class (P0) Product-Terminal The new Windows Terminal. Resolution-Fix-Committed Fix is checked in, but it might be 3-4 weeks until a release. zInbox-Bug Ignore me! zPreview-Service-Queued-1.13 A floating label that tracks the current Preview version for servicing purposes. zStable-Service-Queued-1.12 A floating label that tracks the current Stable version for servicing purposes.

Comments

@zadjii-msft
Copy link
Member

zadjii-msft commented Mar 4, 2022

Just filing notes here for posterity. Issue will be updated as I investigate.

MSFT:35731327 - Closed no repro, but definitely still reproing.

Stack
0:000> k
 # Child-SP          RetAddr               Call Site
00 00000039`af0fb268 00007ffd`96847aa9     ntdll!ZwWaitForMultipleObjects+0x14 [minkernel\ntdll\daytona\objfre\amd64\usrstubs.asm @ 947] 
01 00000039`af0fb270 00007ffd`968479ae     KERNELBASE!WaitForMultipleObjectsEx+0xe9 [minkernel\kernelbase\synch.c @ 1551] 
02 00000039`af0fb550 00007ffd`973d27af     KERNELBASE!WaitForMultipleObjects+0xe [minkernel\kernelbase\synch.c @ 1403] 
03 00000039`af0fb590 00007ffd`973d21ee     kernel32!WerpReportFaultInternal+0x587 [onecore\windows\feedback\core\faultrep\lib\faultrep.cpp @ 987] 
04 00000039`af0fb6b0 00007ffd`9694541b     kernel32!WerpReportFault+0xbe [onecore\windows\feedback\core\faultrep\lib\faultrep.cpp @ 1298] 
05 00000039`af0fb6f0 00007ffd`9917bec5     KERNELBASE!UnhandledExceptionFilter+0x3db [minkernel\kernelbase\xcpt.c @ 685] 
06 (Inline Function) --------`--------     ntdll!RtlpThreadExceptionFilter+0xa3 [minkernel\ntdll\rtlstrt.c @ 960] 
07 00000039`af0fb810 00007ffd`99163177     ntdll!RtlUserThreadStart$filt$0+0xb4 [minkernel\ntdll\rtlstrt.c @ 1167] 
08 00000039`af0fb850 00007ffd`99178a0f     ntdll!__C_specific_handler+0x97 [minkernel\crts\crtw32\misc\riscchandler.c @ 433] 
09 00000039`af0fb8c0 00007ffd`990e3de6     ntdll!RtlpExecuteHandlerForException+0xf [minkernel\ntos\rtl\amd64\xcptmisc.asm @ 132] 
0a 00000039`af0fb8f0 00007ffd`991779fe     ntdll!RtlDispatchException+0x286 [minkernel\ntos\rtl\amd64\exdsptch.c @ 655] 
0b 00000039`af0fc040 00007ffd`83c41a10     ntdll!KiUserExceptionDispatch+0x2e [minkernel\ntos\rtl\amd64\trampoln.asm @ 755] 
0c (Inline Function) --------`--------     Microsoft_Terminal_Remoting!winrt::impl::consume_Microsoft_Terminal_Remoting_IWindowActivatedArgs<winrt::Microsoft::Terminal::Remoting::IWindowActivatedArgs>::PeasantID+0xb [C:\a\_work\1\s\src\cascadia\Remoting\Generated Files\winrt\Microsoft.Terminal.Remoting.h @ 616] 
0d (Inline Function) --------`--------     Microsoft_Terminal_Remoting!winrt::Microsoft::Terminal::Remoting::implementation::WindowActivatedArgs::{ctor}+0x2a [C:\a\_work\1\s\src\cascadia\Remoting\WindowActivatedArgs.h @ 51] 
0e (Inline Function) --------`--------     Microsoft_Terminal_Remoting!winrt::make_self+0x46 [C:\a\_work\1\s\src\cascadia\Remoting\Generated Files\winrt\base.h @ 7787] 
0f 00000039`af0fc7e0 00007ffd`83c5d49f     Microsoft_Terminal_Remoting!winrt::Microsoft::Terminal::Remoting::implementation::Monarch::HandleActivatePeasant+0x80 [C:\a\_work\1\s\src\cascadia\Remoting\Monarch.cpp @ 362] 
10 00000039`af0fc990 00007ffd`83c4d567     Microsoft_Terminal_Remoting!winrt::impl::produce<winrt::Microsoft::Terminal::Remoting::implementation::Monarch,winrt::Microsoft::Terminal::Remoting::IMonarch>::HandleActivatePeasant+0x2f [C:\a\_work\1\s\src\cascadia\Remoting\Generated Files\winrt\Microsoft.Terminal.Remoting.h @ 985] 
11 (Inline Function) --------`--------     Microsoft_Terminal_Remoting!winrt::impl::consume_Microsoft_Terminal_Remoting_IMonarch<winrt::Microsoft::Terminal::Remoting::IMonarch>::HandleActivatePeasant+0x13 [C:\a\_work\1\s\src\cascadia\Remoting\Generated Files\winrt\Microsoft.Terminal.Remoting.h @ 109] 
12 00000039`af0fc9d0 00007ffd`83c4b4b9     Microsoft_Terminal_Remoting!winrt::Microsoft::Terminal::Remoting::implementation::WindowManager::_createMonarchAndCallbacks+0xa747 [C:\a\_work\1\s\src\cascadia\Remoting\WindowManager.cpp @ 256] 
13 00000039`af0fcb50 00007ffd`914f1080     Microsoft_Terminal_Remoting!`winrt::Microsoft::Terminal::Remoting::implementation::WindowManager::_createOurPeasant'::`1'::catch$174+0x19 [C:\a\_work\1\s\src\cascadia\Remoting\WindowManager.cpp @ 313] 
14 00000039`af0fcb90 00007ffd`914f26a5     VCRUNTIME140_1!_CallSettingFrame_LookupContinuationIndex+0x20 [d:\a01\_work\12\s\src\vctools\crt\vcruntime\src\eh\amd64\handlers.asm @ 98] 
15 00000039`af0fcbc0 00007ffd`99178256     VCRUNTIME140_1!__FrameHandler4::CxxCallCatchBlock+0x115 [d:\a01\_work\12\s\src\vctools\crt\vcruntime\src\eh\frame.cpp @ 1393] 
16 00000039`af0fcca0 00007ffd`83c4d397     ntdll!RcFrameConsolidation+0x6 [minkernel\ntos\rtl\amd64\capture.asm @ 1147] 
17 (Inline Function) --------`--------     Microsoft_Terminal_Remoting!winrt::check_hresult+0xa846 [C:\a\_work\1\s\src\cascadia\Remoting\Generated Files\winrt\base.h @ 5016] 
18 (Inline Function) --------`--------     Microsoft_Terminal_Remoting!winrt::impl::consume_Microsoft_Terminal_Remoting_IMonarch<winrt::Microsoft::Terminal::Remoting::IMonarch>::AddPeasant+0xa86d [C:\a\_work\1\s\src\cascadia\Remoting\Generated Files\winrt\Microsoft.Terminal.Remoting.h @ 92] 
19 00000039`af0fef60 00007ffd`83c42553     Microsoft_Terminal_Remoting!winrt::Microsoft::Terminal::Remoting::implementation::WindowManager::_createOurPeasant+0xaaa7 [C:\a\_work\1\s\src\cascadia\Remoting\WindowManager.cpp @ 305] 
1a 00000039`af0ff0b0 00007ffd`83c4113f     Microsoft_Terminal_Remoting!winrt::Microsoft::Terminal::Remoting::implementation::WindowManager::ProposeCommandline+0x1d3 [C:\a\_work\1\s\src\cascadia\Remoting\WindowManager.cpp @ 190] 
1b 00000039`af0ff1b0 00007ff7`9b592cee     Microsoft_Terminal_Remoting!winrt::impl::produce<winrt::Microsoft::Terminal::Remoting::implementation::WindowManager,winrt::Microsoft::Terminal::Remoting::IWindowManager>::ProposeCommandline+0x2f [C:\a\_work\1\s\src\cascadia\Remoting\Generated Files\winrt\Microsoft.Terminal.Remoting.h @ 1703] 
1c (Inline Function) --------`--------     WindowsTerminal!winrt::impl::consume_Microsoft_Terminal_Remoting_IWindowManager<winrt::Microsoft::Terminal::Remoting::IWindowManager>::ProposeCommandline+0x19 [C:\a\_work\1\s\src\cascadia\WindowsTerminal\Generated Files\winrt\Microsoft.Terminal.Remoting.h @ 651] 
1d 00000039`af0ff1f0 00007ff7`9b594c4c     WindowsTerminal!AppHost::_HandleCommandlineArgs+0x1de [C:\a\_work\1\s\src\cascadia\WindowsTerminal\AppHost.cpp @ 184] 
1e 00000039`af0ff4d0 00007ff7`9b591280     WindowsTerminal!AppHost::AppHost+0x1dc [C:\a\_work\1\s\src\cascadia\WindowsTerminal\AppHost.cpp @ 50] 
1f 00000039`af0ff7d0 00007ff7`9b59ca72     WindowsTerminal!wWinMain+0x110

Pulled two dumps there -

  • they're defterm invokes (-Embeddding)
  • they are the king (different from the other bug)

ah that one's tricky, there aren't tests at exactly that level. The defterm is getting created just as the king window is closing. So during proposeCommandline it decides it needs to also instantiate a new Monarch itself, and then tell the Monarch about itself (including when we were laste activated). however, we're still in the process of setting up the window. The window hasn't been activated yet, so the Peasant doesn't have a lastactivatedargs.

I woulda thought this try/catch would have got it though:

        try
        {
            localArgs = winrt::make_self<implementation::WindowActivatedArgs>(args);
            // This method will actually do the hard work
            _doHandleActivatePeasant(localArgs);
        }
        catch (...)
        {
            TraceLoggingWrite(g_hRemotingProvider,
                              "Monarch_HandleActivatePeasant_Failed",
                              TraceLoggingLevel(WINEVENT_LEVEL_VERBOSE),
                              TraceLoggingKeyword(TIL_KEYWORD_TRACE));
        }

SOLUTION: Should be easy enough to add a null check to Monarch::HandleActivatePeasant (and anywhere else there's a WindowActivatedArgs really)

🔥 MSFT:32518679 (ARM version), MSFT:32279047 (AMD64 version)

Happening as early as module versions 1.11.[2108, 2109, 2110], and all the way till 1.13.2202.8005

Hmm. the _windowManager isn't the king. the _monarch isn't null, so it was created, and in another process. We're not a peasant yet so it makes sense there's no election thread or _peasant yet.

given that _shouldCreateWindow is false, either:

  • we crashed in const auto result = _monarch.ProposeCommandline(args);
  • we made it past that, and crashed... where? When we call out to the monarch process? I'd imagine that would have given us a RPC error of some sort if that itself failed, so maybe the Monarch process had the stack overflow? but in that case I'd imagine we'd be getting hits for FindTargetWindow in an actual monarch process.

We may be able to repro this by launching the Terminal then launching cmd.exe, to force defterm to engage....

A thought: _monarch.ProposeCommandline could fail. That's a potentially cross-proc hop there, and it's not try/caught. So in that scenario:

  • the user opened a window[A] that became the monarch
  • they ran cmd.exe which started a defterm handoff
  • that gets into WindowManager::ProposeCommandline, which starts to ask the monarch via ProposeCommandline
  • On that line, the monarch dies. That would raise an exception - would that be the c0000409/STATUS_STACK_BUFFER_OVERRUN we're seeing? because that seems... unlikely.

There's piles of tracing around here. We should be able to get traces from these crashes, right? right?

After more investigating:

            // TODO! If the monarch dies or otherwise fails, what do we do here?
            //
            // I suppose we could emulate this by sticking a breakpoint in
            // TerminalApp!winrt::TerminalApp::implementation::AppLogic::_doFindTargetWindow,
            // starting a terminal, starting a defterm, and when that BP gets
            // hit, kill the original one and see what happens here.
            const auto result = _monarch.ProposeCommandline(args);
            // Damn, killing the monarch exactly here still gets us -2147023170,
            // RPC_S_CALL_FAILED, which is exactly what you'd expect

So it's not that the monarch died. Something caused a buffer overrun and I have no idea what because the stack is just no longer useful, and try/catching that won't work either. We'll need to come up with something next week.

@zadjii-msft zadjii-msft added Issue-Bug It either shouldn't be doing this or needs an investigation. Product-Terminal The new Windows Terminal. Priority-0 Bugs that we consider release-blocking/recall-class (P0) Area-Remoting Communication layer between windows. Often for windowing behavior, quake mode, etc. labels Mar 4, 2022
@zadjii-msft zadjii-msft added this to the Terminal v1.14 milestone Mar 4, 2022
@zadjii-msft zadjii-msft self-assigned this Mar 4, 2022
@ghost ghost added the Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting label Mar 4, 2022
@zadjii-msft zadjii-msft added zPreview-Service-Queued-1.13 A floating label that tracks the current Preview version for servicing purposes. zStable-Service-Queued-1.12 A floating label that tracks the current Stable version for servicing purposes. labels Mar 8, 2022
@zadjii-msft
Copy link
Member Author

Randomly opened a fresh dump this morning, by chance. That one actually had a result value in the check_hresult above consume_Microsoft_Terminal_Remoting_IWindowManager<D>::ProposeCommandline. That was a RPC_E_SERVER_UNAVAILABLE, which is exactly what I would have expected to see if the monarch had died at exactly this moment. THAT we can test!

#7763 (comment) may also be useful for debugging this sort of thing.

@zadjii-msft zadjii-msft added the zInbox-Bug Ignore me! label Mar 10, 2022
@ghost ghost added the In-PR This issue has a related PR label Mar 10, 2022
@ghost ghost removed the In-PR This issue has a related PR label Mar 11, 2022
DHowett pushed a commit that referenced this issue Mar 11, 2022
This fixes a pair of inbox bugs, hopefully.

* MSFT:35731327
  * There's a small window where a peasant is being created when a monarch is exiting. When that happens, the new peasant will try to tell itself (the new monarch) when the peasant was last activated, but because the window hasn't actually finished instantiating, the peasant doesn't yet have a LastActivatedArgs to tell the monarch about.
* MSFT:32518679 (ARM version) / MSFT:32279047 (AMD64 version)
  * This one's tricky. Not totally sure this is the fix, bug assuming my hypothesis is correct, this should fix it. Regardless, this does fix a bug that was in the code.
  * If the king dies right as another window is starting, right while the new window is starting to ProposeCommandline to the monarch, the monarch could die. If it does, the new window just explodes too. Not what you want. 


Vaguely tested the second bug manually, by setting breakpoints in the monarch, starting a defterm, then exiting the monarch while the handoff was in process. That now creates a new window, so that's at least something. `RemotingTests::TestProposeCommandlineWithDeadMonarch` was the closest I could get to testing that. 

The first bug only got an eye check. Not sure how to repro, but I figured yeet and hopefully we get it. 

* [x] Closes #12624
@ghost ghost added the Resolution-Fix-Committed Fix is checked in, but it might be 3-4 weeks until a release. label Mar 11, 2022
DHowett pushed a commit that referenced this issue Mar 11, 2022
This fixes a pair of inbox bugs, hopefully.

* MSFT:35731327
  * There's a small window where a peasant is being created when a monarch is exiting. When that happens, the new peasant will try to tell itself (the new monarch) when the peasant was last activated, but because the window hasn't actually finished instantiating, the peasant doesn't yet have a LastActivatedArgs to tell the monarch about.
* MSFT:32518679 (ARM version) / MSFT:32279047 (AMD64 version)
  * This one's tricky. Not totally sure this is the fix, bug assuming my hypothesis is correct, this should fix it. Regardless, this does fix a bug that was in the code.
  * If the king dies right as another window is starting, right while the new window is starting to ProposeCommandline to the monarch, the monarch could die. If it does, the new window just explodes too. Not what you want.

Vaguely tested the second bug manually, by setting breakpoints in the monarch, starting a defterm, then exiting the monarch while the handoff was in process. That now creates a new window, so that's at least something. `RemotingTests::TestProposeCommandlineWithDeadMonarch` was the closest I could get to testing that.

The first bug only got an eye check. Not sure how to repro, but I figured yeet and hopefully we get it.

* [x] Closes #12624

(cherry picked from commit f507d9f)
DHowett pushed a commit that referenced this issue Mar 11, 2022
This fixes a pair of inbox bugs, hopefully.

* MSFT:35731327
  * There's a small window where a peasant is being created when a monarch is exiting. When that happens, the new peasant will try to tell itself (the new monarch) when the peasant was last activated, but because the window hasn't actually finished instantiating, the peasant doesn't yet have a LastActivatedArgs to tell the monarch about.
* MSFT:32518679 (ARM version) / MSFT:32279047 (AMD64 version)
  * This one's tricky. Not totally sure this is the fix, bug assuming my hypothesis is correct, this should fix it. Regardless, this does fix a bug that was in the code.
  * If the king dies right as another window is starting, right while the new window is starting to ProposeCommandline to the monarch, the monarch could die. If it does, the new window just explodes too. Not what you want.

Vaguely tested the second bug manually, by setting breakpoints in the monarch, starting a defterm, then exiting the monarch while the handoff was in process. That now creates a new window, so that's at least something. `RemotingTests::TestProposeCommandlineWithDeadMonarch` was the closest I could get to testing that.

The first bug only got an eye check. Not sure how to repro, but I figured yeet and hopefully we get it.

* [x] Closes #12624

(cherry picked from commit f507d9f)
@ghost
Copy link

ghost commented Mar 25, 2022

🎉This issue was addressed in #12666, which has now been successfully released as Windows Terminal v1.12.1073.:tada:

Handy links:

@ghost
Copy link

ghost commented Mar 25, 2022

🎉This issue was addressed in #12666, which has now been successfully released as Windows Terminal Preview v1.13.1073.:tada:

Handy links:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Remoting Communication layer between windows. Often for windowing behavior, quake mode, etc. Issue-Bug It either shouldn't be doing this or needs an investigation. Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting Priority-0 Bugs that we consider release-blocking/recall-class (P0) Product-Terminal The new Windows Terminal. Resolution-Fix-Committed Fix is checked in, but it might be 3-4 weeks until a release. zInbox-Bug Ignore me! zPreview-Service-Queued-1.13 A floating label that tracks the current Preview version for servicing purposes. zStable-Service-Queued-1.12 A floating label that tracks the current Stable version for servicing purposes.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant