Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

An attempted fix for the SignalTextChanged crash #13876

Merged
3 commits merged into from
Sep 2, 2022
Merged

Conversation

zadjii-msft
Copy link
Member

This is conjecture - I was totally unable to repro the original crash here.
Based on the stacks in MSFT:39994969, it looks like we try to fire off a
RaiseAutomationEvent, which calls through UIA core, eventually to the point of
calling ComPtr<WUX::Automation::Peers::IAutomationPeer>::{dtor}. I'm guessing
based on the stacks that the TermControl has already been released and cleaned
up. However, the lambda in the RunAsync calls here only takes a ref to the
TCAP. The TCAP has an outstanding reference (maybe on the other side of the UIA
fence), and gets successfully resolved as strong, but when calling to
RaiseAutomationEvent, the owner we passed in is gonezo.

This explicitly passes a weak_ref to TermControlAutomationPeer, rather than
a raw ptr, so we can actually check if the control is still alive before we
dereference it. If it is, great, we've got a strong ref to it now and it won't
get torn down.

Again, this is hearsay. Without a repro, the only way we can confirm this is
gone is by just hoping the crashes go away. 🤞

This is conjecture - I was totally unable to repro the original crash here.
Based on the stacks in MSFT:39994969, it looks like we try to fire off a
`RaiseAutomationEvent`, which calls through UIA core, eventually to the point of
calling `ComPtr<WUX::Automation::Peers::IAutomationPeer>::{dtor}`. I'm guessing
based on the stacks that the TermControl has already been released and cleaned
up. However, the lambda in the `RunAsync` calls here only takes a ref to the
TCAP. The TCAP has an outstanding reference (maybe on the other side of the UIA
fence), and gets successfully resolved as strong, but when calling to
`RaiseAutomationEvent`, the `owner` we passed in is gonezo.

This explicitly passes a `weak_ref` to `TermControlAutomationPeer`, rather than
a raw ptr, so we can actually check if the control is still alive before _we_
dereference it. If it is, great, we've got a strong ref to it now and it won't
get torn down.

Again, this is hearsay. Without a repro, the only way we can confirm this is
gone is by just hoping the crashes go away. 🤞

* Might close #13357 (we'll reopen if it doesn't?)
* narrator still works
@ghost ghost added Area-TerminalControl Issues pertaining to the terminal control (input, selection, keybindings, mouse interaction, etc.) Issue-Bug It either shouldn't be doing this or needs an investigation. Priority-1 A description (P1) Product-Terminal The new Windows Terminal. Severity-Crash Crashes are real bad news. labels Aug 29, 2022
@@ -134,8 +134,11 @@ namespace winrt::Microsoft::Terminal::Control::implementation
dispatcher.RunAsync(Windows::UI::Core::CoreDispatcherPriority::Normal, [weakThis{ get_weak() }]() {
if (auto strongThis{ weakThis.get() })
{
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're gonna want: whitespace off

src/cascadia/TerminalControl/TermControlAutomationPeer.cpp Outdated Show resolved Hide resolved
@@ -134,8 +134,11 @@ namespace winrt::Microsoft::Terminal::Control::implementation
dispatcher.RunAsync(Windows::UI::Core::CoreDispatcherPriority::Normal, [weakThis{ get_weak() }]() {
if (auto strongThis{ weakThis.get() })
{
// The event that is raised when the text selection is modified.
strongThis->RaiseAutomationEvent(AutomationEvents::TextPatternOnTextSelectionChanged);
if (auto control{ strongThis->_termControl.get() })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tbh from the PR description alone I haven't understood why we need to hold a strong reference to control across the RaiseAutomationEvent call. What if RaiseAutomationEvent is doing anything asynchronously? In that case holding control wouldn't fix anything... 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I'm not entirely sure this'll fix it. Without a solid repro, it was hard for me to figure out how exactly we were getting into this place where a com_ptr was being released, and what exactly the pointer was supposed to be pointing at. This is definitely a "maybe it works, if it doesn't, let's revert and try something else" kinda crash fix.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless, passing in a raw pointer to an impl type has almost certainly got to be wrong.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW just to be clear, these if conditions are the only thing that I consider to be whacky, because I don't see how this fixes the issue and if it does fix the issue, then I feel like it'd be one of these "it just so happens to work" ones. 😔
Should we maybe first try it with only the raw pointer -> weak pointer change?

Copy link
Member

@carlos-zamora carlos-zamora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally (at least for NVDA), I think the process is that we send an event, then NVDA calls DocumentRange() or GetSelection() to figure out what changed and cause NVDA to speak. Since the destruction process is the wild west, might be worth adding some THROW_IF_NULL(_uiaProvider) or something like that throughout?

Other than that, I think this is worth a shot. It doesn't hurt to check if we're still pointing to a TermControl that still exists. Could be that TCAP is lingering around when it should've dtor'd. idk

src/cascadia/TerminalControl/TermControl.cpp Outdated Show resolved Hide resolved
src/cascadia/TerminalControl/TermControlAutomationPeer.cpp Outdated Show resolved Hide resolved
src/cascadia/TerminalControl/TermControlAutomationPeer.cpp Outdated Show resolved Hide resolved
src/cascadia/TerminalControl/TermControlAutomationPeer.cpp Outdated Show resolved Hide resolved
@DHowett
Copy link
Member

DHowett commented Aug 30, 2022

Since the destruction process is the wild west, might be worth adding some THROW_IF_NULL(_uiaProvider) or something like that throughout?

Won't that turn destruction "whoopsies" into guaranteed crashes of the entire Terminal and all of the tabs it is hosting? Is there a safer way to do this?

@DHowett
Copy link
Member

DHowett commented Aug 31, 2022

I'd prefer @lhecker be the second on this if he's comfortable with it.

@ghost ghost added the Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something label Aug 31, 2022
Co-authored-by: Carlos Zamora <carlos.zamora@microsoft.com>
@ghost ghost removed the Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something label Aug 31, 2022
Copy link
Member

@lhecker lhecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really in favor of this PR. There shouldn't be situation to begin with were NVDA calls us and the existence of TermControl is the make/break of whether WT crashes... As such I'd prefer fixing whatever UIA APIs we have that might crash WT, including any assumptions we have that WinUI won't suddenly call AppHost::~AppHost despite being 50 calls deep in the stack.

const Core::Padding padding,
Control::InteractivityAutomationPeer impl) :
TermControlAutomationPeerT<TermControlAutomationPeer>(*owner), // pass owner to FrameworkElementAutomationPeer
TermControlAutomationPeerT<TermControlAutomationPeer>(*owner.get()), // pass owner to FrameworkElementAutomationPeer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*owner.get() creates a copy of the underlying TermControl class, doesn't it? How does this work? 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If owner is a com_ptr<ProjectedType>, it probably just copies the owning pointer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

winrt::com_ptr<T>::get returns T* according to https://docs.microsoft.com/en-us/uwp/cpp-ref-for-winrt/com-ptr#com_ptrget-function
*owner.get() should thus be of type T, aka TermControl. 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, but, this is a silly thing. So.

It returns T*, and * returns T&. winrt::implements (which TermControl is eventually derived from) has an implicit conversion to all projected types that it inherits. This is an implicit conversion to, I would guess, Xaml.UIElement as a projected type. It captures a strong pointer through a QueryInterface!

@DHowett
Copy link
Member

DHowett commented Aug 31, 2022

I'm not really in favor of this PR.

You don't need to sign off on something you're not in favor of! Should we file a workitem to fix this more soundly? Should we take this as a "stopgap" until we can fix it more soundly?

@lhecker
Copy link
Member

lhecker commented Aug 31, 2022

You don't need to sign off on something you're not in favor of! Should we file a workitem to fix this more soundly? Should we take this as a "stopgap" until we can fix it more soundly?

Maybe I was too terse. 😄 I'm in favor of merging this, just not in favor of this PR because it's a stopgap. I think we should just continue to work on cleaning up our pointer safety and data sharing behavior. Maybe even do it consciously at some point instead of doing it on a case by case basis.
IMO we should remove any use of this (including implicit use) if it can be avoided and find and if possible remove implicit recursion like this one were class X calls Y which calls back into X (e.g. introduce class Z which is used by both X and Y, removing the need for recursion).

@DHowett
Copy link
Member

DHowett commented Aug 31, 2022

Maybe even do it consciously at some point instead of doing it on a case by case basis.

YES

@zadjii-msft
Copy link
Member Author

Noted conversation, follow-up in #13896

@zadjii-msft zadjii-msft added the AutoMerge Marked for automatic merge by the bot when requirements are met label Sep 2, 2022
@ghost
Copy link

ghost commented Sep 2, 2022

Hello @zadjii-msft!

Because this pull request has the AutoMerge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.

@ghost ghost merged commit fe0d570 into main Sep 2, 2022
@ghost ghost deleted the dev/migrie/b/13357-attempt-0 branch September 2, 2022 10:52
DHowett pushed a commit that referenced this pull request Sep 6, 2022
This is conjecture - I was totally unable to repro the original crash here.
Based on the stacks in MSFT:39994969, it looks like we try to fire off a
`RaiseAutomationEvent`, which calls through UIA core, eventually to the point of
calling `ComPtr<WUX::Automation::Peers::IAutomationPeer>::{dtor}`. I'm guessing
based on the stacks that the TermControl has already been released and cleaned
up. However, the lambda in the `RunAsync` calls here only takes a ref to the
TCAP. The TCAP has an outstanding reference (maybe on the other side of the UIA
fence), and gets successfully resolved as strong, but when calling to
`RaiseAutomationEvent`, the `owner` we passed in is gonezo.

This explicitly passes a `weak_ref` to `TermControlAutomationPeer`, rather than
a raw ptr, so we can actually check if the control is still alive before _we_
dereference it. If it is, great, we've got a strong ref to it now and it won't
get torn down.

Again, this is hearsay. Without a repro, the only way we can confirm this is
gone is by just hoping the crashes go away. 🤞

* Might close #13357 (we'll reopen if it doesn't?)
* narrator still works

(cherry picked from commit fe0d570)
Service-Card-Id: 85401131
Service-Version: 1.15
@ghost
Copy link

ghost commented Sep 13, 2022

🎉Windows Terminal v1.15.252 has been released which incorporates this pull request.:tada:

Handy links:

@ghost
Copy link

ghost commented Sep 13, 2022

🎉Windows Terminal Preview v1.16.252 has been released which incorporates this pull request.:tada:

Handy links:

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-TerminalControl Issues pertaining to the terminal control (input, selection, keybindings, mouse interaction, etc.) AutoMerge Marked for automatic merge by the bot when requirements are met Issue-Bug It either shouldn't be doing this or needs an investigation. Priority-1 A description (P1) Product-Terminal The new Windows Terminal. Severity-Crash Crashes are real bad news.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crash while holding a key
4 participants