Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use TerminateProcess to exit early #16575

Merged
merged 7 commits into from
Jan 26, 2024
Merged

Conversation

zadjii-msft
Copy link
Member

@zadjii-msft zadjii-msft commented Jan 19, 2024

Closes MSFT:46744208

BODGY: If the emperor is being dtor'd, it's because we've gone past the
end of main, and released the ref in main. Then we might run into an
edge case where main releases it's ref to the emperor, but one of the
window threads might be in the process of exiting, and still holding a
strong ref to the emperor. In that case, we can actually end up with
the window thread being the last reference, and calling App::Close
on that thread will crash us with a E_WRONG_THREAD.

This fixes the issue calling TerminateProcess explicitly.

How validated: The ES team manually ran the test pass this was
crashing in a hundred times to make sure this actually fixed it.

Co-authored-by: Leonard Hecker lhecker@microsoft.com

// and calling App::Close on that thread will crash us with a E_WRONG_THREAD
//
// For more context, see MSFT:46744208
std::exit(0);
Copy link
Member

@lhecker lhecker Jan 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the right place to put this. A destructor that exits the process seems bodgy indeed, but it doesn't need to be bodgy if we simply exit outside of the destructor. I'd either put this into main itself or into HandleCommandlineArgs, after calling WaitForWindows() directly after the _becomeMonarch() line. It already has a call to quick_exit in the other branch and now this branch will have an explicit exit. It seems we've found that there aren't any circumstances where we want a clean exit to occur. I'd simply call quick_exit under both circumstances.

Finally, I've found that this trick won't reliably work for unpackaged WT. There, the CRT will call ExitProcess instead of TerminateProcess which has similar issues. Among others, we still run into the deadlock issue with wil::unique_folder_change_reader_nothrow. Personally, I'd call TerminateProcess explicitly on #ifdef NDEBUG and quick_exit otherwise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I had in mind: https://github.com/microsoft/terminal/compare/dev/lhecker/16575-TerminateProcess

The benefits of the larger changeset are that it also works for unpackaged WT, it avoids shutdown issues by not calling WindowThread::ThrowAway on Win10, it fixes the TerminateThread call in AppHost::_HandleCommandlineArgs (unless I misunderstood its intention and it's supposed to be PostQuitMessage?), and it simplifies the WindowEmperor::HandleCommandlineArgs function by unifying the two exit paths (as mentioned in my above comment).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'm cool with that instead of this. Did you test #15410 on Windows 10 to make sure we don't regress it? If that works, then I'm inclined to say that yours is more correct and overall less bodgy (whereas this PR is more bodgy).

Copy link
Member

@lhecker lhecker Jan 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't test it the way the issue describes it specifically but I did run that code on a Windows 10 VM and it seemed fine. Thinking about it, it should work fine too, because WindowEmperor now doesn't get a chance anymore to go out of scope in wWinMain which means that the destructor can't run either.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Can y'all work this out so we can cut a 1.18 test build soon? :))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated this PR with my branch and you could review it now @DHowett.
The only thing I'm worried about is...

// For more context, see MSFT:46744208
std::exit(0);
_app.Close();
_app = nullptr;
Copy link
Member

@DHowett DHowett Jan 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have tried to remove this App leak two times and every time it reintroduces a crash on shutdown. I really don't think we will succeed a third time.

Are we totally sure about this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this will work. The destructor will never run because we terminate before the class leaves the scope.

@DHowett
Copy link
Member

DHowett commented Jan 25, 2024

@lhecker is this PR body up to date?

@DHowett DHowett enabled auto-merge (squash) January 26, 2024 00:03
@lhecker lhecker added the AutoMerge Marked for automatic merge by the bot when requirements are met label Jan 26, 2024
@DHowett DHowett removed the AutoMerge Marked for automatic merge by the bot when requirements are met label Jan 26, 2024
@lhecker lhecker changed the title Manually terminate our process when the emperor is dtor'd Use TerminateProcess to exit early Jan 26, 2024
@DHowett DHowett enabled auto-merge (squash) January 26, 2024 00:08
@DHowett DHowett merged commit 0d47c86 into main Jan 26, 2024
18 of 20 checks passed
@DHowett DHowett deleted the dev/migrie/b/leak-the-emperor-again branch January 26, 2024 00:28
DHowett pushed a commit that referenced this pull request Jan 29, 2024
Closes MSFT:46744208

BODGY: If the emperor is being dtor'd, it's because we've gone past the
end of main, and released the ref in main. Then we might run into an
edge case where main releases it's ref to the emperor, but one of the
window threads might be in the process of exiting, and still holding a
strong ref to the emperor. In that case, we can actually end up with
the _window thread_ being the last reference, and calling App::Close
on that thread will crash us with a E_WRONG_THREAD.

This fixes the issue by calling `TerminateProcess` explicitly.

How validated: The ES team manually ran the test pass this was
crashing in a hundred times to make sure this actually fixed it.

Co-authored-by: Leonard Hecker <lhecker@microsoft.com>
(cherry picked from commit 0d47c86)
Service-Card-Id: 91642488
Service-Version: 1.18
DHowett pushed a commit that referenced this pull request Jan 29, 2024
Closes MSFT:46744208

BODGY: If the emperor is being dtor'd, it's because we've gone past the
end of main, and released the ref in main. Then we might run into an
edge case where main releases it's ref to the emperor, but one of the
window threads might be in the process of exiting, and still holding a
strong ref to the emperor. In that case, we can actually end up with
the _window thread_ being the last reference, and calling App::Close
on that thread will crash us with a E_WRONG_THREAD.

This fixes the issue by calling `TerminateProcess` explicitly.

How validated: The ES team manually ran the test pass this was
crashing in a hundred times to make sure this actually fixed it.

Co-authored-by: Leonard Hecker <lhecker@microsoft.com>
(cherry picked from commit 0d47c86)
Service-Card-Id: 91642489
Service-Version: 1.19
DHowett pushed a commit that referenced this pull request Jan 29, 2024
This changeset ensures that the message queue of frozen windows is
always being serviced. This should ensure that it won't fill up and
lead to deadlocks, freezes, or similar. I've tried _a lot_ of different
approaches before settling on this one. Introducing a custom `WM_APP`
message has the benefit of being the least intrusive to the existing
code base.

The approach that I would have favored the most would be to never
destroy the `AppHost` instance in the first place, as I imagined that
this would be more robust in general and resolve other (rare) bugs.
However, I found that this requires rewriting some substantial parts
of the code base around `AppHost` and it could be something that may
be of interest in the future.

Closes #16332
Depends on #16587 and #16575
DHowett pushed a commit that referenced this pull request Jan 29, 2024
This changeset ensures that the message queue of frozen windows is
always being serviced. This should ensure that it won't fill up and
lead to deadlocks, freezes, or similar. I've tried _a lot_ of different
approaches before settling on this one. Introducing a custom `WM_APP`
message has the benefit of being the least intrusive to the existing
code base.

The approach that I would have favored the most would be to never
destroy the `AppHost` instance in the first place, as I imagined that
this would be more robust in general and resolve other (rare) bugs.
However, I found that this requires rewriting some substantial parts
of the code base around `AppHost` and it could be something that may
be of interest in the future.

Closes #16332
Depends on #16587 and #16575

(cherry picked from commit 5d2fa47)
Service-Card-Id: 91642478
Service-Version: 1.18
DHowett pushed a commit that referenced this pull request Jan 29, 2024
This changeset ensures that the message queue of frozen windows is
always being serviced. This should ensure that it won't fill up and
lead to deadlocks, freezes, or similar. I've tried _a lot_ of different
approaches before settling on this one. Introducing a custom `WM_APP`
message has the benefit of being the least intrusive to the existing
code base.

The approach that I would have favored the most would be to never
destroy the `AppHost` instance in the first place, as I imagined that
this would be more robust in general and resolve other (rare) bugs.
However, I found that this requires rewriting some substantial parts
of the code base around `AppHost` and it could be something that may
be of interest in the future.

Closes #16332
Depends on #16587 and #16575

(cherry picked from commit 5d2fa47)
Service-Card-Id: 91642479
Service-Version: 1.19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

3 participants