Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use FLS detach as thread termination notification on windows. #110589

Merged
merged 6 commits into from
Dec 11, 2024

Conversation

VSadov
Copy link
Member

@VSadov VSadov commented Dec 10, 2024

Fixes: #110350

Switches the mechanism for OS notification about thread termination to use FLS (Fiber Local Storage) - the same mechanism as used on NativeAOT.

The advantage of FLS notification is that it does not run while holding the Loader Lock, thus it does not require that the termination callback never calls or waits for an OS call that may need to acquire that same Loader Lock. (It is hard for us to guarantee the latter)

if (threadFromCurrentFiber != NULL)
{
_ASSERTE(!"Multiple threads encountered from a single fiber");
COMPlusThrowWin32();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be fail fast or ASSERTE_ALL_BUILDS. I do think we want to run any exception handling here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ASSERTE_ALL_BUILDS would be the right choice if it did what the name implies, but is it just a regular assert?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it looks like ASSERTE_ALL_BUILDS is a failfast in CoreClr. It is in the NativeAOT where it is just a regular debug-only assert.

Copy link
Member Author

@VSadov VSadov Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw the pattern used in threads.cpp. So I figured it might be the convention for this kind of code.

if (s_barrierCopy == NULL)
{
_ASSERTE(!"Allocation of GC barrier code page failed");
COMPlusThrowWin32();
}

if (g_debuggerWordTLSIndex == TLS_OUT_OF_INDEXES)
COMPlusThrowWin32();

. . . bunch of other cases . . .

I do agree that _ASSERTE_ALL_BUILDS intuitively seems more appropriate.

Copy link
Member

@jkotas jkotas Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a convention to throw an exception when there is a (transient) error during runtime initialization.

This case is different - it is not runtime initialization: It means that something is really wrong, e.g. the process uses fibers that .NET runtime does not support.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. In this case these are situations that we cannot recover from, so should be a failfast or an equivalent.

src/coreclr/vm/threads.cpp Outdated Show resolved Hide resolved
@VSadov
Copy link
Member Author

VSadov commented Dec 11, 2024

I suspect JSImportGenerator test failures are not related to this change as the failures happen equally on windows and on linux/osx, while this is a windows-specific change.

// thread - thread to detach
// Return:
// true if the thread was detached, false if there was no attached thread
bool OsDetachThread(void* thread)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We never check the return value. Should it return void?

Copy link
Member Author

@VSadov VSadov Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This came from NativeAOT. I just mostly wanted to keep code similar.

In NativeAOT returning false indicates that the thread was initialized (i.e the Thread object that is a C++ TLS object has meaningful data and it is m_ThreadStateFlags != TSF_Unknown), but is not attached (i.e. not inserted in the thread store - basically not a managed thread yet). Returning false here means that the rest of detachment steps like removing from threadstore are unnecessary.

I am not sure the state is real. Once we initialize the thread we immediately try to attach it to the threadstore, but these are distinct states that we can handle in detach.

On CoreCLR the call to SetThread(NULL) seems to be more ad-hoc. It is called from 5 different places. It seems more like "just unsubscribe from OS notifications". Noone checks the result.
SetThread(NULL) itself would clear appdomain regardless of whether the thread was attached to the runtime or not.

Yes, I think we could return void. We evidently do not check the result.

src/coreclr/vm/ceemain.cpp Outdated Show resolved Hide resolved
Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@VSadov
Copy link
Member Author

VSadov commented Dec 11, 2024

The win-x86 failure appears to be #110285

@VSadov VSadov merged commit 69d8076 into dotnet:main Dec 11, 2024
88 of 90 checks passed
@VSadov VSadov deleted the fix110350 branch December 11, 2024 20:45
@VSadov
Copy link
Member Author

VSadov commented Dec 11, 2024

Thanks!

@VSadov
Copy link
Member Author

VSadov commented Dec 11, 2024

/backport to release/9.0-staging

Copy link
Contributor

Started backporting to release/9.0-staging: https://github.com/dotnet/runtime/actions/runs/12285157933

hez2010 pushed a commit to hez2010/runtime that referenced this pull request Dec 14, 2024
…#110589)

* Use FLS detach as thread termination notification on windows.

* use _ASSERTE_ALL_BUILDS

* one more case

* InitFlsSlot throws per convention used in threading initialization

* OsDetachThread could be void

* Update src/coreclr/vm/ceemain.cpp

---------

Co-authored-by: Jan Kotas <jkotas@microsoft.com>
@jkotas
Copy link
Member

jkotas commented Dec 18, 2024

I am sorry - I have to revert this to unblock WinForms codeflow (#110801).

@VSadov VSadov restored the fix110350 branch January 7, 2025 00:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Process not responsive dump indicates garbage collection
2 participants