Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[coop][interp] Fix GC transitions for thunk invoke wrappers #81773

Merged
merged 4 commits into from
Feb 13, 2023

Conversation

lambdageek
Copy link
Member

@lambdageek lambdageek commented Feb 7, 2023

The code that recognizes GC transition icalls in the interpreter was only assuming that it will see mono_threads_enter_gc_safe_region_unbalanced / mono_threads_exit_gc_safe_region_unbalanced which are used by managed-to-native wrappers.

In most cases for native-to-managed wrappers the marshaller emits mono_threads_attach_coop / mono_threads_detach_coop and those are handled elsewhere by setting the needs_thread_attach flag on the InterpMethod.

However when the mono_marshal_get_thunk_invoke_wrapper API is used, we emit a thunk invoke wrapper which uses
mono_threads_enter_gc_unsafe_region_unbalanced / mono_threads_exit_gc_unsafe_region_unbalanced

Change the thunk invoke wrapper to also use attach_coop/detach_coop. This makes the thunks a bit more flexible (they can now be called from unattached threads), at the cost of a TLS lookup. Also fixes interpreter support since they use the existing attach/detach handling.

As an implementation detail, I added a GC Unsafe Transition Builder to the marshaling code.

Fixes a failure seen in #81380 in the MonoAPI/MonoMono/Thunks test (exposed by calling a cctor later than previously)

The code that recognizes GC transition icalls in the interpreter was
only assuming that it will see
mono_threads_enter_gc_safe_region_unbalanced /
mono_threads_exit_gc_safe_region_unbalanced

which are used by managed-to-native wrappers.

In most cases for native-to-managed wrappers the marshaller emits
mono_threads_attach_coop / mono_threads_detach_coop and those are
handled elsewhere by setting the needs_thread_attach flag on the
InterpMethod.

However when the mono_marshal_get_thunk_invoke_wrapper API is used, we
emit a thunk invoke wrapper which uses
mono_threads_enter_gc_unsafe_region_unbalanced /
mono_threads_exit_gc_unsafe_region_unbalanced

Recognize those two icalls and set the needs_thread_attach
flag.  (This is slightly more work than necessary -
mono_threads_attach_coop checks if the thread was previously attached
to the runtime and attaches it if it wasn't.  The thunk invoke
wrappers seem to assume the API caller attaches the thread - so we pay
for an extra check.  But on the other hand we don't need to change the
execution-time behavior of the interpreter by reusing existing mechanisms)
@ghost
Copy link

ghost commented Feb 7, 2023

Tagging subscribers to this area: @BrzVlad, @kotlarmilos
See info in area-owners.md if you want to be subscribed.

Issue Details

The code that recognizes GC transition icalls in the interpreter was only assuming that it will see
mono_threads_enter_gc_safe_region_unbalanced /
mono_threads_exit_gc_safe_region_unbalanced

which are used by managed-to-native wrappers.

In most cases for native-to-managed wrappers the marshaller emits mono_threads_attach_coop / mono_threads_detach_coop and those are handled elsewhere by setting the needs_thread_attach flag on the InterpMethod.

However when the mono_marshal_get_thunk_invoke_wrapper API is used, we emit a thunk invoke wrapper which uses
mono_threads_enter_gc_unsafe_region_unbalanced /
mono_threads_exit_gc_unsafe_region_unbalanced

Recognize those two icalls and set the needs_thread_attach flag. (This is slightly more work than necessary - mono_threads_attach_coop checks if the thread was previously attached to the runtime and attaches it if it wasn't. The thunk invoke wrappers seem to assume the API caller attaches the thread - so we pay for an extra check. But on the other hand we don't need to change the execution-time behavior of the interpreter by reusing existing mechanisms)

Fixes a failures seen in #81380 in the MonoAPI/MonoMono/Thunks test (exposed by calling a cctor later than previously)

Author: lambdageek
Assignees: lambdageek
Labels:

area-Codegen-Interpreter-mono

Milestone: -

@lambdageek
Copy link
Member Author

/azp run runtime-extra-platforms

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@BrzVlad
Copy link
Member

BrzVlad commented Feb 8, 2023

I don't really understand how exactly your other PR induces this failure. Nevertheless, silently doing an attach behind the scenes seems like a hack to me. I think we can either choose to perform explicit attach during thunk invoke wrappers, since it sounds like it doesn't really have a perf cost. Otherwise it sounds like the test has a bug and it relies on some thread being attached. Or maybe the cctor invocation order is not really correct.

@lambdageek
Copy link
Member Author

silently doing an attach behind the scenes seems like a hack to me.

My point is, there is no attach happening. Any user code that calls the thunk returned by mono_marshal_get_thunk_invoke_wrapper must already have done an attach. So mono_thread_attach_coop is just to do the GC Safe -> GC Unsafe transition.

I think we can either choose to perform explicit attach during thunk invoke wrappers, since it sounds like it doesn't really have a perf cost.

Yea I guess I could just change emit_thunk_invoke_wrapper to emit attach/detach pairs like all the other thunks. It would be a change of behavior for existing code, but correct code shouldn't notice. and incorrect code will be upgraded to be correct.

@lambdageek
Copy link
Member Author

lambdageek commented Feb 8, 2023

I don't really understand how exactly your other PR induces this failure

It's the bit about not calling the managed ALC callback for the default ALC. Early during startup we used to end up in there and run the cctor for the default ALC. That would trigger a bunch of other early initialization. With my PR, we don't call that managed callback for the default ALC anymore because it is just the default implementation which always returns NULL. As a result a lot fewer cctors run at startup.

In the Thunks.cs test, the cctor for NotImplementedException is one of those cctors. After the PR, that cctor now runs when we call back into managed after the thunk_invoke_wrapper. And when cctors run they need to take a lock in mono - and attempt to do a GC Unsafe -> GC Safe transition. Under the interp, the thunk invoke wrapper was missing the GC Safe -> GC Unsafe transition on entry, so the cctor locking was doing GC Safe -> GC Safe, which is illegal. It worked on AOT and JIT because those had the correct transition in the wrapper.

Make a "GC Unsafe Transition Builder" - that always calls
mono_threads_attach_coop / mono_threads_detach_coop.

Use it in the native to managed wrappers:
emit_thunk_invoke_wrapper and emit_managed_wrapper

This is a change in behavior for emit_thunk_invoke_wrapper -
previously it directly called
mono_threads_enter_gc_unsafe_region_unbalanced.

That means that compared to the previous behavior, the thunk invoke
wrappers are now a bit more lax: they will be able to be called on
threads that aren't attached to the runtime and they will attach
automatically.

On the other hand existing code will continue to work, with the extra
cost of a check of a thread local var.

Using mono_thread_attach_coop also makes invoke wrappers work
correctly in the interpreter - it special cases
mono_thread_attach_coop but not enter_gc_unsafe.
@lambdageek
Copy link
Member Author

@BrzVlad I updated the PR to change the thunk invoke wrappers to use attach_coop/detach_coop and fall into the existing attach/detach handling in the interpreter

@lambdageek
Copy link
Member Author

lambdageek commented Feb 10, 2023

update oops, wrong PR

@lambdageek lambdageek merged commit 9f8c009 into dotnet:main Feb 13, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Mar 15, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants