-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NativeAOT] ordering issue around ensuring static constructor execution #81151
Comments
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label. |
Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas Issue DetailsRe: Line 59 in 983c8f2
We check The code orders the write to the flag after running constructors by doing full memory fence, but that is both too much and not sufficient. A simple fix would be to do volatile read of the flag when checking. If ultimately, we do not want to emit Replacing
|
Does standard CoreCLR have the same problem? I do not see any barriers there either. |
There is even less synchronization in CoreCLR. Setting the flag ( CoreCLR implementation is more complex too. Perhaps it is just timing between completion of cctor and setting the flag, that makes it unlikely that inconsistent state can be observed. This is how the full fence could be helping in NativeAOT. The fence could be introducing enough delay that we get lucky most of the time. A race for a cctor completion would be a relatively rare thing, since for a given class you get just one chance per program execution. |
I think this should be fixed for 9.0 given that we have an evidence that these race conditions are a real problem and we are fixing them in the JIT. |
Are we going to fix this for ARM32 as well? |
I think so. The JIT fix #105832 is arch neural, it should work for arm32 as well assuming that we emit correct code for volatile on arm32. |
I do not think there is anything to fix beyond the JIT fix (#105832). Line 19 in 1cc0186
@VSadov Do you agree that this issue can be closed? |
Delete unnecessary MemoryBarrier Fixes dotnet#81151
I think there is also a piece of hand-emitted (not by JIT) assembly that checks Line 36 in 1cc0186
It would need to emit ldar/dmb for arm64/32 or whatever RISC-V uses for acq/volatile loads. I was looking into this too. Ideally it would be nice to just move those stubs into C#, if possible. - just not to deal with assembly encoding. |
I’ve also explored possibility to use data dependency instead of explicit load fence by adding(as in "+") the ctor pointer to the base reference before returning it back. Just to create fake dependency. It is possible, but then I realized that it is not sufficient. The way memory model specifies this scenario requires that all writes of cctor are observable before accessing members, not just those accessible via instance/type. So it basically demands an actual load fence. runtime/docs/design/specs/Memory-model.md Lines 149 to 150 in 1cc0186
|
- Fix comments around static constructor memory model - Delete unnecessary MemoryBarrier Contributes to #81151
Re:
runtime/src/coreclr/nativeaot/System.Private.CoreLib/src/System/Runtime/CompilerServices/ClassConstructorRunner.cs
Line 59 in 983c8f2
We check
pContext->initialized
with an ordinary read, before accessing the static fields. There is a possibility that we will read the fields before reading the flag on weak-ordered architectures.The code orders the write to the flag after running constructors by doing full memory fence, but that is both too much and not sufficient.
Too much is because just doing
Volatile.Write(ref pContext->initialized, 1)
is enough to guarantee the order of writes.Not sufficient because the read side can still read ahead of the flag check.
A simple fix would be to do volatile read of the flag when checking.
However the same flag check is emitted by the JIT. Not sure if we emit
ldar
there. I suspect we do not. If that is a case, emittingldar
would be a simple solution.If ultimately, we do not want to emit
ldar
, an alternative could be using a slightly different pattern whereinitialized
instead of being a condition flag, could be a pointer used to access the statics. I.E. if it isnull
the cctor has not run yet, and if it is notnull
, use it as a pointer (or an offset even if it is fixed, as long as it is used to compute the address) to access static state, then we could benefit from data dependency ordering the reads.Replacing
Interlocked.MemoryBarrier()
with a process-wide barrier would work too, but seems too heavy.The text was updated successfully, but these errors were encountered: