Add an internal mode to `Lock` to have it use non-alertable waits #97227

kouvel · 2024-01-19T18:01:09Z

- Added an internal constructor that enables the lock to use non-alertable waits
  - Non-alertable waits are not forwarded to `SynchronizationContext` wait overrides, are non-message-pumping waits, and are not interruptible
- Updated most of the uses of `Lock` in NativeAOT to use non-alertable waits
- Also simplified the fix in #94873 ("[NativeAOT] Fix Lock's usage of NativeRuntimeEventSource.Log to account for recursive accesses during its own class construction") by limiting the null checks to the initialization phase, so they are no longer needed in various places along the wait path

Fixes #97105

ghost · 2024-01-19T18:01:30Z

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

Author:	kouvel
Assignees:	kouvel
Labels:	`area-NativeAOT-coreclr`
Milestone:	9.0.0

kouvel · 2024-01-19T18:02:25Z

I think the main uses of `Lock` in NativeAOT that have shown up in issues, and that need to be fixed to use non-alertable waits, are the uses during class construction. After brief scans of the other uses, I've updated most of them to use non-alertable waits as well. The exception is the one in `SyncTable` used by `Monitor`, which should remain alertable. I'm not 100% sure about all of them, though, so please take a look.

VSadov · 2024-01-19T18:19:57Z

What are the exact criteria for using alertable vs. non-alertable locks? Is it "may run user code while holding a lock"?

kouvel · 2024-01-19T19:02:15Z

> What are the exact criteria for using alertable vs. non-alertable locks? Is it "may run user code while holding a lock"?

Alertable waits are interruptible (by `Thread.Interrupt`), allow reentrance through APCs and message pumping, and allow the wait behavior to be overridden through `SynchronizationContext`. If any of those features is desirable, and the scenario can tolerate all of them, the wait should probably be alertable. If any of them is undesirable, the wait should probably be non-alertable.
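The following minimal C# sketch is not part of the PR. It illustrates the first alertable feature listed above, under standard .NET behavior for the `lock` statement. That statement uses `Monitor`, which on NativeAOT is backed by the `SyncTable` lock that this PR keeps alertable. A thread blocked on a contended `lock` is woken by `Thread.Interrupt` and sees a `ThreadInterruptedException`. The PR's non-alertable `Lock` constructor is internal to CoreLib, so it is not shown. Per the description above, a wait on such a lock would not be interruptible.

```csharp
using System;
using System.Threading;

// A thread blocked on a contended Monitor-based lock is in an alertable wait,
// so Thread.Interrupt breaks the wait with ThreadInterruptedException.
object gate = new object();
string outcome = "not run";

var waiter = new Thread(() =>
{
    try
    {
        lock (gate)              // blocks: the main thread is holding `gate`
        {
            outcome = "acquired";
        }
    }
    catch (ThreadInterruptedException)
    {
        outcome = "interrupted"; // the alertable wait was broken by Interrupt()
    }
});

lock (gate)
{
    waiter.Start();
    Thread.Sleep(100);                    // give the waiter time to block on `gate`
    waiter.Interrupt();                   // wakes the blocked waiter
    waiter.Join(TimeSpan.FromSeconds(5)); // the waiter exits without taking `gate`
}

Console.WriteLine(outcome); // interrupted
```

The `Thread.Sleep` only makes the interrupt arrive while the waiter is already blocked. Per the `Thread.Interrupt` documentation, an interrupt that arrives earlier stays pending until the thread next blocks, so the outcome is the same either way.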

jkotas

Thanks

VSadov · 2024-01-19T19:19:06Z

I understand the difference in behavior. I was hoping there could be some guidance on when one kind of lock should be used vs. the other.

kouvel · 2024-01-19T19:23:30Z

> I understand the difference in behavior. I was hoping there could be some guidance on when one kind of lock should be used vs. the other.

The way I think of it: if the lock usage scenario needs one of those alertable features and is designed to handle all of them, then the lock should use alertable waits; otherwise, it should use non-alertable waits. A reasonable, limited guideline for low-level runtime code might be to use non-alertable waits by default unless there is a good reason to use alertable waits.

jkotas · 2024-01-19T20:03:39Z

> the lock usage scenario needs one of those alertable features and is designed to handle all of them

Properly constructed apps should never need the reentrant waits. Also, most code is not designed to handle the reentrant waits.

I think of reentrant waits as a legacy feature that works around COM STA-related deadlocks in buggy apps.

VSadov · 2024-01-20T00:10:54Z

> I think of reentrant waits as a legacy feature that works around COM STA-related deadlocks in buggy apps.

Me too. I can't think of a scenario that would deliberately rely on reentrant waits.
I think the defaults regarding reentrant waits are wrong for `lock`, but I guess it is way too late to fix that.

So the rule for deciding whether to use a non-alertable lock is really: "everything in the runtime uses non-alertable locks, except the `Monitor` lock, for legacy reasons."

I wonder if there is a use case for exposing the parameter via public API? Maybe it will lead to people using synchronization context less often? Arguably, when people use synchronization context to override Wait behavior, they think about their locks (not the unknown locks in libraries), so an option to make their locks uninterruptible could be useful.

VSadov · 2024-01-20T00:24:05Z

src/coreclr/nativeaot/System.Private.CoreLib/src/Internal/Runtime/FrozenObjectHeapManager.cs

@@ -16,7 +16,7 @@ internal unsafe partial class FrozenObjectHeapManager
    {
        public static readonly FrozenObjectHeapManager Instance = new FrozenObjectHeapManager();

-        private readonly LowLevelLock m_Crst = new LowLevelLock();
+        private readonly Lock m_Crst = new Lock(useAlertableWaits: false);


VSadov · 2024-01-20T00:38:48Z

src/libraries/System.Private.CoreLib/src/System/Threading/Lock.cs

+
+        private ushort WaiterStartTimeMs
+        {
+            get => (ushort)(_waiterStartTimeMsAndFlags >> 1);


The `ushort` range in milliseconds is ~65 seconds, and this change cuts it in half. On the other hand, we get these timings from ticks that are typically coarser than 1 ms.

Overflows here are not dangerous per se, and 30 seconds is still a long time, but perhaps it is better to sacrifice precision rather than range?
I.e., store the value without shifting, and just mask out the lowest bit when returning it.

I don't think the range matters much, the feature that relies on it operates on 100-ms granularity, which can be served just as well before and after this change.

> I.e., store the value without shifting, and just mask out the lowest bit when returning it.

I didn't follow, can you elaborate?

I was thinking of something like the following:

private ushort WaiterStartTimeMs { get => (ushort)(_waiterStartTimeMsAndFlags | 1); set => _waiterStartTimeMsAndFlags = (ushort)((value & ~1) | (_waiterStartTimeMsAndFlags & 1)); }

In fact, the `| 1` in the getter might not be necessary, since ticks are coarser than 1 ms.

It is ok either way though.

What's the benefit of that?

> What's the benefit of that?

A smaller chance of wrapping around if something gets stuck for 30 seconds. At a scale of seconds, that can realistically happen.

It is a very small benefit, though, since it would be rare and wraparound is not a big deal.

I see, I don't think it's necessary to do something like that at the moment, there isn't an issue with the range.
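The two layouts discussed in this thread can be compared with a small C# sketch. The helper names (`PackShifted`, `PackMasked`, `TimeShifted`, `TimeMasked`, `FlagOf`) are hypothetical, and the setter logic is an assumption; only the shifted getter appears in the diff above. The shifted layout keeps 15 bits of time, so the value repeats after about 32.8 s. The unshifted layout keeps 16 bits, so it repeats after about 65.5 s, at 2 ms precision.

```csharp
using System;

// Hypothetical helpers, not the Lock.cs code: two ways to pack a millisecond
// tick count and a one-bit flag into a single ushort.

// Shifted layout (matches the getter in the diff): time in bits 1..15, flag in bit 0.
// Only 15 bits of time survive, so the stored time repeats every 2^15 ms (~32.8 s).
static ushort PackShifted(int tickMs, bool flag) =>
    (ushort)((tickMs << 1) | (flag ? 1 : 0));
static int TimeShifted(ushort packed) => packed >> 1;

// Unshifted layout (the review suggestion): bit 0 of the tick count is replaced
// by the flag, keeping the full 16-bit range (~65.5 s) at 2 ms precision.
static ushort PackMasked(int tickMs, bool flag) =>
    (ushort)((tickMs & ~1) | (flag ? 1 : 0));
static int TimeMasked(ushort packed) => packed & ~1;

static bool FlagOf(ushort packed) => (packed & 1) != 0;

int tick = 40_001; // larger than the 15-bit maximum of 32_767

ushort shifted = PackShifted(tick, flag: true);
ushort masked = PackMasked(tick, flag: true);

Console.WriteLine($"shifted: time={TimeShifted(shifted)} flag={FlagOf(shifted)}"); // time=7233 flag=True
Console.WriteLine($"masked:  time={TimeMasked(masked)} flag={FlagOf(masked)}");    // time=40000 flag=True
```

In the unshifted layout, the setter has to clear bit 0 of the incoming time (`tickMs & ~1`) before OR-ing in the flag. Otherwise, an odd tick count would read back as a set flag.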

VSadov

Thanks!

kouvel · 2024-01-20T09:43:17Z

> I wonder if there is a use case for exposing the parameter via public API?

It would probably be useful to expose non-alertable waits in public APIs in general. There are plenty of cases in the libraries where alertable waits were, or still are, used but are not appropriate, and there are probably plenty more elsewhere. So far, the resulting issues may not have been common enough to create a strong need for that, but having alertable waits as the default behavior seems more likely to cause problems.

jkotas · 2024-01-20T14:31:09Z

> non-alertable waits in general in public APIs.

This PR bundles multiple features behind the non-alertable option:

(1) Re-entrant COM waits
(2) EventSource instrumentation that leads to unexpected reentrancy
(3) Overriding via synchronization context
(4) Alertable options of the Windows WaitForSingleObject/WaitForMultipleObjects APIs

(4) is not really problematic. The remaining three are the problematic behaviors.

Would it be better to use a different name for this flag, to avoid associating it with the Windows alertable option? We have an undocumented global switch, https://github.com/search?q=repo%3Adotnet%2Fruntime%20APPDOMAIN_FORCE_TRIVIAL_WAIT_OPERATIONS, whose name sounds reasonable.

src/libraries/System.Private.CoreLib/src/System/Threading/Lock.cs

kouvel · 2024-01-24T23:08:28Z

Looks like the remaining CI failures are known issues.


PR dotnet#97227 introduced a tick count masking issue where the stored waiter start time excludes the upper bit from the ushort tick count, but comparisons with it were not doing the appropriate masking. This was leading to a lock convoy on some heavily contended locks once in a while due to waiters incorrectly appearing to have waited for a long time. Fixes dotnet#98021
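The masking bug described above can be illustrated with a small C# sketch. The thread does not show the actual fix from #98876, so the helper names and the exact elapsed-time formulas below are hypothetical. The idea is that the stored start time keeps only 15 bits of the tick count, so the current tick count must be reduced to the same 15 bits, with the difference taken modulo 2^15. Otherwise, a waiter can appear to have waited for a long time, which is the lock-convoy trigger described above.

```csharp
using System;

// Hypothetical sketch, not the Lock.cs code: the waiter start time is kept in
// bits 1..15 of a ushort (bit 0 is a flag), so only 15 bits of the tick count survive.
static ushort PackStart(int nowMs, bool flag) => (ushort)((nowMs << 1) | (flag ? 1 : 0));
static int StartOf(ushort packed) => packed >> 1;

// Buggy: the current tick count keeps 16 bits, but the stored one has only 15,
// so whenever bit 15 of the current tick count is set, the waiter looks ~32 s old.
static int ElapsedBuggy(ushort packed, int nowMs) => (ushort)(nowMs - StartOf(packed));

// Fixed: compute the difference modulo 2^15, matching the stored precision.
// This also handles wraparound of the 15-bit clock.
static int ElapsedFixed(ushort packed, int nowMs) => (nowMs - StartOf(packed)) & 0x7FFF;

int start = 40_000;                          // bit 15 of the ushort tick count is set
ushort stored = PackStart(start, flag: false);
int now = start + 5;                         // the waiter has actually waited 5 ms

Console.WriteLine(ElapsedBuggy(stored, now)); // 32773 (looks like ~33 s)
Console.WriteLine(ElapsedFixed(stored, now)); // 5
```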

kouvel added the area-NativeAOT-coreclr label Jan 19, 2024

kouvel added this to the 9.0.0 milestone Jan 19, 2024

kouvel requested review from jkotas, VSadov and MichalStrehovsky January 19, 2024 18:01

kouvel self-assigned this Jan 19, 2024

jkotas approved these changes Jan 19, 2024

This was referenced Jan 19, 2024

Checkout failure: "Git fetch failed with exit code 128" dotnet/arcade#9009

Open

BasicEventSourceTests.TestsManifestGeneration test failing in CI #97103

Closed

build-analysis bot mentioned this pull request Jan 19, 2024

[browser][MT] various WebWorkerTest CI failures #96628

Closed

VSadov reviewed Jan 20, 2024

VSadov approved these changes Jan 20, 2024

VSadov mentioned this pull request Jan 20, 2024

If an attempt to acquire a System.Threading.Lock ends up throwing, we should leave the lock unlocked. #97240

Open

jkotas reviewed Jan 20, 2024

src/libraries/System.Private.CoreLib/src/System/Threading/Lock.cs (outdated)

Rename useAlertableWaits to useTrivialWaits

6d2fa64

build-analysis bot mentioned this pull request Jan 23, 2024

Error while retrieving client Settings for PipelineArtifact. HttpRequestException: nodename nor servname provided, or not known (vsblobprodcus3.vsblob.visualstudio.com:443) #96798

Open

kouvel mentioned this pull request Jan 24, 2024

The "Build / browser-wasm linux Release LibraryTests_EAT" leg's build fails with "Nothing to show. Final logs are missing." #97117

Closed

kouvel merged commit e568f75 into dotnet:main Jan 24, 2024
176 of 179 checks passed

kouvel deleted the LockFix2 branch January 24, 2024 23:09

MichalStrehovsky mentioned this pull request Feb 6, 2024

20%+ throughput regression in goldilocks scenarios #98021

Closed

agocke added a commit to agocke/runtime that referenced this pull request Feb 23, 2024

Revert "Add an internal mode to Lock to have it use non-alertable w…

f8f03dd

…aits (dotnet#97227)" This reverts commit e568f75.

agocke mentioned this pull request Feb 23, 2024

Revert "Add an internal mode to Lock to have it use non-alertable w… #98867

Merged

jkotas pushed a commit that referenced this pull request Feb 23, 2024

Revert "Add an internal mode to Lock to have it use non-alertable w…

f129701

…aits (#97227)" (#98867) This reverts commit e568f75.

kouvel added a commit to kouvel/runtime that referenced this pull request Feb 23, 2024

Reapply "Add an internal mode to Lock to have it use non-alertable …

cb292c0

…waits (dotnet#97227)" (dotnet#98867) This reverts commit f129701.

kouvel mentioned this pull request Feb 23, 2024

Reapply revert of https://github.com/dotnet/runtime/pull/97227, fix Lock's waiter duration computation #98876

Merged

github-actions bot locked and limited conversation to collaborators Feb 24, 2024
