This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Reduce Timer lock contention #14527

Merged
stephentoub merged 4 commits from timer_partition into dotnet:master on Oct 17, 2017

Conversation

stephentoub (Member)

Timer currently uses a global lock to protect a single "queue" of Timers, and any operation to create/change/fire/delete a Timer takes that same lock. This leads to a scalability problem for code that operates on lots of timers.

This change partitions this single queue into N queues, which all operate independently. In the .NET Framework implementation, there's already logically such a split with one queue/lock per AppDomain, and this change utilizes the same underlying support in the runtime. As such, contention is distributed across the N locks, helping scalability.
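As a rough illustration of the shape (a minimal sketch, not the PR's actual code: the real change routes through the runtime's existing per-AppDomain queue support, and Thread.GetCurrentProcessorId only became public later, in .NET Core 3.0):

using System;
using System.Threading;

// Illustrative sketch only: one queue (and one lock) per core, selected by
// the calling thread's current processor, so that concurrent
// create/change/delete operations rarely contend on the same lock.
internal sealed class PartitionedTimerQueue
{
    private static readonly PartitionedTimerQueue[] s_queues = CreateQueues();

    private readonly object _lock = new object(); // per-partition, not global

    private PartitionedTimerQueue() { }

    private static PartitionedTimerQueue[] CreateQueues()
    {
        var queues = new PartitionedTimerQueue[Environment.ProcessorCount];
        for (int i = 0; i < queues.Length; i++)
            queues[i] = new PartitionedTimerQueue();
        return queues;
    }

    internal static PartitionedTimerQueue ForCurrentCpu() =>
        s_queues[(uint)Thread.GetCurrentProcessorId() % (uint)s_queues.Length];

    internal void Enqueue(TimerCallback callback, object state, uint dueTime)
    {
        lock (_lock)
        {
            // ...insert into this partition's timer list; contention is
            // limited to timers that land on the same partition...
        }
    }
}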

Note that this change does not help the case where a Timer fires; it could even add a small amount of overhead there, as there are now more likely to be multiple VM callbacks to fire timers (one per queue), whereas previously there'd be just one for all timers. However, it can significantly improve the throughput of creating and deleting timers, which we expect to be the common case for high-throughput timer usage (e.g. creating a timer to time out an operation that will likely complete before the timeout). Simple repro:

using System;
using System.Diagnostics;
using System.Threading.Tasks;
using System.Threading;

class Program
{
    static void Main()
    {
        var tasks = new Task[Environment.ProcessorCount];
        while (true)
        {
            for (int i = 0; i < tasks.Length; i++)
            {
                tasks[i] = Task.Run(async () =>
                {
                    for (int j = 0; j < 1_000_000; j++)
                    {
                        // Long due time (100s), no period: the timer is
                        // created and disposed without ever firing.
                        using (var t = new Timer(delegate { }, null, 100_000, -1))
                        {
                            //await Task.Yield(); // uncomment to move disposal to another thread
                        }
                    }
                });
            }

            var sw = Stopwatch.StartNew();
            Task.WaitAll(tasks);
            Console.WriteLine(sw.Elapsed.TotalSeconds);
        }
    }
}

On my quad core, with the yield commented out (so that creation and disposal definitely happen on the same thread), before this change it produces results like:

2.13162
2.1474115
2.0992453
2.0924658
2.1353377

whereas after it produces results like:

0.6014539
0.5870677
0.5938833
0.5823756
0.6019473

Uncommenting the await, in order to move disposal back onto the thread pool, before:

3.8250889
3.8807055
3.7832106
3.9216054
3.744262

and after:

1.3021118
1.3428281
1.3408342
1.3113431
1.3529329

Note that all of this was done with the server GC. With the workstation GC, everything takes a lot longer, as allocation/GC becomes more expensive. With the await, before:

6.0565934
6.1419216
6.0375295
6.3585111
6.2168001

and after:

3.9577551
3.8124262
3.9511121
3.7542718
3.7818402

I also included two additional commits:

  • When a Timer does fire, we're currently queueing a work item that then fires the timer. We can instead just queue the timer object directly (see the sketch after this list).
  • Task.Delay currently allocates four objects (Task, Timer, TimerHolder, TimerQueueTimer), two of which are wrappers that are unnecessary with our internals access; this drops it from four to two (Task, TimerQueueTimer).
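A hedged sketch of the first bullet, written against the shape of the public .NET Core 3.0+ thread pool API (in 2017 the equivalent hook was CoreLib-internal; the names below are illustrative, not the PR's actual code):

using System.Threading;

// The timer object itself is the work item: queueing it for callback
// execution no longer allocates a wrapper per firing.
internal sealed class FirableTimer : IThreadPoolWorkItem
{
    private readonly TimerCallback _callback;
    private readonly object _state;

    internal FirableTimer(TimerCallback callback, object state)
    {
        _callback = callback;
        _state = state;
    }

    // Invoked by the thread pool directly on this object.
    void IThreadPoolWorkItem.Execute() => _callback(_state);

    internal void Fire() =>
        ThreadPool.UnsafeQueueUserWorkItem(this, preferLocal: false);
}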

Fixes https://github.com/dotnet/coreclr/issues/14462
cc: @kouvel, @tarekgh, @vancem, @jkotas

ps I purposefully stuck with a change that I thought could be ported to the .NET Framework. However, given that AppDomains aren't an issue in .NET Core, I think a better longer-term solution would be rewriting Timer to be entirely managed (potentially with P/Invokes if needed) rather than having any code in the VM.

Eliminate a wrapper ThreadPoolWorkItem that's unnecessary. Saves an allocation every time a Timer fires.
Timer is just a wrapper for a TimerHolder, which is just a wrapper for a TimerQueueTimer. We don't actually want the behavior provided by TimerHolder (allowing the underlying timer to be collected while it's in use) and explicitly work around it, and for that pleasure we're paying two additional allocations. Just skip directly to the inner type.
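One way to observe the allocation difference from the caller's side (an illustrative probe, not part of the PR; GC.GetAllocatedBytesForCurrentThread is available on .NET Core):

using System;
using System.Threading.Tasks;

class AllocProbe
{
    static void Main()
    {
        const int N = 100_000;

        // Warm-up so one-time allocations don't skew the measurement.
        for (int i = 0; i < 1_000; i++) Task.Delay(500_000);

        long before = GC.GetAllocatedBytesForCurrentThread();
        // Long due time: the timers never fire; the returned tasks are
        // deliberately abandoned since only allocation is being measured.
        for (int i = 0; i < N; i++) Task.Delay(500_000);
        long after = GC.GetAllocatedBytesForCurrentThread();

        Console.WriteLine($"~{(after - before) / (double)N:F1} bytes per Task.Delay call");
    }
}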
NULL,
NULL,
NULL);
ThreadpoolMgr::TimerInfoContext* timerContext = new (nothrow) ThreadpoolMgr::TimerInfoContext();
Member:

If you delete (nothrow), you can also delete the check for OOM on the next line.

Member Author:

Fixed

}

timerContext->AppDomainId = adid;
timerContext->TimerId = timerId;

BOOL res = ThreadpoolMgr::CreateTimerQueueTimer(
Member:

This can actually throw even though it returns BOOL. It would be better to use NewHolder<ThreadpoolMgr::TimerInfoContext> for the timerContext.

Member Author:

Fixed

@vancem commented Oct 16, 2017

LGTM

{
// empty private constructor to ensure we remain a singleton.
var queues = new TimerQueue[Environment.ProcessorCount];
Member:

Environment.ProcessorCount

Is this guaranteed to exist on all OSes?

Member Author:

ProcessorCount? Yes.

@tarekgh (Member), Oct 16, 2017:

Just wondering: if someone for some reason deleted this environment variable, what would happen? Do we depend on it in other places? Don't we have any other way to get this info?

Member:

This property has nothing to do with environment variables.

Member:

good to know that :-)

@tarekgh (Member) commented Oct 16, 2017

LGTM

@stephentoub stephentoub merged commit fae8cc7 into dotnet:master Oct 17, 2017
@stephentoub stephentoub deleted the timer_partition branch October 17, 2017 02:22
@baal2000

@stephentoub

Note that this change does not help the case where a Timer fires

I hoped it would address contention in TimerQueueTimer.Fire() (see below), because lock (m_associatedTimerQueue) is now less contended.

[profiler screenshot: lock contention in TimerQueueTimer.Fire]

Where is the issue then? In the ThreadPool part, or elsewhere?
We need to know, because the assumption that Timers never fire is incorrect where Task.Delay is concerned. There is nothing in the Task.Delay documentation that advises any preferred usage pattern.

Thanks!

@stephentoub (Member Author)

I hoped it addresses contention in TimerQueueTimer.Fire() (see below) because lock (m_associatedTimerQueue) is now less contended.

It may help with that contention as well. I simply meant that the primary benefit of the change is for the common case where timers are quickly created/destroyed.

@GSPP commented Apr 20, 2018

I believe a lot of people would not cancel/delete timeout timers when using Task.Delay. They would just write something like Task.Delay(...).ContinueWith(PerformCancellation), or something similar to Task.WhenAny(task, Task.Delay(...)), and drop the timeout task. Cancelling a timeout task when using Task.Delay is quite tedious, since you need to create a CancellationTokenSource just for that.

I'm mentioning this since the PR says that timers that fire might even become a little bit slower. So there's a risk that certain kinds of code miss out on this optimization.

@stephentoub (Member Author)

I believe a lot of people would not cancel/delete timeout timers when using Task.Delay. They would just write something like Task.Delay(...).ContinueWith(PerformCancellation) or something similar to Task.WhenAny(task, Task.Delay(...)) and drop the timeout task.

If they do, they have other problems impacting their code's performance.

says that timers that fire might even become a little bit slower

It's minimal, but I mentioned it for completeness.
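For completeness, a sketch of the cancellation pattern being alluded to, where the Task.Delay timer is deleted on the fast path instead of being left to fire later (an illustrative helper, not a BCL API):

using System;
using System.Threading;
using System.Threading.Tasks;

static class TimeoutHelpers
{
    internal static async Task<T> WithTimeout<T>(Task<T> task, TimeSpan timeout)
    {
        using (var cts = new CancellationTokenSource())
        {
            Task delay = Task.Delay(timeout, cts.Token);
            if (await Task.WhenAny(task, delay) == delay)
                throw new TimeoutException();

            // The operation won the race: cancel the delay so its timer is
            // deleted now rather than firing later.
            cts.Cancel();
            return await task;
        }
    }
}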

@vancem commented Aug 17, 2018

Note: Kurt Schenk presented a Service Fabric cluster scenario where, on .NET Core,

System.private.corelib!System.Threading.TimerQueue.FireNextTimer

was the hottest method. It was spending most of its time searching through the list of timers.
While this is only one data point, it does suggest that we need to tune the 'many timers that actually fire' case as well (probably by making a priority queue of timers; a rough sketch follows).
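A minimal sketch of that idea (PriorityQueue<TElement, TPriority> is .NET 6+, and the tuning that eventually landed may have taken a different shape):

using System;
using System.Collections.Generic;

// Timers ordered by due tick: firing scans only the timers that are due,
// instead of walking the entire registration list.
sealed class HeapTimerQueue
{
    private readonly PriorityQueue<Action, long> _byDueTick =
        new PriorityQueue<Action, long>();

    public void Schedule(Action callback, long dueTick) =>
        _byDueTick.Enqueue(callback, dueTick);

    // O(k log n) for the k due timers, rather than O(n) over all n timers.
    public void FireDue(long nowTick)
    {
        while (_byDueTick.TryPeek(out Action callback, out long due) && due <= nowTick)
        {
            _byDueTick.Dequeue();
            callback();
        }
    }
}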

@kurtschenk commented Aug 28, 2018

Thanks Vance,

For more context: we have a WebSocket gateway for IoT devices. The back end is hosted in Service Fabric and leverages the Actor programming model and, most relevant for this issue, Actor Events to send messages from the back end to the devices. ActorProxy.SubscribeAsync internally uses ActorProxy.ResubscribeAsyncV2, which uses Task.Delay for resiliency, periodically re-subscribing to make sure it stays subscribed to the active primary replica if a failover has happened between re-subscribe intervals. See ActorProxy.cs. What this means is that as we scale up the number of IoT devices (so far for us, tens to hundreds of thousands of concurrent WebSocket connections per VM), many instances of TimerQueueTimer are created, and many timers are fired (each WebSocket connection has its own TimerQueueTimer). By moving to .NET Core 2.1 we have seen a significant improvement in performance (a 2-3x improvement in CPU); however, we still see a hotspot in System.private.corelib!System.Threading.TimerQueue.FireNextTimer at ~30% CPU.

[profiler screenshot: FireNextTimer hotspot at ~30% CPU]

@daviburg

Debugging an app on .NET Framework 4.7.2 whose CPU was jumping to 60-90% under load, I found System.Threading.Tasks.Task.Delay(int) on the top hot path, with almost everything assigned to its Self CPU %. Checking the source code downloaded from Reference Source, it appears these fixes were not backported. Task.Delay was being used while the async code waited on an I/O to complete, attempting to sleep in increments of 15 ms. About 104 of the 186 threads in the application were contending on it. 15 ms is the minimum delay mentioned in the public doc at https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task.delay?view=net-5.0

Should there be a remark in the public documentation recommending a higher minimum task delay, to avoid running into such an issue?

@stephentoub (Member Author)

it appears these fixes were not backported

They were, you just need to opt in. Search for Switch.System.Threading.UseNetCoreTimer.
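For example, the opt-in can be set via an app.config entry (an illustrative snippet; see the AppContextSwitchOverrides documentation for the exact mechanism):

<configuration>
  <runtime>
    <!-- Opt in to the .NET Core timer implementation on .NET Framework -->
    <AppContextSwitchOverrides value="Switch.System.Threading.UseNetCoreTimer=true" />
  </runtime>
</configuration>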

@daviburg

Switch.System.Threading.UseNetCoreTimer

Ok, albeit for .NET Framework 4.8 per https://docs.microsoft.com/en-us/dotnet/framework/configure-apps/file-schema/runtime/appcontextswitchoverrides-element#remarks

@stephentoub (Member Author)

albeit for .NET Framework 4.8

It's available as a patch for others as well, e.g. https://support.microsoft.com/en-us/topic/march-1-2019-kb4486553-cumulative-update-for-net-framework-3-5-and-4-7-2-for-windows-10-version-1809-and-windows-server-2019-094a0448-d554-9469-434b-b7b0f21a94e0
