Conversation
src/vm/comthreadpool.cpp (outdated):

        NULL,
        NULL,
        NULL);
    ThreadpoolMgr::TimerInfoContext* timerContext = new (nothrow) ThreadpoolMgr::TimerInfoContext();
If you delete (nothrow), you can also delete the check for OOM on the next line.
Fixed
src/vm/comthreadpool.cpp (outdated):

    }

    timerContext->AppDomainId = adid;
    timerContext->TimerId = timerId;

    BOOL res = ThreadpoolMgr::CreateTimerQueueTimer(
This can actually throw even though it returns BOOL. It would be better to use NewHolder<ThreadpoolMgr::TimerInfoContext> for the timerContext.
Fixed
LGTM
    {
        // empty private constructor to ensure we remain a singleton.
        var queues = new TimerQueue[Environment.ProcessorCount];
Environment.ProcessorCount
Is this guaranteed to exist on all OSes?
ProcessorCount? Yes.
Just wondering: if someone for some reason deleted this environment variable, what would happen? Do we depend on it in other places? Don't we have any other way to get this info?
This property has nothing to do with environment variables.
good to know that :-)
LGTM
I hoped it addressed contention in TimerQueueTimer.Fire() (see below). Where is the issue then? In the ThreadPool part or elsewhere? Thanks!
It may help with that contention as well. I simply meant that the primary benefit of the change is for the common case where timers are quickly created/destroyed.
I believe a lot of people would not cancel/delete timeout timers when using them. I'm mentioning this since the PR says that timers that fire might even become a little bit slower, so there's a risk that certain kinds of code miss out on this optimization.
If they do, they have other problems impacting their code's performance.
It's minimal, but I mentioned it for completeness.
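For concreteness, here is a hypothetical example of the timeout-timer pattern being discussed, where the timer guards an operation that usually finishes first, so on the hot path the timer is created and disposed without ever firing. This is a sketch, not code from the PR; doOperationAsync stands in for the guarded work.

    using System;
    using System.Threading;
    using System.Threading.Tasks;

    static class TimeoutExample
    {
        // doOperationAsync is a placeholder for the work being guarded.
        public static async Task RunWithTimeoutAsync(Func<Task> doOperationAsync)
        {
            using (var timeout = new Timer(_ => Console.WriteLine("timed out"),
                                           null,
                                           dueTime: TimeSpan.FromSeconds(30),
                                           period: Timeout.InfiniteTimeSpan))
            {
                // Typically completes well before the due time, so the timer
                // is created and disposed without ever firing.
                await doOperationAsync();
            } // Dispose removes the timer from its queue.
        }
    }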
Note that Kurt Schenk presented a Service Fabric Cluster scenario where the .NET Core method System.private.corelib!System.Threading.TimerQueue.FireNextTimer was the hottest method. It was spending most of its time searching through the list of timers.
Thanks Vance. For more context, we have a WebSocket gateway for IoT devices. The back end is hosted in Service Fabric and leverages the Actor programming model and, most relevant for this issue, Actor Events to send messages from the back end to the devices.
While debugging an app on .NET Framework 4.7.2 whose CPU was jumping to 60-90% under load, I found System.Threading.Tasks.Task.Delay(int) at the top of the hot path, with almost everything assigned to its Self CPU %. Checking the source code downloaded from Reference Source, it appears these fixes were not backported. Task.Delay was used when the async code was waiting on an I/O to complete, attempting to sleep in increments of 15 ms. About 104 of the 186 threads in the application were contending. 15 ms is the minimum delay mentioned in the public doc at https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task.delay?view=net-5.0. Should there be a remark in the public documentation recommending a higher minimum task delay to avoid running into such an issue?
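A hypothetical sketch of the kind of polling code described above; isComplete stands in for whatever signals the awaited I/O. Every Task.Delay call creates a timer behind the scenes, so with a single global timer lock, hundreds of concurrent pollers end up serializing on that lock.

    using System;
    using System.Threading.Tasks;

    static class PollingExample
    {
        // isComplete is a hypothetical probe for the I/O being waited on.
        public static async Task WaitForCompletionAsync(Func<bool> isComplete)
        {
            while (!isComplete())
            {
                // Each call allocates a timer, registers it in the timer
                // queue, and removes it when it fires ~15 ms later.
                await Task.Delay(15);
            }
        }
    }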
They were, you just need to opt in. Search for Switch.System.Threading.UseNetCoreTimer.
Ok, albeit for .NET Framework 4.8 per https://docs.microsoft.com/en-us/dotnet/framework/configure-apps/file-schema/runtime/appcontextswitchoverrides-element#remarks
It's available as a patch for others as well, e.g. https://support.microsoft.com/en-us/topic/march-1-2019-kb4486553-cumulative-update-for-net-framework-3-5-and-4-7-2-for-windows-10-version-1809-and-windows-server-2019-094a0448-d554-9469-434b-b7b0f21a94e0
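For reference, a sketch of opting in: the switch name comes from the comment above; setting it early via AppContext.SetSwitch is one option, and per the linked docs it can also be set declaratively in app.config through the AppContextSwitchOverrides element.

    // Must run before any timer-related code executes, e.g. first thing
    // in Main. Assumes a Framework build that carries the backport.
    AppContext.SetSwitch("Switch.System.Threading.UseNetCoreTimer", true);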
Timer currently uses a global lock to protect a single "queue" of Timers, and any operation to create/change/fire/delete a Timer takes that same lock. This leads to a scalability problem for code that operates on lots of timers.
This change partitions this single queue into N queues, which all operate independently. In the .NET Framework implementation, there's already logically such a split with one queue/lock per AppDomain, and this change utilizes the same underlying support in the runtime. As such, contention is distributed across the N locks, helping scalability.
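To make the structure concrete, here is a minimal sketch of the partitioning idea, not the actual CoreLib code; the queue-selection policy shown (current managed thread ID modulo queue count) is an assumption for illustration.

    using System;
    using System.Collections.Generic;

    sealed class PartitionedTimerQueue
    {
        // One queue (and one lock) per core instead of one global lock.
        private static readonly PartitionedTimerQueue[] s_queues =
            CreateQueues(Environment.ProcessorCount);

        private readonly object _lock = new object();
        private readonly LinkedList<object> _timers = new LinkedList<object>();

        private static PartitionedTimerQueue[] CreateQueues(int count)
        {
            var queues = new PartitionedTimerQueue[count];
            for (int i = 0; i < count; i++) queues[i] = new PartitionedTimerQueue();
            return queues;
        }

        // Each caller lands on one partition, so contention on
        // create/change/delete is spread across N locks.
        public static PartitionedTimerQueue ForCurrentThread() =>
            s_queues[(uint)Environment.CurrentManagedThreadId % (uint)s_queues.Length];

        public LinkedListNode<object> Add(object timer)
        {
            lock (_lock) return _timers.AddLast(timer);
        }

        public void Remove(LinkedListNode<object> node)
        {
            lock (_lock) _timers.Remove(node);
        }
    }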
Note that this change does not help the case where a Timer fires; it could even add a small amount of additional overhead to that case, as there are more likely to be multiple VM callbacks to fire timers (one per queue), whereas previously there'd be just one for all timers. However, it potentially makes a significant impact on the throughput of creating and deleting timers, which we expect to be the common case for high-throughput timers (e.g. creating a timer to help timeout an operation that will likely complete before the timeout). Simple repro:
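The repro would be along these lines (a sketch consistent with the description, not the exact code from the PR): per-core workers that repeatedly create a far-off timer and immediately dispose it, with an optional await Task.Yield() between the two to move disposal onto a different thread pool thread. The iteration count and due time are arbitrary.

    using System;
    using System.Diagnostics;
    using System.Linq;
    using System.Threading;
    using System.Threading.Tasks;

    class Program
    {
        static void Main()
        {
            while (true)
            {
                var sw = Stopwatch.StartNew();
                Task.WaitAll(Enumerable.Range(0, Environment.ProcessorCount)
                    .Select(_ => Task.Run(async () =>
                    {
                        for (int i = 0; i < 1_000_000; i++)
                        {
                            // Far-off due time, no period: the timer is
                            // expected to be disposed before it ever fires.
                            var t = new Timer(_ => { }, null, int.MaxValue, -1);
                            // await Task.Yield();
                            t.Dispose();
                        }
                    })).ToArray());
                Console.WriteLine(sw.Elapsed);
            }
        }
    }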
On my quad core, with the yield commented out (so that creation and disposal will definitely happen from the same thread), before this produces results like:
whereas after it produces results like:
Uncommenting the await in order to put the disposal back into the thread pool, before:
and after:
Note that all of this was done with the server GC. With the workstation GC, everything takes a lot longer, as allocation/GC becomes more expensive. With the await, before:
and after:
I also included two additional commits:
- Eliminate a wrapper ThreadPoolWorkItem that's unnecessary. This saves an allocation every time a Timer fires.
- Timer is just a wrapper for a TimerHolder, which is just a wrapper for a TimerQueueTimer. We don't actually want the behavior provided by TimerHolder (allowing the underlying timer to be collected while it's in use) and explicitly work around it, and for that pleasure we're paying two additional allocations. Just skip directly to the inner type.
Fixes https://github.com/dotnet/coreclr/issues/14462
cc: @kouvel, @tarekgh, @vancem, @jkotas
P.S. I purposefully stuck with a change that I thought could be ported to the .NET Framework. However, given that AppDomains aren't an issue in .NET Core, I think a better long-term solution would be rewriting Timer to be entirely managed (potentially with P/Invokes if needed) rather than having any code in the VM.