This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Reduce CPU consumption by Timer's FireNextTimers
Historically, in certain applications that were Timer-heavy (either via direct use or via wrappers like Task.Delay), timer-related operations like creation (ctor), destruction (Dispose), and firing (internally in FireNextTimers) were relatively expensive. A single linked queue of timers was maintained and protected by a single lock, such that every creation, destruction, and firing would require taking that lock and operating on the list. In .NET Core 2.1, we improved this significantly to reduce contention by splitting the single lock and queue into N partitions, each with its own lock and queue (and native timer that triggers the processing of that queue). This enables lots of threads that would otherwise all be contending for timer creation/destruction/firing to now spread the load across the various partitions. This made a significantly positive and measurable impact on these timer-heavy workloads, in particular for workloads that created and destroyed lots of timers with most never actually firing (e.g. where timers are used to implement timeouts, and most things don't timeout). However, we still see some apps that rely heavily on timers firing, in particular with apps that have tens of thousands or hundreds of thousands of periodic timers, with lots of time spent inside FireNextTimers (the function that's invoked when the native timer for a partition fires to process whatever set of timers may be ready to fire in its queue). This operation currently walks the whole list of that queue's timers, such that it needs to touch and do a little work for every scheduled timer in that partition, even if only one or a few timers actually need to fire. The more timers are scheduled, even if they're for a time far in the future, the more costly FireNextTimers becomes. And as FireNextTimers is invoked while holding that partition's lock, this also then slows down any code paths trying to create/destroy timers on the same partition. This PR attempts to address the most impactful cases of that. Instead of each partition maintaining a single queue, we simply split the queue into two: a "short" list that contains all scheduled timers with a next firing time that's <= some absolute threshold, and a "long" list for the rest. When FireNextTimers is invoked, we walk the "short" list, processing it as we do today. If the current time is less than or equal to the absolute threshold, then we know we don't need to examine the long list and can skip it and the (hopefully) majority of timers it contains. If, however, we're beyond that time or the short list becomes empty, we continue to process the long list as we do today, with the slight modification that we then also move to the short list anything with a due time that puts it at or under the threshold, which is reset to point to a time short into the future. When new timers are added, we just add them to the appropriate list based on their due time. The theory behind this is that the problematic cases are when we have lots of long-lived timers that rarely fire but today we're having to examine them every time any timer fires; by maintaining a short list that ideally has only a small subset of the timers, we can avoid touching the majority of the timers each time FireNextTimers is called, on average. This doesn't change the O(N) complexity of FireNextTimers, but it should often reduce the size of N significantly. Synthetic workloads have shown that it often reduces the cost of FireNextTimers to just 5-10% of what it was previously, and without increasing the cost of the fast add/dispose path measurably. (An alternative approach considered is to use a priority queue / heap for the timer list. This would allow for FireNextTimers to pull off in O(log N) time each of the next timers to fire, hopefully making FireNextTimers much cheaper. However, it comes at the expense of making creation and destruction also O(log N) instead of O(1). And in cases where all timers in the list needed to fire, it would make FireNextTimers O(N log N) instead of O(N). It is, however, an approach that could be pursued if this approach proves less effective in real-world applications than expected.)
- Loading branch information