forked from dotnet/coreclr
Improve thread pool worker thread's spinning for work
Closes https://github.com/dotnet/coreclr/issues/5928

Replaced UnfairSemaphore with a new implementation in CLRLifoSemaphore
- UnfairSemaphore had some benefits:
  - It tracked the number of spinners and avoided waking up waiters as long as the signal count could be satisfied by spinners
  - Since spinners get priority over waiters, that's the main "unfair" part of it that allows hot threads to remain hot and cold threads to remain cold. However, waiters are still released in FIFO order.
  - Spinning helps with throughput when incoming work is bursty
- All of the above benefits were retained in CLRLifoSemaphore and some were improved:
  - Similarly to UnfairSemaphore, the number of spinners is tracked and spinners are preferred, to avoid waking up waiters
  - For waiting, on Windows, an I/O completion port is used, since it releases waiters in LIFO order. For Unix, added a prioritized wait function to the PAL to register waiters in reverse order for LIFO release behavior. This allows cold waiters to time out more easily, since they will be used less frequently.
  - Similarly to SemaphoreSlim, the number of waiters that were signaled to wake but have not yet woken is tracked, to help avoid waking an excessive number of waiters
  - Added some YieldProcessorNormalized() calls to the spin loop. This adds a delay to the spin loop and avoids thrashing on Sleep(0), making spinning more effective when there are no threads to switch to, or when the only other threads to switch to are other similar spinners.
  - Removed the processor count multiplier on the max spin count and retuned the default max spin count. The processor count multiplier was causing excessive CPU usage on machines with many processors.

Perf results

For the test case in https://github.com/dotnet/coreclr/issues/5928, CPU time spent in UnfairSemaphore::Wait was halved, and CPU time % spent in UnfairSemaphore::Wait relative to time spent in WorkerThreadStart decreased from about 88% to 78%.
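The scheme above can be sketched in a few dozen lines. The following is a hypothetical illustration, not the CoreCLR source: `LifoStyleSemaphore`, `signals_`, `spinners_`, and `maxSpins_` are invented names, and a `std::condition_variable` stands in for the real wait primitives (an I/O completion port on Windows, the prioritized PAL wait on Unix), so this sketch does not actually get LIFO wakeup order and omits the careful race handling a production semaphore needs. It only shows the two-phase Wait (spin, then block) and the Release path that prefers spinners over blocked waiters:

```cpp
// Sketch of a spin-then-block semaphore that prefers spinners over waiters.
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

class LifoStyleSemaphore {
public:
    explicit LifoStyleSemaphore(int maxSpins) : maxSpins_(maxSpins) {}

    void Wait() {
        // Phase 1: spin, advertising ourselves so releasers can skip a wake.
        spinners_.fetch_add(1, std::memory_order_relaxed);
        for (int i = 0; i < maxSpins_; ++i) {
            if (TryAcquire()) {
                spinners_.fetch_sub(1, std::memory_order_relaxed);
                return;
            }
            // Stand-in for YieldProcessorNormalized(): delay the loop instead
            // of thrashing on Sleep(0)-style context switches.
            std::this_thread::yield();
        }
        spinners_.fetch_sub(1, std::memory_order_relaxed);

        // Phase 2: block until a signal can be consumed. The predicate is
        // re-checked on entry, which covers a release that landed between
        // the spin phase and taking the lock.
        std::unique_lock<std::mutex> lock(mutex_);
        ++waiters_;
        cv_.wait(lock, [this] { return TryAcquire(); });
        --waiters_;
    }

    void Release() {
        signals_.fetch_add(1, std::memory_order_release);
        // Prefer spinners: a hot, spinning thread will pick the signal up,
        // so we avoid waking (and warming up) a cold blocked thread.
        if (spinners_.load(std::memory_order_relaxed) > 0)
            return;
        std::lock_guard<std::mutex> lock(mutex_);
        if (waiters_ > 0)
            cv_.notify_one();
    }

private:
    // Consume one signal if any are available.
    bool TryAcquire() {
        int s = signals_.load(std::memory_order_acquire);
        while (s > 0) {
            if (signals_.compare_exchange_weak(s, s - 1,
                                               std::memory_order_acquire))
                return true;
        }
        return false;
    }

    std::atomic<int> signals_{0};
    std::atomic<int> spinners_{0};
    int waiters_ = 0;  // guarded by mutex_
    const int maxSpins_;
    std::mutex mutex_;
    std::condition_variable cv_;
};
```

Note how the spinner count acts as the "someone will take this signal" hint: Release only pays for a lock and a notify when no spinner is advertised, which is the property that keeps hot threads hot.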
Updated spin perf code here: dotnet#13670
- NPc = (N * proc count) threads
- MPcWi = (M * proc count) work items
- BurstWorkThroughput queues that many work items in a burst, then releases the thread pool threads to process all of them, and once all are processed, repeats
- SustainedWorkThroughput has each work item queue another of itself, with some initial number of work items such that the work item count never reaches zero

```
Spin                                          Left score      Right score     ∆ Score %
--------------------------------------------  --------------  --------------  ---------
ThreadPoolBurstWorkThroughput 1Pc 000.25PcWi   276.10 ±1.09%   268.90 ±1.36%     -2.61%
ThreadPoolBurstWorkThroughput 1Pc 000.50PcWi   362.63 ±0.47%   388.82 ±0.33%      7.22%
ThreadPoolBurstWorkThroughput 1Pc 001.00PcWi   498.33 ±0.32%   797.01 ±0.29%     59.94%
ThreadPoolBurstWorkThroughput 1Pc 004.00PcWi  1222.52 ±0.42%  1348.78 ±0.47%     10.33%
ThreadPoolBurstWorkThroughput 1Pc 016.00PcWi  1672.72 ±0.48%  1724.06 ±0.47%      3.07%
ThreadPoolBurstWorkThroughput 1Pc 064.00PcWi  1853.94 ±0.25%  1868.36 ±0.45%      0.78%
ThreadPoolBurstWorkThroughput 1Pc 256.00PcWi  1849.30 ±0.24%  1902.58 ±0.48%      2.88%
ThreadPoolSustainedWorkThroughput 1Pc         1495.62 ±0.78%  1505.89 ±0.20%      0.69%
--------------------------------------------  --------------  --------------  ---------
Total                                          922.22 ±0.51%  1004.59 ±0.51%      8.93%
```

Numbers on Linux were similar, with a slightly different spread and no regressions.

I also tried the plaintext benchmark from https://github.com/aspnet/benchmarks on Windows (couldn't get it to build on Linux at the time). There was no noticeable change to throughput or latency, and the CPU time spent in the semaphore decreased a little, from ~2% in UnfairSemaphore::Wait to ~0.5% in CLRLifoSemaphore::Wait.
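To make the benchmark naming concrete, here is a hypothetical sketch of what a BurstWorkThroughput measurement looks like. `BurstWorkThroughput`, its parameters, and the per-burst thread spawning are all illustrative inventions (the actual perf suite lives in dotnet#13670 and drives a real thread pool); the sketch only shows the shape of the workload: NPc workers racing to drain MPcWi trivial work items, repeated, with the score proportional to bursts completed per second.

```cpp
// Illustrative burst-throughput workload: queue items * procs work items,
// let threads * procs workers drain them, repeat, and report bursts/sec.
#include <algorithm>
#include <atomic>
#include <chrono>
#include <thread>
#include <vector>

double BurstWorkThroughput(double n /* threads = n * procs */,
                           double m /* items  = m * procs */,
                           int bursts) {
    const unsigned procs =
        std::max(1u, std::thread::hardware_concurrency());
    const int threads = static_cast<int>(n * procs);
    const int items = static_cast<int>(m * procs);

    auto start = std::chrono::steady_clock::now();
    for (int b = 0; b < bursts; ++b) {
        std::atomic<int> remaining{items};  // the queued "burst"
        std::vector<std::thread> pool;
        for (int t = 0; t < threads; ++t) {
            pool.emplace_back([&remaining] {
                // Workers race to drain the burst, standing in for thread
                // pool threads competing for queued work items.
                while (remaining.fetch_sub(1,
                           std::memory_order_relaxed) > 0) {
                    // trivial work item body
                }
            });
        }
        for (auto& t : pool) t.join();  // burst fully processed; repeat
    }
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return bursts / elapsed.count();  // score ~ bursts per second
}
```

The interesting regime is small M (e.g. 000.25PcWi–001.00PcWi), where there are fewer work items than workers per burst: most threads find no work and sit in the semaphore, which is exactly where spin tuning shows up in the table above.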
Showing 13 changed files with 687 additions and 355 deletions.