-
Notifications
You must be signed in to change notification settings - Fork 511
Add normalized equivalent of YieldProcessor #7304
Add normalized equivalent of YieldProcessor #7304
Conversation
Note that the GC is going to need this scaling factor too. |
Ah. I guess it would make more sense to write this in native code then. |
[MethodImpl(MethodImplOptions.NoInlining)] | ||
private static int GetOptimalMaxSpinWaitsPerSpinIteration() | ||
{ | ||
InitializeYieldProcessorNormalized(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably should call EnsureYieldProcessorNormalizedInitialized()
instead
// Intel pre-Skylake processor: measured typically 14-17 cycles per yield | ||
// Intel post-Skylake processor: measured typically 125-150 cycles per yield | ||
const int DefaultYieldsPerNormalizedYield = 1; // defaults are for when no measurement is done | ||
const int DefaultOptimalMaxNormalizedYieldsPerSpinIteration = 64; // tuned for pre-Skylake processors, for post-Skylake it should be 7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be 7 after spin-waits start using the normalization
} | ||
|
||
EnsureYieldProcessorNormalizedInitialized(); | ||
RuntimeImports.RhSpinWait(s_yieldsPerNormalizedYield * iterations); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check for or prevent overflow? CoreCLR uses a 64-bit spin count and prevents overflow on 32-bit
} | ||
else if (yieldsPerNormalizedYield > MaxYieldsPerNormalizedYield) | ||
{ | ||
yieldsPerNormalizedYield = MaxYieldsPerNormalizedYield; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is necessary, as described in the code from CoreCLR the value of yieldsPerNormalizedYield
here is at most MinNsPerNormalizedYield
and the value would be closer to what is measured without the extra limit. Probably doesn't matter much though.
|
||
private static void InitializeYieldProcessorNormalized() | ||
{ | ||
// TODO: critical section so that it only initializes once |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to have this
return; | ||
} | ||
|
||
EnsureYieldProcessorNormalizedInitialized(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could consider doing the initialization at the start of the finalizer thread. It's a bit unfortunate as it increases CPU time during startup (though not wall clock time). The GC would need a place for it as well. In CoreCLR, before initialization the relevant values are default-initialized so that spin-waits can still work. Once the initialization happens it could inform the GC of the scaling factor.
ulong elapsedTicks; | ||
do | ||
{ | ||
RuntimeImports.RhSpinWait(10); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
10 is very short. Not sure what the implementation of HighPerformanceCounter.TickCount
is on Unixes, but in CoreCLR it is significantly slower than on Windows and probably takes longer than 10 spin-waits on pre-Skylake processor, so the measurement may be off. I would recommend using the same value as in CoreCLR, 1000.
Replaced by #7569 that also does the right thing for the GC. |
Ports dotnet/coreclr#13670 to CoreRT.
Marked as draft because I don't have good perf numbers yet (and this is not an area that I'm comfortable making changes in).