Skip to content
This repository has been archived by the owner on Nov 1, 2020. It is now read-only.

Add normalized equivalent of YieldProcessor #7304

Closed

Conversation

MichalStrehovsky
Copy link
Member

Ports dotnet/coreclr#13670 to CoreRT.

Marked as draft because I don't have good perf numbers yet (and this is not an area that I'm comfortable making changes in).

@jkotas
Copy link
Member

jkotas commented Apr 15, 2019

Note that the GC is going to need this scaling factor too.

@MichalStrehovsky
Copy link
Member Author

Note that the GC is going to need this scaling factor too.

Ah. I guess it would make more sense to write this in native code then.

[MethodImpl(MethodImplOptions.NoInlining)]
private static int GetOptimalMaxSpinWaitsPerSpinIteration()
{
InitializeYieldProcessorNormalized();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should call EnsureYieldProcessorNormalizedInitialized() instead

// Intel pre-Skylake processor: measured typically 14-17 cycles per yield
// Intel post-Skylake processor: measured typically 125-150 cycles per yield
const int DefaultYieldsPerNormalizedYield = 1; // defaults are for when no measurement is done
const int DefaultOptimalMaxNormalizedYieldsPerSpinIteration = 64; // tuned for pre-Skylake processors, for post-Skylake it should be 7
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be 7 after spin-waits start using the normalization

}

EnsureYieldProcessorNormalizedInitialized();
RuntimeImports.RhSpinWait(s_yieldsPerNormalizedYield * iterations);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check for or prevent overflow? CoreCLR uses a 64-bit spin count and prevents overflow on 32-bit

}
else if (yieldsPerNormalizedYield > MaxYieldsPerNormalizedYield)
{
yieldsPerNormalizedYield = MaxYieldsPerNormalizedYield;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is necessary, as described in the code from CoreCLR the value of yieldsPerNormalizedYield here is at most MinNsPerNormalizedYield and the value would be closer to what is measured without the extra limit. Probably doesn't matter much though.


private static void InitializeYieldProcessorNormalized()
{
// TODO: critical section so that it only initializes once
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to have this

return;
}

EnsureYieldProcessorNormalizedInitialized();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could consider doing the initialization at the start of the finalizer thread. It's a bit unfortunate as it increases CPU time during startup (though not wall clock time). The GC would need a place for it as well. In CoreCLR, before initialization the relevant values are default-initialized so that spin-waits can still work. Once the initialization happens it could inform the GC of the scaling factor.

ulong elapsedTicks;
do
{
RuntimeImports.RhSpinWait(10);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10 is very short. Not sure what the implementation of HighPerformanceCounter.TickCount is on Unixes, but in CoreCLR it is significantly slower than on Windows and probably takes longer than 10 spin-waits on pre-Skylake processor, so the measurement may be off. I would recommend using the same value as in CoreCLR, 1000.

@MichalStrehovsky
Copy link
Member Author

Replaced by #7569 that also does the right thing for the GC.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants