This repository has been archived by the owner on Nov 1, 2020. It is now read-only.

Add normalized equivalent of YieldProcessor #7304

Closed

MichalStrehovsky wants to merge 4 commits into dotnet:master from MichalStrehovsky:normalizedSpinWait

Member

MichalStrehovsky commented Apr 15, 2019

Ports dotnet/coreclr#13670 to CoreRT.

Marked as draft because I don't have good perf numbers yet (and this is not an area that I'm comfortable making changes in).

MichalStrehovsky added 4 commits April 13, 2019 18:57

Port dotnet/coreclr#13670 to CoreRT (41405bd)

Lock spinning (ff66270)

Merge branch 'master' into normalizedSpinWait (5303a0a)

Merge branch 'master' into normalizedSpinWait (6d2a80f)

MichalStrehovsky requested a review from kouvel

April 15, 2019 08:32

Member

jkotas commented Apr 15, 2019

Note that the GC is going to need this scaling factor too.

Member Author

MichalStrehovsky commented Apr 15, 2019

> Note that the GC is going to need this scaling factor too.

Ah. I guess it would make more sense to write this in native code then.

kouvel reviewed

src/System.Private.CoreLib/src/System/Threading/Thread.CoreRT.cs

+                      [MethodImpl(MethodImplOptions.NoInlining)]
+                      private static int GetOptimalMaxSpinWaitsPerSpinIteration()
+                      {
+                          InitializeYieldProcessorNormalized();

Member

kouvel Apr 15, 2019

Probably should call EnsureYieldProcessorNormalizedInitialized() instead

src/System.Private.CoreLib/src/System/Threading/Thread.CoreRT.cs

+                          // Intel pre-Skylake processor: measured typically 14-17 cycles per yield
+                          // Intel post-Skylake processor: measured typically 125-150 cycles per yield
+                          const int DefaultYieldsPerNormalizedYield = 1; // defaults are for when no measurement is done
+                          const int DefaultOptimalMaxNormalizedYieldsPerSpinIteration = 64; // tuned for pre-Skylake processors, for post-Skylake it should be 7

Member

kouvel Apr 15, 2019

This should be 7 after spin-waits start using the normalization

src/System.Private.CoreLib/src/System/Threading/Thread.CoreRT.cs

+                          }
+                          EnsureYieldProcessorNormalizedInitialized();
+                          RuntimeImports.RhSpinWait(s_yieldsPerNormalizedYield * iterations);

Member

kouvel Apr 15, 2019

Check for or prevent overflow? CoreCLR uses a 64-bit spin count and prevents overflow on 32-bit platforms.
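
One possible way to guard the multiplication, as a minimal sketch rather than what CoreCLR actually does: widen to 64 bits before multiplying, then saturate to the 32-bit range. `SpinCountSketch` and `ScaleSpinCount` are hypothetical names, and the sketch assumes the spin-wait callee takes a 32-bit count.

```csharp
using System;

static class SpinCountSketch
{
    // Hypothetical helper: scales a requested iteration count by the
    // per-processor factor without letting the product wrap around.
    public static int ScaleSpinCount(int yieldsPerNormalizedYield, int iterations)
    {
        if (yieldsPerNormalizedYield <= 0 || iterations <= 0)
        {
            return 0; // nothing to spin for
        }

        // Multiply in 64 bits so the intermediate product cannot overflow.
        long scaled = (long)yieldsPerNormalizedYield * iterations;

        // Saturate instead of wrapping to a negative or small value.
        return scaled > int.MaxValue ? int.MaxValue : (int)scaled;
    }
}
```

With this shape, the call site would pass `ScaleSpinCount(s_yieldsPerNormalizedYield, iterations)` instead of the raw product.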

src/System.Private.CoreLib/src/System/Threading/Thread.CoreRT.cs

+                          }
+                          else if (yieldsPerNormalizedYield > MaxYieldsPerNormalizedYield)
+                          {
+                              yieldsPerNormalizedYield = MaxYieldsPerNormalizedYield;

Member

kouvel Apr 15, 2019

I don't think this is necessary. As described in the CoreCLR code, the value of yieldsPerNormalizedYield here is at most MinNsPerNormalizedYield, and without the extra limit the value would stay closer to what was measured. It probably doesn't matter much, though.
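
To illustrate why the upper clamp may be redundant, here is a rough sketch of the scaling calculation. It assumes the measured time per yield is floored at 1 ns and that the result is rounded to the nearest integer; both are assumptions for illustration, and `YieldScalingSketch` and its parameter names are hypothetical.

```csharp
using System;

static class YieldScalingSketch
{
    // Hypothetical helper: how many raw yields span one normalized yield.
    public static int ComputeYieldsPerNormalizedYield(
        double measuredNsPerYield, int minNsPerNormalizedYield)
    {
        // Assumption: treat anything faster than 1 ns per yield as 1 ns.
        double nsPerYield = Math.Max(1.0, measuredNsPerYield);

        // Because nsPerYield >= 1, the quotient never exceeds
        // minNsPerNormalizedYield, so no separate upper limit is needed.
        int yields = (int)(minNsPerNormalizedYield / nsPerYield + 0.5);

        // A normalized yield is always at least one raw yield.
        return Math.Max(1, yields);
    }
}
```

For example, with an arbitrary floor of 40 ns, a measurement of 0.1 ns per yield gives 40, 5 ns per yield gives 8, and 100 ns per yield gives 1.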

src/System.Private.CoreLib/src/System/Threading/Thread.CoreRT.cs

+                      private static void InitializeYieldProcessorNormalized()
+                      {
+                          // TODO: critical section so that it only initializes once

Member

kouvel Apr 15, 2019

Would be good to have this
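
A minimal sketch of one way to make the initialization run only once, using double-checked locking. All names are hypothetical, the measurement is passed in as a delegate to keep the example self-contained, and it assumes a context where `lock` is safe to take (inside the runtime's own lock implementation a different primitive may be needed).

```csharp
using System;
using System.Threading;

static class YieldNormalizationInitSketch
{
    private static readonly object s_initLock = new object();
    private static volatile bool s_isInitialized;

    // Default of 1 lets spin-waits work before any measurement is done.
    private static int s_yieldsPerNormalizedYield = 1;
    private static int s_measurementCount;

    public static int YieldsPerNormalizedYield => Volatile.Read(ref s_yieldsPerNormalizedYield);
    public static int MeasurementCount => Volatile.Read(ref s_measurementCount);

    public static void EnsureInitialized(Func<int> measure)
    {
        if (s_isInitialized)
        {
            return; // fast path: no lock once initialized
        }

        lock (s_initLock)
        {
            if (s_isInitialized)
            {
                return; // another thread finished while we waited
            }

            s_yieldsPerNormalizedYield = Math.Max(1, measure());
            s_measurementCount++;
            s_isInitialized = true; // volatile write publishes the result
        }
    }
}
```

Callers that race on `EnsureInitialized` all return after a single measurement, and readers that arrive before it completes still see the default factor.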

kouvel reviewed

src/System.Private.CoreLib/src/System/Threading/Thread.CoreRT.cs

+                              return;
+                          }
+                          EnsureYieldProcessorNormalizedInitialized();

Member

kouvel Apr 15, 2019

Could consider doing the initialization at the start of the finalizer thread. It's a bit unfortunate as it increases CPU time during startup (though not wall clock time). The GC would need a place for it as well. In CoreCLR, before initialization the relevant values are default-initialized so that spin-waits can still work. Once the initialization happens it could inform the GC of the scaling factor.

kouvel reviewed

src/System.Private.CoreLib/src/System/Threading/Thread.CoreRT.cs

+                          ulong elapsedTicks;
+                          do
+                          {
+                              RuntimeImports.RhSpinWait(10);

Member

kouvel Apr 15, 2019

10 is very short. I'm not sure how HighPerformanceCounter.TickCount is implemented on Unixes, but in CoreCLR it is significantly slower than on Windows and probably takes longer than 10 spin-waits on a pre-Skylake processor, so the measurement may be off. I would recommend using the same value as in CoreCLR, 1000.
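
The effect of the batch size can be seen in a rough measurement loop like the one below. This is a sketch under stated assumptions, not the PR's code: `Thread.SpinWait` and `Stopwatch` stand in for `RuntimeImports.RhSpinWait` and `HighPerformanceCounter.TickCount`, and `YieldMeasurementSketch` and its parameters are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class YieldMeasurementSketch
{
    // Hypothetical helper: estimates nanoseconds per spin-wait iteration.
    // A larger yieldsPerBatch makes the timer read a smaller share of
    // each loop pass, so a slow timer skews the result less.
    public static double MeasureNsPerYield(int yieldsPerBatch, int measureDurationMs)
    {
        long durationTicks = Stopwatch.Frequency * measureDurationMs / 1000;
        long yieldCount = 0;
        long start = Stopwatch.GetTimestamp();
        long elapsedTicks;

        do
        {
            Thread.SpinWait(yieldsPerBatch);
            yieldCount += yieldsPerBatch;
            elapsedTicks = Stopwatch.GetTimestamp() - start;
        } while (elapsedTicks < durationTicks);

        // Convert timer ticks to nanoseconds, then average over all yields.
        return elapsedTicks * (1e9 / Stopwatch.Frequency) / yieldCount;
    }
}
```

Comparing `MeasureNsPerYield(10, 10)` with `MeasureNsPerYield(1000, 10)` on a given machine gives a feel for how much of the smaller batch's result is timer overhead.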

Member Author

MichalStrehovsky commented Jun 29, 2019

Replaced by #7569, which also does the right thing for the GC.

MichalStrehovsky closed this
