Remove Monitor asm helpers
- Removed the asm helpers on Windows and used portable C++ helpers instead
- Rearranged the fast path code to improve it a bit and match the asm more closely
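The portable helpers themselves are not shown on this page. To illustrate the general shape of a lock fast path written in portable C++ rather than asm, here is a minimal sketch using `std::atomic`. An uncontended acquire is one compare-exchange with acquire ordering, a recursive acquire bumps a count, and anything else defers to a slow path. The lock-word layout and the names `TryEnterFast`/`ExitFast` are hypothetical and do not reflect the runtime's actual object header or helpers.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical lock-word layout, for illustration only:
//   bits 0..15  owning thread id (0 means unowned)
//   bits 16..23 recursion level beyond the first acquire
constexpr uint32_t kThreadIdMask = 0xFFFFu;
constexpr uint32_t kRecursionShift = 16;
constexpr uint32_t kRecursionInc = 1u << kRecursionShift;
constexpr uint32_t kRecursionMask = 0xFFu << kRecursionShift;

enum class FastPathResult { Success, UseSlowPath };

// Tries to acquire the lock, or re-acquire it recursively, without blocking.
inline FastPathResult TryEnterFast(std::atomic<uint32_t>& lockWord, uint32_t threadId) {
    assert(threadId != 0 && threadId <= kThreadIdMask);
    uint32_t oldValue = lockWord.load(std::memory_order_relaxed);
    if (oldValue == 0) {
        // Unowned: claim it. Acquire ordering keeps the critical section
        // from being reordered before the lock is taken.
        return lockWord.compare_exchange_strong(oldValue, threadId, std::memory_order_acquire,
                                                std::memory_order_relaxed)
                   ? FastPathResult::Success
                   : FastPathResult::UseSlowPath;  // lost a race
    }
    if ((oldValue & kThreadIdMask) == threadId && (oldValue & kRecursionMask) != kRecursionMask) {
        // Already owned by this thread: bump the recursion level.
        if (lockWord.compare_exchange_strong(oldValue, oldValue + kRecursionInc,
                                             std::memory_order_relaxed, std::memory_order_relaxed)) {
            return FastPathResult::Success;
        }
    }
    // Owned by another thread, recursion level saturated, or a racing update.
    return FastPathResult::UseSlowPath;
}

// Releases one level of ownership held by threadId.
inline FastPathResult ExitFast(std::atomic<uint32_t>& lockWord, uint32_t threadId) {
    uint32_t oldValue = lockWord.load(std::memory_order_relaxed);
    if ((oldValue & kThreadIdMask) != threadId) {
        return FastPathResult::UseSlowPath;  // not the owner; let the slow path handle it
    }
    uint32_t newValue = (oldValue & kRecursionMask) != 0 ? oldValue - kRecursionInc : 0;
    // Release ordering publishes writes made inside the critical section.
    return lockWord.compare_exchange_strong(oldValue, newValue, std::memory_order_release,
                                            std::memory_order_relaxed)
               ? FastPathResult::Success
               : FastPathResult::UseSlowPath;
}
```

In this sketch a caller falls back to a slow path (spinning, inflating to a full lock, or waiting) whenever either function returns `UseSlowPath`.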

Perf:
- The asm helpers are a bit faster. The code generated for the portable helpers is almost the same now; the remaining differences are:
  - Layout: some hot paths were placed in the wrong location and return paths were not cloned. Instrumenting some of the tests below with PGO on x64 resolved all of the layout issues. I couldn't get PGO instrumentation to work on x86, but I expect the result would be the same there.
  - Register usage
    - x64: All of the Enter functions use one or two (TryEnter uses two) callee-saved registers for no apparent reason, forcing them to be saved and restored. r10 and r11 seem to be available, but they're not being used.
    - x86: Similarly to x64, the compiled functions push and pop 2-3 additional registers in the hottest fast paths.
    - I believe this is the main remaining gap, and PGO does not help with it.
- On Linux, perf is mostly on par with or better than before
- The perf tests used for the numbers below are updated in PR dotnet#13670
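Callee-saved registers are generally only needed for values that must stay live across a call. One common, compiler-agnostic way to keep them out of a hot path is to move the slow path into a separate non-inlined function, so the fast path reduces to a compare-exchange followed by at most a tail call. The sketch below assumes a simple non-recursive spin lock; it is a generic technique, not what this commit does, and it does not guarantee any particular register allocation. `SKETCH_NOINLINE`, `Enter`, `EnterSlow` and `Exit` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

#if defined(_MSC_VER)
#define SKETCH_NOINLINE __declspec(noinline)
#else
#define SKETCH_NOINLINE __attribute__((noinline))
#endif

// Cold path, kept out of line so the caller does not have to allocate
// registers for its loop. Here it simply retries until the lock is free;
// a real runtime would back off and eventually block instead.
SKETCH_NOINLINE void EnterSlow(std::atomic<uint32_t>& lockWord, uint32_t threadId) {
    uint32_t expected = 0;
    while (!lockWord.compare_exchange_weak(expected, threadId, std::memory_order_acquire,
                                           std::memory_order_relaxed)) {
        expected = 0;  // compare_exchange_weak overwrote it with the current value
    }
}

// Hot path: one compare-exchange, then at most a tail call to the cold path.
inline void Enter(std::atomic<uint32_t>& lockWord, uint32_t threadId) {
    assert(threadId != 0);
    uint32_t expected = 0;
    if (lockWord.compare_exchange_strong(expected, threadId, std::memory_order_acquire,
                                         std::memory_order_relaxed)) {
        return;
    }
    EnterSlow(lockWord, threadId);
}

// Release ordering publishes writes made inside the critical section.
inline void Exit(std::atomic<uint32_t>& lockWord) {
    lockWord.store(0, std::memory_order_release);
}
```

Inspecting the generated prologue of `Enter` (for example with a disassembler) is the way to check whether a given compiler actually avoids saving callee-saved registers with this split.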

My guess is that these regressions are small and unlikely to materialize into real-world regressions. Removing the asm would simplify the code and ease maintenance a bit, but since the register allocation issues look unlikely to be resolved easily, I'm not sure whether we want to remove the asm code at this time. @jkotas and @vancem, thoughts?

Numbers (no PGO):

Windows x64

```
Spin                                              Left score       Right score      ∆ Score %
------------------------------------------------  ---------------  ---------------  ---------
MonitorEnterExitLatency 2T                          800.56 ±0.33%    821.97 ±0.30%      2.67%
MonitorEnterExitLatency 4T                         1533.25 ±0.34%   1553.82 ±0.13%      1.34%
MonitorEnterExitLatency 7T                         1676.14 ±0.26%   1678.14 ±0.18%      0.12%
MonitorEnterExitThroughput Delay 1T                5174.77 ±0.25%   5125.56 ±0.27%     -0.95%
MonitorEnterExitThroughput Delay 2T                4982.38 ±0.22%   4937.79 ±0.19%     -0.90%
MonitorEnterExitThroughput Delay 4T                4720.41 ±0.37%   4694.09 ±0.24%     -0.56%
MonitorEnterExitThroughput Delay 7T                3741.20 ±0.33%   3778.06 ±0.20%      0.99%
MonitorEnterExitThroughput_AwareLock 1T           63445.04 ±0.20%  61540.28 ±0.23%     -3.00%
MonitorEnterExitThroughput_ThinLock 1T            59720.83 ±0.20%  59754.62 ±0.12%      0.06%
MonitorReliableEnterExitLatency 2T                  809.31 ±0.23%    809.58 ±0.41%      0.03%
MonitorReliableEnterExitLatency 4T                 1569.47 ±0.45%   1577.43 ±0.71%      0.51%
MonitorReliableEnterExitLatency 7T                 1681.65 ±0.25%   1678.01 ±0.20%     -0.22%
MonitorReliableEnterExitThroughput Delay 1T        4956.40 ±0.41%   4957.46 ±0.24%      0.02%
MonitorReliableEnterExitThroughput Delay 2T        4794.52 ±0.18%   4756.23 ±0.25%     -0.80%
MonitorReliableEnterExitThroughput Delay 4T        4560.22 ±0.25%   4522.03 ±0.35%     -0.84%
MonitorReliableEnterExitThroughput Delay 7T        3902.19 ±0.55%   3875.81 ±0.13%     -0.68%
MonitorReliableEnterExitThroughput_AwareLock 1T   61944.11 ±0.20%  58083.95 ±0.08%     -6.23%
MonitorReliableEnterExitThroughput_ThinLock 1T    59632.31 ±0.25%  58972.48 ±0.07%     -1.11%
MonitorTryEnterExitThroughput_AwareLock 1T        62345.13 ±0.14%  57159.99 ±0.14%     -8.32%
MonitorTryEnterExitThroughput_ThinLock 1T         59725.76 ±0.15%  58050.35 ±0.16%     -2.81%
------------------------------------------------  ---------------  ---------------  ---------
Total                                              6795.49 ±0.28%   6723.21 ±0.23%     -1.06%
```
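The ∆ Score % column is consistent with the percent change (Right - Left) / Left × 100, and the Total row appears consistent with the geometric mean of each column (the 20 Left scores in the Windows x64 table work out to about 6795.49). A minimal sketch for recomputing these figures follows; the function names are hypothetical, and this is not the harness that produced the numbers.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Percent change from the left (baseline) score to the right score.
double PercentDelta(double left, double right) {
    assert(left > 0.0);
    return (right - left) / left * 100.0;
}

// Geometric mean of positive scores, computed in log space to avoid overflow.
double GeometricMean(const std::vector<double>& scores) {
    assert(!scores.empty());
    double logSum = 0.0;
    for (double s : scores) {
        assert(s > 0.0);
        logSum += std::log(s);
    }
    return std::exp(logSum / static_cast<double>(scores.size()));
}
```

For example, `PercentDelta(800.56, 821.97)` is about 2.67, matching the first Windows x64 row.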

Windows x86

```
Spin                                              Left score       Right score      ∆ Score %
------------------------------------------------  ---------------  ---------------  ---------
MonitorEnterExitLatency 2T                          958.97 ±0.37%    987.28 ±0.32%      2.95%
MonitorEnterExitLatency 4T                         1675.18 ±0.41%   1704.64 ±0.08%      1.76%
MonitorEnterExitLatency 7T                         1825.49 ±0.09%   1769.50 ±0.12%     -3.07%
MonitorEnterExitThroughput Delay 1T                5083.01 ±0.27%   5047.10 ±0.37%     -0.71%
MonitorEnterExitThroughput Delay 2T                4854.54 ±0.13%   4825.31 ±0.14%     -0.60%
MonitorEnterExitThroughput Delay 4T                4628.89 ±0.17%   4579.92 ±0.56%     -1.06%
MonitorEnterExitThroughput Delay 7T                4125.52 ±0.48%   4096.78 ±0.20%     -0.70%
MonitorEnterExitThroughput_AwareLock 1T           61841.28 ±0.13%  57429.31 ±0.44%     -7.13%
MonitorEnterExitThroughput_ThinLock 1T            59746.69 ±0.19%  57971.43 ±0.10%     -2.97%
MonitorReliableEnterExitLatency 2T                  983.26 ±0.22%    998.25 ±0.33%      1.52%
MonitorReliableEnterExitLatency 4T                 1758.10 ±0.14%   1723.63 ±0.19%     -1.96%
MonitorReliableEnterExitLatency 7T                 1832.24 ±0.08%   1776.61 ±0.10%     -3.04%
MonitorReliableEnterExitThroughput Delay 1T        5023.19 ±0.05%   4980.49 ±0.08%     -0.85%
MonitorReliableEnterExitThroughput Delay 2T        4846.04 ±0.03%   4792.58 ±0.11%     -1.10%
MonitorReliableEnterExitThroughput Delay 4T        4608.14 ±0.09%   4574.90 ±0.06%     -0.72%
MonitorReliableEnterExitThroughput Delay 7T        4123.20 ±0.10%   4075.92 ±0.11%     -1.15%
MonitorReliableEnterExitThroughput_AwareLock 1T   57951.11 ±0.11%  57006.12 ±0.21%     -1.63%
MonitorReliableEnterExitThroughput_ThinLock 1T    58006.06 ±0.18%  58018.28 ±0.07%      0.02%
MonitorTryEnterExitThroughput_AwareLock 1T        60701.63 ±0.04%  53374.77 ±0.15%    -12.07%
MonitorTryEnterExitThroughput_ThinLock 1T         58169.82 ±0.05%  56023.58 ±0.69%     -3.69%
------------------------------------------------  ---------------  ---------------  ---------
Total                                              7037.46 ±0.17%   6906.42 ±0.22%     -1.86%
```

Linux x64

```
Spin repeater                                    Left score       Right score      ∆ Score %
-----------------------------------------------  ---------------  ---------------  ---------
MonitorEnterExitLatency 2T                        3755.92 ±1.51%   3802.80 ±0.62%      1.25%
MonitorEnterExitLatency 4T                        3448.14 ±1.69%   3493.84 ±1.58%      1.33%
MonitorEnterExitLatency 7T                        2593.97 ±0.13%   2655.21 ±0.15%      2.36%
MonitorEnterExitThroughput Delay 1T               4854.52 ±0.12%   4873.08 ±0.11%      0.38%
MonitorEnterExitThroughput Delay 2T               4659.19 ±0.85%   4695.61 ±0.38%      0.78%
MonitorEnterExitThroughput Delay 4T               4163.01 ±1.46%   4190.94 ±1.37%      0.67%
MonitorEnterExitThroughput Delay 7T               3012.69 ±0.45%   3123.75 ±0.32%      3.69%
MonitorEnterExitThroughput_AwareLock 1T          56665.09 ±0.16%  58524.86 ±0.24%      3.28%
MonitorEnterExitThroughput_ThinLock 1T           57476.36 ±0.68%  57573.08 ±0.61%      0.17%
MonitorReliableEnterExitLatency 2T                3952.35 ±0.45%   3937.80 ±0.49%     -0.37%
MonitorReliableEnterExitLatency 4T                3001.75 ±1.02%   3008.55 ±0.76%      0.23%
MonitorReliableEnterExitLatency 7T                2456.20 ±0.65%   2479.78 ±0.09%      0.96%
MonitorReliableEnterExitThroughput Delay 1T       4907.10 ±0.85%   4940.83 ±0.23%      0.69%
MonitorReliableEnterExitThroughput Delay 2T       4750.81 ±0.62%   4725.81 ±0.87%     -0.53%
MonitorReliableEnterExitThroughput Delay 4T       4329.93 ±1.18%   4360.67 ±1.04%      0.71%
MonitorReliableEnterExitThroughput Delay 7T       3180.52 ±0.27%   3255.88 ±0.51%      2.37%
MonitorReliableEnterExitThroughput_AwareLock 1T  54559.89 ±0.09%  55785.74 ±0.20%      2.25%
MonitorReliableEnterExitThroughput_ThinLock 1T   55936.06 ±0.36%  55519.74 ±0.80%     -0.74%
MonitorTryEnterExitThroughput_AwareLock 1T       52694.96 ±0.18%  54282.77 ±0.12%      3.01%
MonitorTryEnterExitThroughput_ThinLock 1T        54942.18 ±0.24%  55031.84 ±0.38%      0.16%
-----------------------------------------------  ---------------  ---------------  ---------
Total                                             8326.45 ±0.65%   8420.07 ±0.54%      1.12%
```
kouvel committed Sep 23, 2017
1 parent ca01314 commit 9f3f91a
Showing 14 changed files with 212 additions and 3,597 deletions.
7 changes: 0 additions & 7 deletions src/inc/jithelpers.h
@@ -134,17 +134,10 @@
JITHELPER(CORINFO_HELP_ENDCATCH, JIT_EndCatch, CORINFO_HELP_SIG_CANNOT_USE_ALIGN_STUB)
#endif

#ifdef _TARGET_AMD64_
DYNAMICJITHELPER(CORINFO_HELP_MON_ENTER, JIT_MonEnterWorker, CORINFO_HELP_SIG_REG_ONLY)
DYNAMICJITHELPER(CORINFO_HELP_MON_EXIT, JIT_MonExitWorker, CORINFO_HELP_SIG_REG_ONLY)
DYNAMICJITHELPER(CORINFO_HELP_MON_ENTER_STATIC, JIT_MonEnterStatic,CORINFO_HELP_SIG_REG_ONLY)
DYNAMICJITHELPER(CORINFO_HELP_MON_EXIT_STATIC, JIT_MonExitStatic,CORINFO_HELP_SIG_REG_ONLY)
#else
JITHELPER(CORINFO_HELP_MON_ENTER, JIT_MonEnterWorker, CORINFO_HELP_SIG_REG_ONLY)
JITHELPER(CORINFO_HELP_MON_EXIT, JIT_MonExitWorker, CORINFO_HELP_SIG_REG_ONLY)
JITHELPER(CORINFO_HELP_MON_ENTER_STATIC, JIT_MonEnterStatic,CORINFO_HELP_SIG_REG_ONLY)
JITHELPER(CORINFO_HELP_MON_EXIT_STATIC, JIT_MonExitStatic,CORINFO_HELP_SIG_REG_ONLY)
#endif

JITHELPER(CORINFO_HELP_GETCLASSFROMMETHODPARAM, JIT_GetClassFromMethodParam, CORINFO_HELP_SIG_REG_ONLY)
JITHELPER(CORINFO_HELP_GETSYNCFROMCLASSHANDLE, JIT_GetSyncFromClassHandle, CORINFO_HELP_SIG_REG_ONLY)
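jithelpers.h is a list of `JITHELPER`/`DYNAMICJITHELPER` entries that the including code expands with its own macro definitions (an X-macro pattern). In this hunk the AMD64-only `DYNAMICJITHELPER` entries for the monitor helpers go away, leaving the `JITHELPER` entries for all targets. The sketch below assumes, for illustration, that a dynamic entry is a slot the runtime may repoint at startup (for example to a hand-written asm version) while a static entry is fixed. All names are hypothetical, and the runtime's real helper tables are more involved.

```cpp
#include <cassert>

// Hypothetical helper implementations standing in for portable C++ helpers.
void MonEnterPortable() {}
void MonExitPortable() {}

using HelperFn = void (*)();

// Hypothetical helper list, in the same spirit as jithelpers.h.
// The includer decides what STATIC_HELPER and DYNAMIC_HELPER expand to.
#define SKETCH_HELPERS(STATIC_HELPER, DYNAMIC_HELPER) \
    STATIC_HELPER(HELP_MON_ENTER, MonEnterPortable)   \
    DYNAMIC_HELPER(HELP_MON_EXIT, MonExitPortable)

// Expansion 1: an enum of helper ids.
#define SKETCH_AS_ENUM(id, fn) id,
enum HelperId { SKETCH_HELPERS(SKETCH_AS_ENUM, SKETCH_AS_ENUM) HELP_COUNT };
#undef SKETCH_AS_ENUM

struct HelperEntry {
    HelperFn fn;     // current target
    bool isDynamic;  // true if the runtime may repoint this slot at startup
};

// Expansion 2: a table indexed by HelperId, built from the same list.
#define SKETCH_AS_STATIC(id, fn) {&fn, false},
#define SKETCH_AS_DYNAMIC(id, fn) {&fn, true},
HelperEntry g_helperTable[HELP_COUNT] = {SKETCH_HELPERS(SKETCH_AS_STATIC, SKETCH_AS_DYNAMIC)};
#undef SKETCH_AS_STATIC
#undef SKETCH_AS_DYNAMIC

// Hypothetical startup hook: only dynamic slots may be replaced.
bool TryOverrideHelper(HelperId id, HelperFn replacement) {
    assert(id < HELP_COUNT);
    HelperEntry& entry = g_helperTable[id];
    if (!entry.isDynamic) {
        return false;
    }
    entry.fn = replacement;
    return true;
}
```

Because the enum and the table are generated from one list, they cannot drift out of sync; moving an entry from the dynamic form to the static form only changes which expansion macro it goes through.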
38 changes: 5 additions & 33 deletions src/pal/inc/pal.h
@@ -4195,6 +4195,9 @@ InterlockedDecrement(
return __sync_sub_and_fetch(lpAddend, (LONG)1);
}

#define InterlockedDecrementAcquire InterlockedDecrement
#define InterlockedDecrementRelease InterlockedDecrement

EXTERN_C
PALIMPORT
inline
@@ -4290,39 +4293,8 @@ InterlockedCompareExchange(
Exchange /* The value to be stored */);
}

EXTERN_C
PALIMPORT
inline
LONG
PALAPI
InterlockedCompareExchangeAcquire(
IN OUT LONG volatile *Destination,
IN LONG Exchange,
IN LONG Comperand)
{
// TODO: implement the version with only the acquire semantics
return __sync_val_compare_and_swap(
Destination, /* The pointer to a variable whose value is to be compared with. */
Comperand, /* The value to be compared */
Exchange /* The value to be stored */);
}

EXTERN_C
PALIMPORT
inline
LONG
PALAPI
InterlockedCompareExchangeRelease(
IN OUT LONG volatile *Destination,
IN LONG Exchange,
IN LONG Comperand)
{
// TODO: implement the version with only the release semantics
return __sync_val_compare_and_swap(
Destination, /* The pointer to a variable whose value is to be compared with. */
Comperand, /* The value to be compared */
Exchange /* The value to be stored */);
}
#define InterlockedCompareExchangeAcquire InterlockedCompareExchange
#define InterlockedCompareExchangeRelease InterlockedCompareExchange

// See the 32-bit variant in interlock2.s
EXTERN_C
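In pal.h, the removed `InterlockedCompareExchangeAcquire`/`InterlockedCompareExchangeRelease` bodies were already plain `__sync_val_compare_and_swap` calls with TODOs for weaker ordering, and the change maps them (like the new `InterlockedDecrementAcquire`/`InterlockedDecrementRelease`) to the full-barrier versions via `#define`. For comparison, here is a minimal sketch of how acquire-only and release-only variants could be expressed with the GCC/Clang `__atomic` builtins. The function names are hypothetical and are not part of the PAL.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names, not part of the PAL. Requires GCC or Clang (__atomic builtins).
// Like InterlockedCompareExchange, each returns the value that was at *destination
// before the operation, and stores exchange only if that value equaled comperand.

// Acquire ordering on success: later reads/writes cannot move before the exchange.
inline int32_t CompareExchangeAcquireSketch(int32_t* destination, int32_t exchange, int32_t comperand) {
    assert(destination != nullptr);
    int32_t expected = comperand;
    // On failure, expected is overwritten with the current value of *destination.
    __atomic_compare_exchange_n(destination, &expected, exchange, /*weak=*/false,
                                __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
    return expected;
}

// Release ordering on success: earlier reads/writes cannot move after the exchange.
inline int32_t CompareExchangeReleaseSketch(int32_t* destination, int32_t exchange, int32_t comperand) {
    assert(destination != nullptr);
    int32_t expected = comperand;
    __atomic_compare_exchange_n(destination, &expected, exchange, /*weak=*/false,
                                __ATOMIC_RELEASE, __ATOMIC_RELAXED);
    return expected;
}
```

On x86/x64 a locked compare-exchange is a full barrier anyway, so the weaker orderings mainly matter on architectures such as ARM, where they can avoid extra fences.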