Regressions in SeekUnroll #60626

performanceautofiler · 2021-10-19T07:33:29Z

Run Information

Architecture	x64
OS	Windows 10.0.18362
Baseline	33c0b570c2516747285db3bb63b77a74d4d4246e
Compare	7ecced4a339cb18e59d19c2ffa4e429072e309d9
Diff	Diff

Regressions in PerfLabTests.LowLevelPerf

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio	Baseline ETL	Compare ETL
ObjectStringIsString - Duration of single invocation	123.40 μs	155.92 μs	1.26	0.06	False
GenericGenericMethod - Duration of single invocation	155.92 μs	211.63 μs	1.36	0.06	False

Test Report

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'PerfLabTests.LowLevelPerf*'

Payloads

Baseline
Compare

Histogram

PerfLabTests.LowLevelPerf.ObjectStringIsString

PerfLabTests.LowLevelPerf.GenericGenericMethod

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

### Run Information

Architecture	x64
OS	Windows 10.0.18362
Baseline	33c0b570c2516747285db3bb63b77a74d4d4246e
Compare	7ecced4a339cb18e59d19c2ffa4e429072e309d9
Diff	Diff

Regressions in SeekUnroll

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio	Baseline ETL	Compare ETL
Test - Duration of single invocation	1.25 secs	1.56 secs	1.25	0.37	False

Test Report

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'SeekUnroll*'

Payloads

Baseline
Compare

Histogram

SeekUnroll.Test(boxedIndex: 3)

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

Run Information

Architecture	x64
OS	Windows 10.0.18362
Baseline	33c0b570c2516747285db3bb63b77a74d4d4246e
Compare	7ecced4a339cb18e59d19c2ffa4e429072e309d9
Diff	Diff

Regressions in System.Tests.Perf_Char

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio	Baseline ETL	Compare ETL
GetUnicodeCategory - Duration of single invocation	4.40 ns	5.74 ns	1.30	0.10	False

Test Report

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Tests.Perf_Char*'

Payloads

Baseline
Compare

Histogram

System.Tests.Perf_Char.GetUnicodeCategory(c: 'א')

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

The text was updated successfully, but these errors were encountered:

dotnet-issue-labeler · 2021-10-19T15:10:51Z

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

kunalspathak · 2021-10-19T15:11:00Z

Caused by #60228
@jakobbotsch

ghost · 2021-10-19T15:11:21Z

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Run Information

Architecture	x64
OS	Windows 10.0.18362
Baseline	33c0b570c2516747285db3bb63b77a74d4d4246e
Compare	7ecced4a339cb18e59d19c2ffa4e429072e309d9
Diff	Diff

Regressions in PerfLabTests.LowLevelPerf

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio	Baseline ETL	Compare ETL
ObjectStringIsString - Duration of single invocation	123.40 μs	155.92 μs	1.26	0.06	False
GenericGenericMethod - Duration of single invocation	155.92 μs	211.63 μs	1.36	0.06	False

Test Report

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'PerfLabTests.LowLevelPerf*'

Payloads

Baseline
Compare

Histogram

PerfLabTests.LowLevelPerf.ObjectStringIsString

PerfLabTests.LowLevelPerf.GenericGenericMethod

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

### Run Information

Architecture	x64
OS	Windows 10.0.18362
Baseline	33c0b570c2516747285db3bb63b77a74d4d4246e
Compare	7ecced4a339cb18e59d19c2ffa4e429072e309d9
Diff	Diff

Regressions in SeekUnroll

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio	Baseline ETL	Compare ETL
Test - Duration of single invocation	1.25 secs	1.56 secs	1.25	0.37	False

Test Report

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'SeekUnroll*'

Payloads

Baseline
Compare

Histogram

SeekUnroll.Test(boxedIndex: 3)

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

Run Information

Architecture	x64
OS	Windows 10.0.18362
Baseline	33c0b570c2516747285db3bb63b77a74d4d4246e
Compare	7ecced4a339cb18e59d19c2ffa4e429072e309d9
Diff	Diff

Regressions in System.Tests.Perf_Char

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio	Baseline ETL	Compare ETL
GetUnicodeCategory - Duration of single invocation	4.40 ns	5.74 ns	1.30	0.10	False

Test Report

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Tests.Perf_Char*'

Payloads

Baseline
Compare

Histogram

System.Tests.Perf_Char.GetUnicodeCategory(c: 'א')

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

Author:	performanceautofiler[bot]
Assignees:	-
Labels:	`os-windows`, `tenet-performance`, `tenet-performance-benchmarks`, `arch-x64`, `area-CodeGen-coreclr`, `untriaged`
Milestone:	-

kunalspathak · 2021-10-19T15:24:40Z

More regressions: dotnet/perf-autofiling-issues#1926

jakobbotsch · 2021-10-20T11:39:16Z

Investigating now. Initial results are below.

main is 1bf02cc
no_lea is main with f40247e reverted
no_vsd_reg is main with 8076775 reverted

BenchmarkDotNet=v0.13.1.1611-nightly, OS=Windows 10.0.19043.1288 (21H1/May2021Update)
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=7.0.100-alpha.1.21518.14
  [Host]     : .NET 7.0.0 (7.0.21.48001), X64 RyuJIT
  Job-VWGBDZ : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT
  Job-QJEQHT : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT
  Job-VXMGKV : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:DebugType=portable,-bl:benchmarkdotnet.binlog  IterationTime=250.0000 ms  
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1

Method	Job	Toolchain	boxedIndex	Mean	Error	StdDev	Median	Min	Max	Ratio	Code Size	Allocated
SeekUnroll	Job-VWGBDZ	\main\corerun.exe	1	864.5 ms	0.78 ms	0.65 ms	864.8 ms	863.1 ms	865.5 ms	1.00	392 B	-
SeekUnroll	Job-QJEQHT	\no_lea\corerun.exe	1	867.0 ms	4.41 ms	4.12 ms	867.1 ms	861.0 ms	875.0 ms	1.00	380 B	-
SeekUnroll	Job-VXMGKV	\no_vsd_reg\corerun.exe	1	860.4 ms	1.26 ms	1.17 ms	860.3 ms	858.5 ms	862.9 ms	1.00	392 B	-

SeekUnroll	Job-VWGBDZ	\main\corerun.exe	3	862.5 ms	3.38 ms	3.00 ms	861.3 ms	859.7 ms	869.7 ms	1.00	392 B	-
SeekUnroll	Job-QJEQHT	\no_lea\corerun.exe	3	861.2 ms	1.35 ms	1.13 ms	861.0 ms	859.0 ms	863.6 ms	1.00	380 B	10,224 B
SeekUnroll	Job-VXMGKV	\no_vsd_reg\corerun.exe	3	865.1 ms	4.56 ms	4.26 ms	864.1 ms	859.9 ms	871.7 ms	1.00	392 B	-

SeekUnroll	Job-VWGBDZ	\main\corerun.exe	11	1,077.1 ms	3.55 ms	3.32 ms	1,077.2 ms	1,071.8 ms	1,084.9 ms	1.00	392 B	-
SeekUnroll	Job-QJEQHT	\no_lea\corerun.exe	11	920.4 ms	2.32 ms	2.17 ms	920.0 ms	917.5 ms	924.2 ms	0.85	380 B	-
SeekUnroll	Job-VXMGKV	\no_vsd_reg\corerun.exe	11	1,076.5 ms	2.53 ms	2.24 ms	1,075.9 ms	1,073.9 ms	1,081.4 ms	1.00	392 B	-

SeekUnroll	Job-VWGBDZ	\main\corerun.exe	19	1,199.4 ms	2.44 ms	2.28 ms	1,199.3 ms	1,195.2 ms	1,203.3 ms	1.00	392 B	-
SeekUnroll	Job-QJEQHT	\no_lea\corerun.exe	19	1,132.8 ms	2.12 ms	1.98 ms	1,132.9 ms	1,130.2 ms	1,136.9 ms	0.94	380 B	-
SeekUnroll	Job-VXMGKV	\no_vsd_reg\corerun.exe	19	1,216.5 ms	2.34 ms	2.07 ms	1,216.8 ms	1,213.2 ms	1,220.4 ms	1.01	392 B	-

SeekUnroll	Job-VWGBDZ	\main\corerun.exe	27	1,412.3 ms	3.22 ms	3.01 ms	1,410.9 ms	1,407.8 ms	1,419.3 ms	1.00	392 B	-
SeekUnroll	Job-QJEQHT	\no_lea\corerun.exe	27	1,498.2 ms	1.30 ms	1.01 ms	1,498.3 ms	1,496.9 ms	1,500.1 ms	1.06	380 B	-
SeekUnroll	Job-VXMGKV	\no_vsd_reg\corerun.exe	27	1,406.6 ms	3.35 ms	3.13 ms	1,407.0 ms	1,401.4 ms	1,413.2 ms	1.00	392 B	-

It looks like we are no longer containing some immediates after f40247e. In SeekUnroll.InnerLoop: the diff from that change to main is:

@@ -66,10 +66,11 @@ M00_L02:
        mov       rsi,rcx
        mov       rdi,rdx
        xor       ebx,ebx
-       mov       rcx,7FFD0690BD78
+       mov       rcx,7FFD0690BC80
        mov       edx,2A
        call      CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
-       cmp       dword ptr [7FFD06A9C2B8],0
+       mov       rax,7FFD0690C1C0
+       cmp       dword ptr [rax],0
        jle       near ptr M01_L02
 M01_L00:
        vmovupd   ymm0,[rdi]
@@ -104,7 +105,8 @@ M01_L01:
        add       eax,edx
        mov       [rsi],eax
        inc       ebx
-       cmp       ebx,[7FFD06A9C2B8]
+       mov       rax,7FFD0690C1C0
+       cmp       ebx,[rax]
        jl        near ptr M01_L00
 M01_L02:
        vzeroupper
@@ -113,5 +115,5 @@ M01_L02:
        pop       rsi
        pop       rdi
        ret
-; Total bytes of code 196
+; Total bytes of code 208

jakobbotsch · 2021-10-20T12:08:52Z

The containment of that immediate seems to flip back and forth, which is strange. In some runs in main it is contained and in some runs it isn't.
It can clearly be contained in the following case since it is very close to the instruction:

[2021/10/20 14:04:26][INFO]  00007ffd`2c989b53        48B8C0C17F2CFD7F0000 mov      rax, 0x7FFD2C7FC1C0
[2021/10/20 14:04:26][INFO]  00007ffd`2c989b5d        3B18                 cmp      ebx, dword ptr [rax]

Due to dotnet#60712 we cannot unconditionally generate lea for handles, since the runtime does not expect us to generate ask for reloc hints for arbitrary constants. In addition, for many Intel CPUs it looks like rip-relative lea has greater latency than mov with an 8-byte immediate. Instead of reverting the change I'm just turning it off here, since the change also unified a couple of functions to simplify the handling. Fix dotnet#60627 Fix dotnet#60626

Due to #60712 we cannot unconditionally generate lea for handles, since the runtime does not expect us to generate ask for reloc hints for arbitrary constants. In addition, for many Intel CPUs it looks like rip-relative lea has greater latency than mov with an 8-byte immediate. Instead of reverting the change I'm just turning it off here, since the change also unified a couple of functions to simplify the handling. Fix #60626 Fix #60627 Fix #60629

kunalspathak changed the title ~~[Perf] Changes at 10/15/2021 8:43:07 AM~~ Regressions in SeekUnroll Oct 19, 2021

kunalspathak transferred this issue from dotnet/perf-autofiling-issues Oct 19, 2021

dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Oct 19, 2021

kunalspathak added arch-x64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI os-windows tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark labels Oct 19, 2021

jakobbotsch self-assigned this Oct 19, 2021

jakobbotsch added this to the 7.0.0 milestone Oct 19, 2021

jakobbotsch mentioned this issue Oct 21, 2021

Runtime<->JIT reloc hinting mechanism needs work #60712

Open

jakobbotsch mentioned this issue Oct 21, 2021

Turn off reloc generation for handles #60721

Merged

ghost added the in-pr There is an active PR which will close this issue when it is merged label Oct 21, 2021

jakobbotsch closed this as completed in #60721 Oct 22, 2021

ghost removed the in-pr There is an active PR which will close this issue when it is merged label Oct 22, 2021

ghost locked as resolved and limited conversation to collaborators Nov 21, 2021

Regressions in SeekUnroll #60626

Regressions in SeekUnroll #60626

Comments

performanceautofiler bot commented Oct 19, 2021

Run Information

Regressions in PerfLabTests.LowLevelPerf

Repro

Payloads

Histogram

PerfLabTests.LowLevelPerf.ObjectStringIsString

PerfLabTests.LowLevelPerf.GenericGenericMethod

Docs

Regressions in SeekUnroll

Repro

Payloads

Histogram

SeekUnroll.Test(boxedIndex: 3)

Docs

Run Information

Regressions in System.Tests.Perf_Char

Repro

Payloads

Histogram

System.Tests.Perf_Char.GetUnicodeCategory(c: 'א')

Docs

dotnet-issue-labeler bot commented Oct 19, 2021

kunalspathak commented Oct 19, 2021

ghost commented Oct 19, 2021

Run Information

Regressions in PerfLabTests.LowLevelPerf

Repro

Payloads

Histogram

PerfLabTests.LowLevelPerf.ObjectStringIsString

PerfLabTests.LowLevelPerf.GenericGenericMethod

Docs

Regressions in SeekUnroll

Repro

Payloads

Histogram

SeekUnroll.Test(boxedIndex: 3)

Docs

Run Information

Regressions in System.Tests.Perf_Char

Repro

Payloads

Histogram

System.Tests.Perf_Char.GetUnicodeCategory(c: 'א')

Docs

kunalspathak commented Oct 19, 2021

jakobbotsch commented Oct 20, 2021

jakobbotsch commented Oct 20, 2021