
[Perf] Linux/arm64: 3 Regressions in System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock #96851

Open
performanceautofiler bot opened this issue Jan 11, 2024 · 8 comments
Labels: arch-arm64, area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), os-linux (Linux OS, any supported distro), runtime-coreclr (specific to the CoreCLR runtime), tenet-performance (performance-related issue), tenet-performance-benchmarks (issue from performance benchmark)

@performanceautofiler

performanceautofiler bot commented Jan 11, 2024

Run Information

Name           Value
Architecture   arm64
OS             ubuntu 20.04
Queue          AmpereUbuntu
Baseline       4838097db4280704f24e1d30a9705a586e58628f
Compare        adb4c18f5bcf6587230afe04cc6b61e2477f8137
Configs        CompilationMode:tiered, RunKind:micro

Regressions in System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock

Baseline    Test        Test/Base   Test Quality   Edge Detector
74.28 ns    81.10 ns    1.09        0.19           False
151.99 μs   178.88 μs   1.18        0.01           True
157.63 μs   181.46 μs   1.15        0.01           True
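The Test/Base column is the test time divided by the baseline time. A minimal Python sketch that recomputes it from the three rows above; the 5% cutoff used to flag a regression is a hypothetical threshold chosen for illustration, not the autofiler's actual detection rule, which this issue does not describe.

```python
# (baseline, test) pairs from the regression table. Each pair shares a unit
# (the first row is in ns, the other two in μs), so the unit cancels out.
ROWS = [
    (74.28, 81.10),
    (151.99, 178.88),
    (157.63, 181.46),
]

# Hypothetical cutoff for illustration only: flag anything more than 5% slower.
REGRESSION_THRESHOLD = 1.05


def ratio_test_over_base(baseline: float, test: float) -> float:
    """Return Test/Base rounded to two decimals, as shown in the table."""
    return round(test / baseline, 2)


ratios = [ratio_test_over_base(b, t) for b, t in ROWS]
flagged = [r > REGRESSION_THRESHOLD for r in ratios]

for (b, t), r, bad in zip(ROWS, ratios, flagged):
    print(f"{b:>7} -> {t:>7}  Test/Base={r:.2f}  regression={bad}")
```

The recomputed ratios match the table (1.09, 1.18, 1.15); note that the first row, with a Test Quality of 0.19 and no edge detected, is the least clear-cut of the three despite exceeding the illustrative cutoff.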


Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock*'

Regressed benchmarks:

System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock.Count(Pattern: "(?s).*", Options: Compiled)
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock.Count(Pattern: "(?i)Sherlock Holmes", Options: Compiled)
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock.Count(Pattern: "(?i)Sherlock", Options: Compiled)

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

performanceautofiler bot added the arch-arm64, os-linux, runtime-coreclr, and untriaged labels Jan 11, 2024
EgorBo changed the title "[Perf] Linux/arm64: 3 Regressions on 1/5/2024 9:33:02 PM" to "[Perf] Linux/arm64: 3 Regressions in System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock" Jan 11, 2024
EgorBo transferred this issue from dotnet/perf-autofiling-issues Jan 11, 2024
@ghost

ghost commented Jan 11, 2024

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.


@EgorBo
Member

EgorBo commented Jan 11, 2024

@jakobbotsch could it be one of your PRs? commit range: 22068a8...a5ff7e6

EgorBo added the tenet-performance, tenet-performance-benchmarks, and area-CodeGen-coreclr labels and removed the area-System.Text.RegularExpressions label Jan 11, 2024
@ghost

ghost commented Jan 11, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.


@jakobbotsch
Member

> @jakobbotsch could it be one of your PRs? commit range: 22068a8...a5ff7e6

#96553 would be the most suspect in the range, but it doesn't look like it had any diffs in benchmarks.

JulieLeeMSFT added this to the 9.0.0 milestone Jan 13, 2024
ghost removed the untriaged label Jan 13, 2024
@jakobbotsch
Member

The range for the latter two benchmarks looks more like d2993cc...4b19d67. Not sure why it's so different from the first one.
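Narrowing a regression down to one commit is usually done by bisecting the commits between a known-good and a known-bad build. A minimal Python sketch, assuming an ordered commit list and a predicate that reports whether a build shows the regression; the commit labels and timings in the usage part are hypothetical, not measurements from this issue.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_regressed: Callable[[str], bool]) -> str:
    """Return the first commit for which is_regressed() is True.

    Assumes commits are in history order, commits[0] is known good,
    commits[-1] is known bad, and the predicate is monotonic (once the
    regression appears it stays). `git bisect` relies on the same
    assumption; a noisy benchmark can violate it.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_regressed(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Hypothetical per-commit timings in microseconds (not data from this issue).
timings = {
    "c0": 152.0, "c1": 151.8, "c2": 152.3, "c3": 178.5,
    "c4": 179.1, "c5": 178.2, "c6": 178.9, "c7": 178.8,
}
baseline = timings["c0"]
commits = list(timings)  # dicts preserve insertion order

culprit = first_bad_commit(commits, lambda c: timings[c] > baseline * 1.05)
print(culprit)  # c3
```

Each probe costs one build and benchmark run, so a range of n commits needs about log2(n) runs; this only helps if the reported range actually contains the culprit, which is exactly what is in doubt when the autofiler's ranges are off.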

@stephentoub
Member

It's much more likely to have been #96402, which isn't in the shown diff. The diff ranges in general seem to have been off a bunch lately.

@jakobbotsch
Member

> It's much more likely to have been #96402, which isn't in the shown diff. The diff ranges in general seem to have been off a bunch lately.

That does seem like it would make more sense, but on the multi-config graph it seems there was a small improvement for alpine.amd64.tiger.perf around the same time (January 10th):

I'll see if there's a diff from one of the JIT changes for that first benchmark.

@jakobbotsch
Member

It looks like we do clone a new loop in that first benchmark, so that's likely caused by #96553. The loop is large and the cloning is questionable, but we do not have good heuristics for cloning today; #8558 tracks this. I don't think we can do much about that in .NET 9, so I'm going to move this to .NET 10.
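Loop cloning (also called loop versioning) duplicates a loop: a single guard evaluated before the loop proves that every index stays in bounds, so one copy can run without per-iteration bounds checks, while the original, fully checked copy remains as the fallback. Keeping both copies is why cloning a large loop can come close to doubling a method's code size. The Python sketch below only models the idea (Python's own indexing is always checked, so the explicit check stands in for the JIT's); it is not RyuJIT's implementation, and the loop cloned in this issue is a different one.

```python
def sum_range_original(data: list[int], start: int, end: int) -> int:
    """Original loop: every iteration performs an explicit bounds check."""
    total = 0
    for i in range(start, end):
        if not 0 <= i < len(data):  # models the JIT's per-access bounds check
            raise IndexError(i)
        total += data[i]
    return total


def sum_range_cloned(data: list[int], start: int, end: int) -> int:
    """Same loop after cloning: one up-front guard picks between two copies."""
    if 0 <= start and end <= len(data):
        # Fast copy: the guard proves every i in [start, end) is in bounds,
        # so the per-iteration check is dropped.
        total = 0
        for i in range(start, end):
            total += data[i]
        return total
    # Slow copy: the original loop, checks included, duplicated verbatim.
    total = 0
    for i in range(start, end):
        if not 0 <= i < len(data):
            raise IndexError(i)
        total += data[i]
    return total
```

Both functions return the same result for every input (for example, both give 9 for `([1, 2, 3, 4, 5], 1, 4)`); the cloned version simply carries the loop body twice, trading code size for a cheaper hot path.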

The cloning happens in Regex.RunAllMatchesWithCallback, whose code size grows from 1372 bytes to 2444 bytes (about 78% larger). The hot functions identified by https://github.com/AndyAyersMS/InstructionsRetiredExplorer were:

52.58%   2.03E+07    ?        Unknown
30.41%   1.171E+07   Tier-1   [System.Text.RegularExpressions]Regex.RunAllMatchesWithCallback(class System.String,value class System.ReadOnlySpan`1<wchar>,int32,!!0&,class System.Text.RegularExpressions.MatchCallback`1<!!0>,value class System.Text.RegularExpressions.RegexRunnerMode,bool)
08.28%   3.19E+06    Tier-1   [System.Text.RegularExpressions]Match.AddMatch(int32,int32,int32)
02.78%   1.07E+06    Tier-1   [System.Text.RegularExpressions]Regex.Count(class System.String)
01.32%   5.1E+05     FullOpt  [40e9e7e9-47a6-428b-a6c6-191b0ce2022c]Runnable_0.WorkloadActionUnroll(int64)
01.22%   4.7E+05     native   clrjit.dll
00.99%   3.8E+05     Tier-1   [MicroBenchmarks]Perf_Regex_Industry_RustLang_Sherlock.Count()
00.96%   3.7E+05     Tier-1   [System.Text.RegularExpressions]RegexRunner.InitializeTimeout(value class System.TimeSpan)
00.57%   2.2E+05     native   ntoskrnl.exe
00.47%   1.8E+05     native   coreclr.dll
00.31%   1.2E+05     native   nvlddmkm.sys
00.08%   3E+04       native   ntdll.dll

(not sure why there's such a large fraction of "Unknown"...)
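To see how much of the profile falls on each kind of code, the rows can be grouped by their third column. A Python sketch that parses the table above and sums the percentages per tier; the column meanings (percentage, instruction count, code tier, method) are inferred from the layout rather than taken from InstructionsRetiredExplorer's documentation.

```python
from collections import defaultdict

# Rows copied verbatim from the profile above.
PROFILE = """\
52.58%   2.03E+07    ?        Unknown
30.41%   1.171E+07   Tier-1   [System.Text.RegularExpressions]Regex.RunAllMatchesWithCallback(class System.String,value class System.ReadOnlySpan`1<wchar>,int32,!!0&,class System.Text.RegularExpressions.MatchCallback`1<!!0>,value class System.Text.RegularExpressions.RegexRunnerMode,bool)
08.28%   3.19E+06    Tier-1   [System.Text.RegularExpressions]Match.AddMatch(int32,int32,int32)
02.78%   1.07E+06    Tier-1   [System.Text.RegularExpressions]Regex.Count(class System.String)
01.32%   5.1E+05     FullOpt  [40e9e7e9-47a6-428b-a6c6-191b0ce2022c]Runnable_0.WorkloadActionUnroll(int64)
01.22%   4.7E+05     native   clrjit.dll
00.99%   3.8E+05     Tier-1   [MicroBenchmarks]Perf_Regex_Industry_RustLang_Sherlock.Count()
00.96%   3.7E+05     Tier-1   [System.Text.RegularExpressions]RegexRunner.InitializeTimeout(value class System.TimeSpan)
00.57%   2.2E+05     native   ntoskrnl.exe
00.47%   1.8E+05     native   coreclr.dll
00.31%   1.2E+05     native   nvlddmkm.sys
00.08%   3E+04       native   ntdll.dll
"""


def parse_rows(text: str) -> list[tuple[float, float, str, str]]:
    """Split each row into (percent, count, tier, method)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # maxsplit=3 keeps method signatures that contain spaces intact.
        pct, count, tier, method = line.split(None, 3)
        rows.append((float(pct.rstrip("%")), float(count), tier, method))
    return rows


def percent_by_tier(rows: list[tuple[float, float, str, str]]) -> dict[str, float]:
    """Sum the percentage column for each tier label."""
    totals: dict[str, float] = defaultdict(float)
    for pct, _count, tier, _method in rows:
        totals[tier] += pct
    return dict(totals)


rows = parse_rows(PROFILE)
by_tier = percent_by_tier(rows)
for tier, pct in sorted(by_tier.items(), key=lambda kv: -kv[1]):
    print(f"{tier:<8} {pct:6.2f}%")
```

With these rows, the Tier-1 managed methods add up to about 43% of the profile, while the single "?"/Unknown row accounts for about 53%.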

4 participants