
Gradual regression of System.Buffers.Tests.ReadOnlySequenceTests<Char>.IterateGetPositionArray from net6.0 to net7.0-rc2 #77028

Closed
jozkee opened this issue Oct 13, 2022 · 6 comments
Labels: area-CodeGen-coreclr, tenet-performance, tenet-performance-benchmarks

Comments

@jozkee
Member

jozkee commented Oct 13, 2022

This benchmark has been jumping up and down throughout the whole net7.0 cycle, but judging by the 6.0 data point, it seems to have gradually regressed.

Link to the chart:
https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/main_x64_ubuntu%2018.04/System.Buffers.Tests.ReadOnlySequenceTests(Char).IterateGetPositionSingleSegment.html

Screenshot:
[image: newplot]

Benchmark results from 7.0-RC2 vs 6.0:

System.Buffers.Tests.ReadOnlySequenceTests.IterateGetPositionArray

| Result | Ratio | Alloc Delta | Operating System | Bit | Processor Name | Modality |
| --- | --- | --- | --- | --- | --- | --- |
| Slower | 0.77 | +0 | ubuntu 18.04 | Arm64 | Unknown processor | |
| Noise | - | +0 | Windows 11 | Arm64 | Unknown processor | |
| Slower | 0.86 | +0 | Windows 11 | Arm64 | Microsoft SQ1 3.0 GHz | |
| Slower | 0.90 | +0 | Windows 11 | Arm64 | Microsoft SQ1 3.0 GHz | |
| Same | 0.92 | +0 | macOS Monterey 12.6 | Arm64 | Apple M1 | |
| Same | 0.94 | +0 | macOS Monterey 12.6 | Arm64 | Apple M1 Max | |
| Slower | 0.67 | +0 | Windows 10 | X64 | Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R) | bimodal |
| Noise | - | +0 | Windows 11 | X64 | AMD Ryzen Threadripper PRO 3945WX 12-Cores | |
| Same | 0.98 | +0 | Windows 11 | X64 | AMD Ryzen 9 5900X | |
| Same | 0.95 | +0 | Windows 11 | X64 | AMD Ryzen 9 7950X | |
| Slower | 0.84 | +0 | Windows 11 | X64 | Intel Core i7-8700 CPU 3.20GHz (Coffee Lake) | |
| Slower | 0.73 | +0 | debian 11 | X64 | Intel Core i7-8700 CPU 3.20GHz (Coffee Lake) | |
| Same | 1.07 | +0 | ubuntu 18.04 | X64 | AMD Ryzen 9 5900X | |
| Slower | 0.84 | +0 | ubuntu 18.04 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz | |
| Same | 0.88 | +0 | ubuntu 20.04 | X64 | AMD Ryzen 9 5900X | several? |
| Same | 0.94 | +0 | ubuntu 20.04 | X64 | Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R) | |
| Same | 0.93 | +0 | ubuntu 20.04 | X64 | Intel Core i7-8700 CPU 3.20GHz (Coffee Lake) | |
| Slower | 0.87 | +0 | macOS Big Sur 11.7 | X64 | Intel Core i5-4278U CPU 2.60GHz (Haswell) | |
| Slower | 0.85 | +0 | macOS Monterey 12.6 | X64 | Intel Core i7-4870HQ CPU 2.50GHz (Haswell) | |

@jozkee added the tenet-performance, tenet-performance-benchmarks, and area-CodeGen-coreclr labels on Oct 13, 2022
@ghost added the untriaged label on Oct 13, 2022
@ghost

ghost commented Oct 13, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@JulieLeeMSFT removed the untriaged label on Oct 13, 2022
@JulieLeeMSFT added this to the 8.0.0 milestone on Oct 13, 2022
@AndyAyersMS
Member

I'll take a look.

@AndyAyersMS self-assigned this on Nov 2, 2022
@AndyAyersMS
Member

Locally I see only a small regression:

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22621.674)
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET SDK=7.0.100-rc.2.22477.23
[Host] : .NET 6.0.10 (6.0.1022.47605), X64 RyuJIT AVX2
Job-GJZRLG : .NET 6.0.10 (6.0.1022.47605), X64 RyuJIT AVX2
Job-FDXEMW : .NET 7.0.0 (7.0.22.47203), X64 RyuJIT AVX2

PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250.0000 ms MaxIterationCount=20
MinIterationCount=15 WarmupCount=1

| Method | Job | Runtime | Toolchain | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Allocated | Alloc Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IterateGetPositionSingleSegment | Job-GJZRLG | .NET 6.0 | net6.0 | 25.38 ns | 0.890 ns | 1.025 ns | 25.23 ns | 23.90 ns | 27.47 ns | 1.00 | 0.00 | - | NA |
| IterateGetPositionSingleSegment | Job-FDXEMW | .NET 7.0 | net7.0 | 26.25 ns | 0.295 ns | 0.262 ns | 26.32 ns | 25.54 ns | 26.56 ns | 1.03 | 0.05 | - | NA |
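
For comparison with the cross-machine table in the issue description, here is a rough sketch of how the two ratio conventions relate. It assumes that the local Ratio column is current over baseline (so above 1.0 means slower) and that the cross-machine table uses the inverse, baseline over current (so below 1.0 means slower); the second convention is inferred from the "Slower" rows and is not stated in the thread. The function names are illustrative only, and dividing the means only approximates the reported Ratio, which may be computed differently by the tooling.

```python
# Mean timings (ns) copied from the local .NET 6.0 vs .NET 7.0 run above.
NET6_MEAN_NS = 25.38
NET7_MEAN_NS = 26.25


def current_over_baseline(baseline_ns: float, current_ns: float) -> float:
    """Ratio where a value above 1.0 means the current runtime is slower."""
    return current_ns / baseline_ns


def baseline_over_current(baseline_ns: float, current_ns: float) -> float:
    """Assumed convention of the cross-machine table: below 1.0 means slower."""
    return baseline_ns / current_ns


if __name__ == "__main__":
    print(f"current/baseline: {current_over_baseline(NET6_MEAN_NS, NET7_MEAN_NS):.2f}")
    print(f"baseline/current: {baseline_over_current(NET6_MEAN_NS, NET7_MEAN_NS):.2f}")
```

Under these assumptions the local run corresponds to roughly 0.97 in the cross-machine convention, which is much closer to "Same" than the 0.84 reported there for the same processor on Windows 11.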

Profiling (7.0) shows:

[image: profile of the 7.0 run]

@AndyAyersMS
Member

The x64 Windows codegen for IterateGetPosition is very similar in 6.0 and 7.0. There are some minor differences in allocation and in the tail part of the method, despite the fact that in 6.0 this method bypassed tiering.

Codegen for IterateGetPositionSingleSegment is even closer; the only difference is that in 7.0 we call IterateGetPosition indirectly.

However, I just realized this is an x64 Ubuntu issue, not a Windows one. FWIW, Windows perf does not show this slow creep:

[image: newplot - 2022-11-02T154244 360]

So let me redo some of the above, this time looking at Ubuntu x64 codegen.

@AndyAyersMS
Member

Perf has since improved and then regressed back a bit.

[image: newplot - 2023-04-22T083900 233]

The Jan 2023 improvement looks like it was #81095.
The April 2023 regression was possibly a PGO update, #83624.

@AndyAyersMS
Member

Perf has subsequently improved again and is now faster than it has been; the likely explanation is the PGO update in #85275.

[image: newplot - 2023-05-15T200437 827]

@ghost locked as resolved and limited conversation to collaborators on Jun 15, 2023