Stabilize performance measurement #43227
Comments
@dotnet/jit-contrib, @danmosemsft
To state the obvious (I think) -- I believe we have good data to suggest that alignment is the dominant reason for bimodality. I'm not sure, though, that there aren't other common causes of bimodality -- my assumption is that we'll find out how much is left once you've completed some of this work.
Yes, there will definitely be more reasons for bimodality, but alignment will fix most of the obvious ones that we know of, and then it will be easier for us to focus on the remaining ones. Currently, there are just too many bimodal benchmarks.
Bolded WIP texts in the plan.
Moved remaining items to Future since the .NET 6 scope items are complete now.
I read your article about loop alignment here (https://devblogs.microsoft.com/dotnet/loop-alignment-in-net-6/), and it mentions that a loop like this isn't a candidate for alignment because it contains a call to a method.
The inlining decision happens in an early phase of compilation, so by the time we decide whether or not to align a loop, we already know whether the call was inlined. We won't align the loop if we know for sure that the method will not be inlined.
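As a concrete illustration (the method names here are hypothetical, not from the article), consider a loop whose body contains a call. Whether the loop is an alignment candidate depends on whether that call survives inlining, which is already settled by the time the alignment decision is made:

```csharp
public static class LoopWithCall
{
    // A hot loop containing a call. By the time RyuJIT decides whether to
    // align this loop, inlining has already run, so it knows whether
    // ComputeCore remains a real call inside the loop body.
    public static int SumAll(int[] values)
    {
        int sum = 0;
        for (int i = 0; i < values.Length; i++)
        {
            sum += ComputeCore(values[i]);
        }
        return sum;
    }

    // NoInlining guarantees the call survives, so the JIT can see that the
    // loop body contains a call when it makes its alignment decision.
    [System.Runtime.CompilerServices.MethodImpl(
        System.Runtime.CompilerServices.MethodImplOptions.NoInlining)]
    private static int ComputeCore(int x) => x * 2 + 1;
}
```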
The code quality and performance of RyuJIT are tracked internally by running MicroBenchmarks in our performance lab. We regularly triage the performance issues opened by the .NET performance team, and after going through these issues for the past several months, we have identified some key points.
Stability
Many times, the set of commits flagged as introducing a regression in a benchmark does not touch the code that the benchmark tests. In fact, the assembly code generated for the .NET code being tested is often identical, and yet the measurements show differences. Our investigation reveals that the fluctuation in benchmark measurements happens because of misalignment of the JIT-generated code in process memory. The LoopReturn benchmark is one example that shows such behavior.
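To make the scenario concrete, here is a minimal sketch of a benchmark with this shape (illustrative names and loop counts, not the actual LoopReturn source): a trivial counted loop whose measured time is dominated by the loop head, so identical machine code can run at different speeds depending on where the loop lands in memory:

```csharp
using BenchmarkDotNet.Attributes;

public class LoopReturnStyle
{
    // A tight loop with a trivial body: nearly all of the measured time is
    // loop overhead, so the placement (alignment) of the loop head in memory
    // can visibly change the measurement even when the machine code is identical.
    [Benchmark]
    public int LoopReturn()
    {
        int count = 0;
        for (int i = 0; i < 100_000; i++)
        {
            count++;
        }
        return count;
    }
}
```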
It is very time consuming for .NET developers to analyze benchmarks that regressed because of things that are outside the control of the .NET runtime. In the past, we have closed several issues like #13770, #39721 and #39722 because the regressions were caused by code alignment. A great example that we found while investigating those issues was the change introduced in #38586: it eliminated a `test` instruction and should have shown an improvement in the benchmarks, but instead introduced a regression because the code (the loop code inside the method) became misaligned and the method ran slower. Alignment issues were brought up a few times in #9912 and #8108, and this issue tracks the progress towards the goal of stabilizing, and possibly improving, the performance of .NET apps that are heavily affected by code alignment.
Performance lab infrastructure
Once we address the code alignment issue, the next big thing will be to identify and make the required infrastructure changes in our performance lab so that it can easily flag such issues without needing much interaction from .NET developers. For example, dotnet/BenchmarkDotNet#1513 proposes randomizing memory alignment in benchmark runs to catch these issues early; once we address the underlying problem in .NET, we should never see bimodal behavior in those benchmarks (a configuration sketch follows below). After that, if the performance lab does find a regression in a benchmark, we need robust tooling support to get useful metrics from performance runs, so that a developer doing the investigation can easily identify the cause of the regression: for example, the time spent in various phases of the .NET runtime such as jitting, the JIT interface, Tier0/Tier1 JIT code, hot methods, and instructions retired during benchmark execution.
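As a minimal sketch of what opting into that randomization could look like (assuming a BenchmarkDotNet version that ships the memory randomization feature from dotnet/BenchmarkDotNet#1513):

```csharp
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;

// Opt a benchmark run into memory randomization so that alignment-sensitive
// benchmarks show up as noisy or bimodal immediately, instead of silently
// flipping between "fast" and "slow" placements across runs.
public class AlignmentHuntingConfig : ManualConfig
{
    public AlignmentHuntingConfig()
    {
        // WithMemoryRandomization is the job setting added for
        // dotnet/BenchmarkDotNet#1513; availability depends on the
        // BenchmarkDotNet version in use.
        AddJob(Job.Default.WithMemoryRandomization());
    }
}
```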
Reliable benchmarks collection
Lastly, for developers working on the JIT, we want to identify a set of benchmarks that are stable enough to be trusted to give us reliable measurements whenever there is a need to verify the performance of changes made to the JIT codebase. This will help us conduct performance testing ahead of time and catch potential regressions, rather than waiting for them to show up in the performance lab.
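One way such a stability filter could work (the thresholds and names here are hypothetical, not an existing tool) is to accept a benchmark only if its repeated measurements have a low coefficient of variation and no large gap between adjacent sorted samples, which is a crude check for bimodality:

```csharp
using System;
using System.Linq;

public static class BenchmarkStability
{
    // Returns true when the samples look unimodal and tight: low relative
    // standard deviation, and no single large jump between adjacent sorted
    // samples (a large jump suggests two clusters, i.e. bimodality).
    public static bool IsStable(double[] samples, double maxCv = 0.02, double maxGapRatio = 0.05)
    {
        if (samples.Length < 2) return false; // not enough data to judge

        double mean = samples.Average();
        double stdDev = Math.Sqrt(samples.Sum(s => (s - mean) * (s - mean)) / samples.Length);
        double cv = stdDev / mean; // coefficient of variation

        double[] sorted = samples.OrderBy(s => s).ToArray();
        double maxGap = sorted.Zip(sorted.Skip(1), (a, b) => b - a).Max();

        return cv <= maxCv && (maxGap / mean) <= maxGapRatio;
    }
}
```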
Here is the set of work items that we have identified to achieve all of the above:
Code alignment work
Future work
- Handle a `jmp` or `ret` instruction that comes before the `align` instruction.
- Better encoding of `NOP` instructions. Today, we just output the repeated single byte `90`, but we could do better like we do for x64 (see the sketch below).
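As a minimal sketch of the "better encoding" idea (the emitter shape is hypothetical; the byte sequences are the documented x64 multi-byte NOP encodings from the Intel software developer's manual), padding can be covered with a few long NOPs instead of many single-byte `90`s:

```csharp
using System;
using System.Collections.Generic;

public static class NopPadding
{
    // Documented multi-byte NOP encodings (1..9 bytes); each entry decodes
    // as a single instruction.
    private static readonly byte[][] MultiByteNops =
    {
        new byte[] { 0x90 },
        new byte[] { 0x66, 0x90 },
        new byte[] { 0x0F, 0x1F, 0x00 },
        new byte[] { 0x0F, 0x1F, 0x40, 0x00 },
        new byte[] { 0x0F, 0x1F, 0x44, 0x00, 0x00 },
        new byte[] { 0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00 },
        new byte[] { 0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00 },
        new byte[] { 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 },
        new byte[] { 0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 },
    };

    // Greedily cover `padding` bytes with the largest available encodings,
    // so e.g. 15 bytes of padding become two instructions (9 + 6) instead
    // of fifteen single-byte NOPs.
    public static IEnumerable<byte[]> Emit(int padding)
    {
        while (padding > 0)
        {
            int take = Math.Min(padding, MultiByteNops.Length);
            yield return MultiByteNops[take - 1];
            padding -= take;
        }
    }
}
```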
Performance tooling work