Stabilize performance measurement #43227
Comments
@dotnet/jit-contrib, @danmosemsft
To state the obvious (I think) -- I believe we have good data to suggest that alignment is the dominant reason for bimodality. I'm not sure, though, that there aren't other common causes of bimodality -- my assumption is that we'll find out how much is left once you've completed some of this work.
Yes, there will definitely be more reasons for bimodality, but alignment will fix most of the obvious ones that we know of, and then it will be easier for us to focus on the remaining ones. Currently, there are just too many bimodal benchmarks.
Bolded WIP texts in the plan.
Moved remaining items to Future since the .NET 6 scope items are complete now.
I read your article about loop alignment here (https://devblogs.microsoft.com/dotnet/loop-alignment-in-net-6/), and it mentions that a loop like this isn't a candidate for alignment because it contains a call to a method.
The inlining decision happens in an early phase of compilation, so by the time we decide whether or not to align a loop, we already know whether the call was inlined. We won't align the loop if we know for sure that the method will not be inlined.
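As a concrete illustration (the method names here are hypothetical, not from the article), consider a loop whose body contains a call. Whether the loop is an alignment candidate depends on whether that call survives inlining, which is already settled by the time the alignment decision is made:

```csharp
public static class LoopWithCall
{
    // A hot loop containing a call. By the time RyuJIT decides whether to
    // align this loop, inlining has already run, so it knows whether
    // ComputeCore remains a real call inside the loop body.
    public static int SumAll(int[] values)
    {
        int sum = 0;
        for (int i = 0; i < values.Length; i++)
        {
            sum += ComputeCore(values[i]);
        }
        return sum;
    }

    // NoInlining guarantees the call survives, so the JIT can see that the
    // loop body contains a call when it makes its alignment decision.
    [System.Runtime.CompilerServices.MethodImpl(
        System.Runtime.CompilerServices.MethodImplOptions.NoInlining)]
    private static int ComputeCore(int x) => x * 2 + 1;
}
```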
The code quality and performance of RyuJIT are tracked internally by running MicroBenchmarks in our performance lab. We regularly triage the performance issues opened by the .NET performance team, and after going through these issues for the past several months, we have identified some key points.
Stability
Many times, the set of commits flagged as introducing a regression in a benchmark does not touch the code that the benchmark tests. In fact, the assembly code generated for the .NET code being tested is often identical, and yet the measurements show differences. Our investigation reveals that the fluctuation in benchmark measurements happens because of misalignment of the JIT-generated code in process memory. The LoopReturn benchmark is one example that shows such behavior.
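To make the scenario concrete, here is a minimal sketch of a benchmark with this shape (illustrative names and loop counts, not the actual LoopReturn source): a trivial counted loop whose measured time is dominated by the loop head, so identical machine code can run at different speeds depending on where the loop lands in memory:

```csharp
using BenchmarkDotNet.Attributes;

public class LoopReturnStyle
{
    // A tight loop with a trivial body: nearly all of the measured time is
    // loop overhead, so the placement (alignment) of the loop head in memory
    // can visibly change the measurement even when the machine code is identical.
    [Benchmark]
    public int LoopReturn()
    {
        int count = 0;
        for (int i = 0; i < 100_000; i++)
        {
            count++;
        }
        return count;
    }
}
```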
It is very time consuming for .NET developers to analyze benchmarks that regressed because of things that are outside the control of the .NET runtime. In the past, we have closed several issues like #13770, #39721 and #39722 because the regressions were caused by code alignment. A great example that we found while investigating those issues was the change introduced in #38586: it eliminated a `test` instruction and should have shown an improvement in the benchmarks, but instead introduced a regression because the code (the loop code inside the method) became misaligned and the method ran slower. Alignment issues were brought up a few times in #9912 and #8108, and this issue tracks the progress towards the goal of stabilizing, and possibly improving, the performance of .NET apps that are heavily affected by code alignment.
Performance lab infrastructure
Once we address the code alignment issue, the next big thing will be to identify and make the required infrastructure changes in our performance lab so that it can easily flag such issues without needing much interaction from .NET developers. For example, dotnet/BenchmarkDotNet#1513 proposes randomizing memory alignment in benchmark runs to catch these issues early; once we address the underlying problem in .NET, we should never see bimodal behavior in those benchmarks (a configuration sketch follows below). After that, if the performance lab does find a regression in a benchmark, we need robust tooling support to get useful metrics from performance runs, so that a developer doing the investigation can easily identify the cause of the regression: for example, the time spent in various phases of the .NET runtime such as jitting, the JIT interface, Tier0/Tier1 JIT code, hot methods, and instructions retired during benchmark execution.
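As a minimal sketch of what opting into that randomization could look like (assuming a BenchmarkDotNet version that ships the memory randomization feature from dotnet/BenchmarkDotNet#1513):

```csharp
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;

// Opt a benchmark run into memory randomization so that alignment-sensitive
// benchmarks show up as noisy or bimodal immediately, instead of silently
// flipping between "fast" and "slow" placements across runs.
public class AlignmentHuntingConfig : ManualConfig
{
    public AlignmentHuntingConfig()
    {
        // WithMemoryRandomization is the job setting added for
        // dotnet/BenchmarkDotNet#1513; availability depends on the
        // BenchmarkDotNet version in use.
        AddJob(Job.Default.WithMemoryRandomization());
    }
}
```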
Reliable benchmarks collection
Lastly, for developers working on the JIT, we want to identify a set of benchmarks that are stable enough to be trusted to give us reliable measurements whenever there is a need to verify the performance of changes made to the JIT codebase. This will help us conduct performance testing ahead of time and catch potential regressions, rather than waiting for them to show up in the performance lab.
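One way such a stability filter could work (the thresholds and names here are hypothetical, not an existing tool) is to accept a benchmark only if its repeated measurements have a low coefficient of variation and no large gap between adjacent sorted samples, which is a crude check for bimodality:

```csharp
using System;
using System.Linq;

public static class BenchmarkStability
{
    // Returns true when the samples look unimodal and tight: low relative
    // standard deviation, and no single large jump between adjacent sorted
    // samples (a large jump suggests two clusters, i.e. bimodality).
    public static bool IsStable(double[] samples, double maxCv = 0.02, double maxGapRatio = 0.05)
    {
        if (samples.Length < 2) return false; // not enough data to judge

        double mean = samples.Average();
        double stdDev = Math.Sqrt(samples.Sum(s => (s - mean) * (s - mean)) / samples.Length);
        double cv = stdDev / mean; // coefficient of variation

        double[] sorted = samples.OrderBy(s => s).ToArray();
        double maxGap = sorted.Zip(sorted.Skip(1), (a, b) => b - a).Max();

        return cv <= maxCv && (maxGap / mean) <= maxGapRatio;
    }
}
```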
Here is the set of work items that we have identified to achieve all of the above:
Code alignment work
Future work
- Handle a `jmp` or `ret` instruction that comes before the `align` instruction.
- Better encoding of `NOP` instructions. Today, we just output the repeated single byte `90`, but we could do better like we do for x64 (see the sketch below).
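As a minimal sketch of the "better encoding" idea (the emitter shape is hypothetical; the byte sequences are the documented x64 multi-byte NOP encodings from the Intel software developer's manual), padding can be covered with a few long NOPs instead of many single-byte `90`s:

```csharp
using System;
using System.Collections.Generic;

public static class NopPadding
{
    // Documented multi-byte NOP encodings (1..9 bytes); each entry decodes
    // as a single instruction.
    private static readonly byte[][] MultiByteNops =
    {
        new byte[] { 0x90 },
        new byte[] { 0x66, 0x90 },
        new byte[] { 0x0F, 0x1F, 0x00 },
        new byte[] { 0x0F, 0x1F, 0x40, 0x00 },
        new byte[] { 0x0F, 0x1F, 0x44, 0x00, 0x00 },
        new byte[] { 0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00 },
        new byte[] { 0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00 },
        new byte[] { 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 },
        new byte[] { 0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 },
    };

    // Greedily cover `padding` bytes with the largest available encodings,
    // so e.g. 15 bytes of padding become two instructions (9 + 6) instead
    // of fifteen single-byte NOPs.
    public static IEnumerable<byte[]> Emit(int padding)
    {
        while (padding > 0)
        {
            int take = Math.Min(padding, MultiByteNops.Length);
            yield return MultiByteNops[take - 1];
            padding -= take;
        }
    }
}
```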
Performance tooling work