Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AggressiveInlining not respected when Method grow too large #41692

Closed
HurricanKai opened this issue Sep 1, 2020 · 14 comments · Fixed by #78874
Closed

AggressiveInlining not respected when Method grow too large #41692

HurricanKai opened this issue Sep 1, 2020 · 14 comments · Fixed by #78874
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Milestone

Comments

@HurricanKai
Copy link
Member

HurricanKai commented Sep 1, 2020

Description

JIT will refuse to inline AggressiveInlining methods when Methods grow too large (IL size / too many branches, I'm unsure which one, but I expect IL size)
If I understand AggressiveInlining correctly, this is not intended.

This happened to me in a method like the below, but I've encountered it before:

public IntPtr Load(int slot) {
    if (slot == SLOT_ONE)
    {
         return _slotOne;
    }
    // x 1000 or so

    ThrowHelper();
    return default;
}

This does not inline/constant fold, leading to quite the performance hit since now all those comparisons are actual comparisons.
Splitting this into a binary tree of sorts has led to JIT inlining it, and a good performance increase.

Analysis

I think this can be related to #38106
I don't understand JIT very well, but I think this can have multiple causes.

It might be an idea to look for comparisons with constants that "end" the method and prioritize inlining? Looking at the inlinepolicy, which I think is where this kind of thing is decided, this would be difficult, but possible?

category:cq
theme:inlining
skill-level:intermediate
cost:medium

@HurricanKai HurricanKai added the tenet-performance Performance related issue label Sep 1, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels Sep 1, 2020
@AndyAyersMS
Copy link
Member

Aggressive inline candidates are assumed to be profitable; they bypass the normal profitability checks. So in particular we won't analyze their IL for patterns that might be optimizable.

Can you provide a repro for the issues you're seeing? We are trying to be as aggressive as we can without getting into situations where the jit will create enormous methods or use huge amounts of memory while jitting.

@HurricanKai
Copy link
Member Author

HurricanKai commented Sep 1, 2020

Sample.zip
the B struct contains a "simple" method with 4000 lookups, and while this is extreme, this is a real-world-like case. I'm writing a Generator that binds native methods, and am generating a table to cache the function pointers to those methods. I generate well over 4000 such cache slots.

@EgorBo
Copy link
Member

EgorBo commented Sep 1, 2020

Your repro doesn't have AggressiveInlining anywhere. While Lookup looks like it should be inlined to a single instruction when the arg is a constant I believe JIT doesn't want to spend time on analyzing big methods (because in 99% cases those won't be profitable to inline), e.g. in your case:

Inlines into 06000004 Test.Samples:AFive(Test.A):long:this
  [0 IL=0003 TR=000003 06000002] [FAILED: too many il bytes] Test.A:Lookup(int):long:this
Budget: initialTime=87, finalTime=87, initialBudget=870, currentBudget=870
Budget: initialSize=336, finalSize=33

if you put AggressiveInlining on top of Lookup it should work fine.

@HurricanKai
Copy link
Member Author

Sorry. Not sure what I thought with that sample.

I've just reverted some changes, and it appears what is actually happening (I didn't know this was a thing when I first encountered this issue) is I'm hitting the m_CallsiteDepth == 1 limitation.
I think it works something like this: I'm generating a Load(int slot, string entrypoint) method, which just calls another method, which is the actual lookup table. This simple call is of course inlined, which "consumes" the top-level inline possibility? The actual lookup is way too big to be considered for lookup, and can't be considered for inlining anyways cause m_CallsiteDepth exceeds 1 already.

This is unfortunate, but I'm not too sure if JIT can do much about this (besides raising the limit, but I'm sure there are some reasons for the limit being this low). I'll definitely have to keep this in mind, since I often rely heavily on JIT eliminating branches like this at runtime...

Apologies again for the incorrect sample 😅
Also, here is the actual generated code + disassembly.
(Relevant method is GeneratedVTable:Load being inlined to GeneratedVTable:Load_2575_67097892)

@JulieLeeMSFT
Copy link
Member

@AndyAyersMS Since you are looking into it, assigning this to you. Please help triage this.

@AndyAyersMS
Copy link
Member

; Total bytes of code 119642, prolog size 10, PerfScore 43637.45, 
   (MethodHash=d89b1d10) for method GeneratedVTable:Load_2575_67097892(int,String):long:this

@HurricanKai thanks for the example. The jit should be able to tell after inlining such a method that it wasn't nearly as big as it seemed; I'm trying to think of how it could figure this out ahead of time, or along the way. Ahead of time seems hard; as @EgorBo says, we don't want to spend a lot of time analyzing large methods; and our analysis isn't all that powerful.

But checking along the way seems viable -- I think something like the following can work. When the jit sees an AggressiveInlining candidate that is projected to go over budget, it can go ahead and starts inlining anyways, keeping track in real time of how much of the method has been imported and continually checking that versus the budget. If in the middle of inline the jit goes over budget, wil then abandon the inline.

That way if the jit is really just touching a fraction of a huge method during importation then the inline can succeed. The only downside is that the jit may spend a bunch of cycles attempting inlines that later are abandoned, but that shouldn't happen too often, and the use of AggressiveInlining gives the jit some license to spend a bit more time jitting.

@AndyAyersMS AndyAyersMS removed the untriaged New issue has not been triaged by the area owner label Sep 1, 2020
@JulieLeeMSFT JulieLeeMSFT added this to the 6.0.0 milestone Sep 2, 2020
@JulieLeeMSFT
Copy link
Member

Moving to .NET 6.

@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@BruceForstall BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Nov 10, 2020
@mikernet
Copy link
Contributor

mikernet commented Dec 2, 2020

@HurricanKai

Splitting this into a binary tree of sorts

Splitting methods with trial and error until they inline is somewhat annoying, but if I'm accurately picturing what you mean by a "binary tree of sorts" then breaking it up in a linear fashion like so may be easier and less tedious.

@HurricanKai
Copy link
Member Author

Thank you, that is what I did for the functions were I did this manually, and only have few branches, but in some cases I have 1000+ cases and I believe a binary tree like that will yield less IL, so less binary size.

@AndyAyersMS
Copy link
Member

@HurricanKai any way I can get a repro for this to test out some jit changes?

Is the one up above that is crossed out viable?

@HurricanKai
Copy link
Member Author

I believe the https://github.com/dotnet/runtime/files/5158907/OpenGL.zip is the only one that actually reproduced this. I can look at making a simpler repro, if needed.

@AndyAyersMS
Copy link
Member

It doesn't need to be simpler.

I'm not sure how to turn that into something I can compile.

AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Apr 21, 2021
… opportunities

Look for IL patterns in an inlinee that are indicative of type folding. Also track
when an inlinee has calls that are intrinsics.

Use this to boost the inlinee profitability estimate.

Also, allow an aggressive inline inline candidate method to go over the budget when
it contains type folding or intrinsics (and so the method may be simpler than it appears).
Remove the old criteria (top-level inline) which was harder to reason about.

Closes dotnet#43761.
Closes dotnet#41692.
Closes dotnet#51587.
Closes dotnet#47434.
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Apr 21, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label May 10, 2021
@AndyAyersMS
Copy link
Member

@EgorBo perhaps you can follow up here?

@AndyAyersMS
Copy link
Member

Not sure we're going to get to this in 6.0, so moving to future.

@AndyAyersMS AndyAyersMS modified the milestones: 6.0.0, Future Jul 12, 2021
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Nov 26, 2022
@EgorBo EgorBo modified the milestones: Future, 8.0.0 Nov 26, 2022
@EgorBo EgorBo assigned EgorBo and unassigned AndyAyersMS Nov 26, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Dec 1, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Dec 31, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Projects
None yet
7 participants