-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Do not inline methods that never return #6103
Conversation
@mikedn |
Yes. I'll post an example later. Anyway, note the WIP title 😄. This very likely needs more work. I'm just trying to see if it works and show something to the JIT team to get some feedback. |
Here is an especially bad example of inlining such methods: private int FindEntry(TKey key) {
if( key == null) {
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
}
... test rdi, rdi
jne G_M14733_IG03
mov rcx, 0xD1FFAB1E
call CORINFO_HELP_NEWSFAST
mov rdi, rax
mov ecx, 5
call System.ThrowHelper:GetArgumentName(int):ref
mov rbx, rax
mov ecx, 0x1C4ED
mov rdx, 0xD1FFAB1E
call CORINFO_HELP_STRCNS
mov rcx, rax
call System.Environment:GetResourceStringLocal(ref):ref
mov rbp, rax
mov rcx, rdi
call System.Exception:Init():this
lea rcx, bword ptr [rdi+32]
mov rdx, rbp
call CORINFO_HELP_ASSIGN_REF
mov dword ptr [rdi+132], 0xD1FFAB1E
lea rcx, bword ptr [rdi+144]
mov rdx, rbx
call CORINFO_HELP_ASSIGN_REF
mov dword ptr [rdi+132], 0xD1FFAB1E
mov dword ptr [rdi+132], 0xD1FFAB1E
mov rcx, rdi
call CORINFO_HELP_THROW
G_M14733_IG03: |
Yeah, with this change that
It's still not quite right, I was expecting the conditional jump to be inverted and the call be moved to a cold block. Seems like PS: After workaround for |
Here's an example that works as expected: static void ThrowArgumentNullException() {
throw new ArgumentNullException();
}
static int Main(string[] args) {
if (args == null) {
ThrowArgumentNullException();
Console.WriteLine("bad");
}
return args.Length;
} Without this change the generated code is:
With this change the generated code is:
If
The throw method is not inlined but there are various shortcomings:
|
Diff summary:
Most of the improvements are in System.Private.CoreLib due to its high usage of throw helpers. It may be worth noting that this works only if the throw helper wasn't marked as no-inline. If it's no-inline the JIT won't figure out that it does not return. |
@mikedn |
Sort of. The parameterless helper is inlined and the other one is not. Neither should be inlined but it may be problematic to treat the parameterless helper as no-return as well. I'll see if something can be done about this once I figure out what's going on with |
The I'll update the diff summary, now there are no size regressions. |
cc @AndyAyersMS |
else if ((call->gtFlags & GTF_CALL_NO_RETURN) != 0) | ||
{ | ||
szFailReason = "No return"; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain what motivated this change? Is there some downside to tail calling a no-return method (eg larger calling sequence?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tail call code is part of the epilog but with my change no epilog is generated anymore because the BB is converted to BBJ_THROW
. The net result is that a test failed because an exception was no longer being thrown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is exactly the kind of thing that I'd like to see in a comment. A pointer to the test case that failed w/o this exclusion would be useful here too.
@mikedn Thanks for this, it looks very promising. Can you work up some micro-benchmarks for these cases? You can use the pattern from the inline microbenchmarks I checked in recently -- they are very sensitive to small changes in codegen. Also a general perf run would be nice. This is something that should be available soon from automation but in the meantime I can do some runs for you or show you how to run it -- however I won't get to it until later in the day. |
I'll try. I doubt that we'll see anything significant in terms of performance, this is first and foremost a size optimization. |
Well, they are right when they say that micro-benchmarks can prove anything. This little test runs in 500ms without my fix and 375ms with my fix. And if someone has the bad idea of applying I'll see if I can make a JIT benchmark out of it. Hmm, project.json... shudder... |
Please don't create a new project.json. Re-use one of the common ones we have set up for the JIT, like this one. And here's a template for what the test should look like.... |
I know, I know, it's just that there are certain aspects of project.json and the build system that are annoying. Like the warnings generated by those test project.json... I added a benchmark but I haven't been able to run it yet. |
👇
👀 |
Thanks for the benchmark. Since |
I tried a few things but the performance of FindEntry doesn't seem to be affected in any way by this change. I also played with a list indexer and that one is slightly faster with this change - 100,000,000 However, I don't think that the performance improvement shown by the previous benchmark and |
With your change it is 12.5% faster 👍 |
12.5% makes it sound like it is a lot. But it really isn't, at 3.10GHz that my CPU has the difference is less than half a cycle per |
Smaller code is generally preferable even if it's not faster. So I'm happy with the CS benefits you see from this change, and, barring surprises elsewhere, think it is a nice improvement. Here's my take on the work remaining to actually enable it:
I'd be happy to help out, especially with the TP/CQ assessments, since our tooling there is not yet what it should be. |
Thanks for the detailed work plan! For a start I'll tweak the matching code to see how it affects the jit diff results so we can get an idea about what needs to go in the new policy. |
a89781c
to
245a8e0
Compare
@AndyAyersMS, Yes the tool will fold all overloads together. This is some legacy behavior from an internal tool which was kept for convenience. The data that there were multiple instances is captured in the data structures in the tool so we could add some kind of notation to mark multiple instance entries. |
@AndyAyersMS with #6634 which has a couple more I'm seeing a 68,608 byte reduction of filesize |
@russellhadley I don't find folding to be a particularly useful default -- seems like instead we should report actual per-method data and provide aggregate views (by method, by class, by assembly, etc) to summarize. |
@russellhadley Hmm, I suppose we're stuck with some degree of folding since the jit dump has simplified types like ref, struct, etc. I'll update the reporting to at least show the number of folded instances so it's obvious there is more than one method being aggregated. |
Ideally we would like to be able to quickly find the top 10 methods that improved and the top 10 methods that regressed. It seems like aggregating multiple copies of a method together makes that task difficult. Is |
Here's the result with a bit of tweaking to Note I am still looking at CHK builds of corlib since we still need to sort out how to get the tooling to do the right jit swaps to measure the REL corlib (see dotnet/jitutils#14).
|
PR for tool update is dotnet/jitutils#22 |
Note that this also happens for generic instantiations. That's how |
This is a follow-up to dotnet#17501 that fixed #17398. #17398 was caused by a break in implicit contract between codegen and gc pointer reporting in fully-interruptible mode: the latter assumed that register gc pointer liveness doesn't change across calls while dotnet#6103 introduced codegen where it wasn't true. dotnet#17501 changed gc pointer reporting not to expect that register gc pointer liveness doesn't change across calls. This change inserts int3 after non-returning calls at the end of basic blocks so that gc pointer liveness doesn't change across calls. This is additional insurance in case any other place in runtime was dependent on that contract.
This is a follow-up to dotnet#17501 that fixed #17398. gc pointer reporting in fully-interruptible mode: the latter assumed that register gc pointer liveness doesn't change across calls while dotnet#6103 introduced codegen where it wasn't true. doesn't change across calls. This change inserts int3 after non-returning calls at the end of basic blocks so that gc pointer liveness doesn't change across calls. This is additional insurance in case any other place in the runtime is dependent on that contract.
…7535) This is a follow-up to #17501 that fixed #17398. gc pointer reporting in fully-interruptible mode: the latter assumed that register gc pointer liveness doesn't change across calls while #6103 introduced codegen where it wasn't true. doesn't change across calls. This change inserts int3 after non-returning calls at the end of basic blocks so that gc pointer liveness doesn't change across calls. This is additional insurance in case any other place in the runtime is dependent on that contract.
- Add a ThrowHelper utility class that provides non-returning methods for throwing exceptions, this in turn helps to reduce the code size, there-by increasing the chances of inlining simple (getter/setter) methods, which has cascading CQ improvments in its own. (See: dotnet/coreclr#6103) - Convert checks in the form of `(x < 0 || x > max)` to `(uint) x > max` to further reduce code-size and increase inlining chances - Update MSTest package deps - Drop netfx.props (which I introduces in a previous PR to support building on linux) and replace with the now standard/canonical way of referencing NetFX BCL on all platforms with a <PackageReference>
- Add a ThrowHelper utility class that provides non-returning methods for throwing exceptions, this in turn helps to reduce the code size, there-by increasing the chances of inlining simple (getter/setter) methods, which has cascading CQ improvments in its own. (See: dotnet/coreclr#6103) - Convert checks in the form of `(x < 0 || x > max)` to `(uint) x > max` to further reduce code-size and increase inlining chances - Update MSTest package deps - Drop netfx.props (which I introduces in a previous PR to support building on linux) and replace with the now standard/canonical way of referencing NetFX BCL on all platforms with a <PackageReference> - closes aeron-io#588
Do not inline methods that never return Commit migrated from dotnet/coreclr@c4da56e
Methods that do not contain return blocks can't ever exit normally, they either throw or loop indefinitely. Inlining is not beneficial in such cases as it increases the code size without providing any speed benefits. In the particular case of throws the inlined code can easily be 10 times larger than the call site.
The call to
fgMoreThanOneReturnBlock
has been replaced with code that does the same thing but also detects the existence of at least one return block. This avoids walking the basic block list twice.Note that
BBJ_RETURN
blocks are also generated forCEE_JMP
. Methods exiting viaCEE_JMP
instead ofCEE_RET
will continue to be inlined (assuming they were inlined before this change).