Substitute GT_RET_EXPR in inline candidate arguments #69117

jakobbotsch · 2022-05-10T10:18:02Z

These GT_RET_EXPR nodes would previously make it into the inlinees as part of the inline argument table. This was confusing to me, and it additionally meant that we had to call gtRetExprVal() in a few places to compensate (while in non-top-level cases the argument node containing the GT_RET_EXPR would essentially become "opaque" to the JIT). With this change we only ever expect to substitute GT_RET_EXPR inside fgUpdateInlineReturnExpressionPlaceHolder, called from the main fgInline loop, so I have inlined gtRetExprVal() there and removed that function.
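
As a rough illustration of the idea (not the JIT's actual code), the substitution can be pictured on a toy tree. `Node`, `makeNode`, `substituteRetExprs` and `exampleSubstitution` are hypothetical names invented for this sketch; the real implementation works on `GenTree` nodes inside `fgUpdateInlineReturnExpressionPlaceHolder`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a JIT tree node; this is NOT the real GenTree.
struct Node
{
    std::string                        kind;         // e.g. "CALL", "RET_EXPR", "FIELD", "LCL_VAR"
    std::vector<std::unique_ptr<Node>> operands;     // child nodes (call arguments, field object, ...)
    std::unique_ptr<Node>              inlinedValue; // only used by "RET_EXPR": the expression it stands for
};

std::unique_ptr<Node> makeNode(std::string kind)
{
    auto node  = std::make_unique<Node>();
    node->kind = std::move(kind);
    return node;
}

// Replace every RET_EXPR placeholder reachable from 'node' with the expression
// it stands for. Placeholders may chain, hence the loop. Returns the number of
// substitutions performed.
int substituteRetExprs(std::unique_ptr<Node>& node)
{
    int count = 0;
    while (node->kind == "RET_EXPR" && node->inlinedValue != nullptr)
    {
        std::unique_ptr<Node> value = std::move(node->inlinedValue);
        node                        = std::move(value); // drops the placeholder itself
        count++;
    }
    for (auto& operand : node->operands)
    {
        count += substituteRetExprs(operand);
    }
    return count;
}

// Usage: a call whose 'this' argument is still a placeholder.
void exampleSubstitution()
{
    auto call        = makeNode("CALL");
    auto placeholder = makeNode("RET_EXPR");
    placeholder->inlinedValue = makeNode("FIELD");
    placeholder->inlinedValue->operands.push_back(makeNode("LCL_VAR"));
    call->operands.push_back(std::move(placeholder));

    int replaced = substituteRetExprs(call);
    assert(replaced == 1);
    assert(call->operands[0]->kind == "FIELD");
}
```

Running the substitution over the arguments before the candidate is inlined means the inlinee only ever sees the real expression, which is the property the description above is after.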

ghost · 2022-05-10T10:18:10Z

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Author:	jakobbotsch
Assignees:	jakobbotsch
Labels:	`area-CodeGen-coreclr`
Milestone:	-

jakobbotsch · 2022-05-10T10:20:00Z

src/coreclr/jit/fginline.cpp

+                    // Yes. We may still have GT_RET_EXPR in arguments,
+                    // substitute those here so the inlinee does not need to
+                    // deal with them.
+                    // Note that we leave late devirts in arguments for when we
+                    // handle inlined statements as fgLateDevirtualization
+                    // cannot handle being called twice on a late devirt
+                    // candidate.


This is not ideal: in a case where we could late-devirt an argument containing a call, you could imagine that doing so would expose valuable information to the inlinee for whole-program optimization. We might consider looking into how to fix this if we decide to invest in interprocedural analysis.

Decided to fix this, didn't turn out to be too difficult.

jakobbotsch · 2022-05-10T12:49:11Z

/azp run Fuzzlyn, runtime-coreclr libraries-jitstress

azure-pipelines · 2022-05-10T12:49:31Z

Azure Pipelines successfully started running 2 pipeline(s).

jakobbotsch · 2022-05-10T13:50:24Z

/azp run Fuzzlyn, runtime-coreclr libraries-jitstress

azure-pipelines · 2022-05-10T13:50:38Z

Azure Pipelines successfully started running 2 pipeline(s).

jakobbotsch · 2022-05-10T19:45:26Z

Fuzzlyn failures are preexisting. libraries-jitstress failures are #69154 and #68803.

Diffs. Generally we see small improvements in the non-test code, and some regressions in the test code. Presumably these are because we now have fewer "opaque" nodes during inlining. I will spot check these when I have some more time, but I don't think we need to block the review on them.

There are some throughput (TP) regressions, but they are mostly paid for by #69120, which I noticed while working on this. I think there are additional improvements to be made in the future; in particular, we currently walk the argument nodes once when we see the inline candidate, and then another time after they are added as part of inlining (because fgInlinePrependStatements adds them after the inline candidate call). So I think we can afford this, given that it makes follow-up improvements simpler and generally simplifies where GT_RET_EXPR appears.

cc @dotnet/jit-contrib PTAL @AndyAyersMS

AndyAyersMS · 2022-05-10T19:50:56Z

I am curious about what's behind some of the larger diffs here, so please do take a look

         172 (15.45 % of base) : 21274.dasm - RequestUtilities:AddHeader(HttpRequestMessage,String,StringValues)

jakobbotsch · 2022-05-10T19:54:40Z

> I am curious about what's behind some of the larger diffs here, so please do take a look

Hmm yes. And after looking at the diff summary again I can see that

> Generally we see small improvements in the non-test code, and some regressions in the test code.

is no longer true. That's interesting because it was true before my latest commit 41e0af2, which changes late devirt to happen on the call arguments as well. These were the diffs before that.

AndyAyersMS

Changes look good.

I wonder if the diffs here are because this change uncovers more devirt + inlines?

AndyAyersMS · 2022-05-10T20:06:14Z

src/coreclr/jit/importer.cpp

@@ -21299,6 +21300,7 @@ void Compiler::impDevirtualizeCall(GenTreeCall*            call,
    // it's a union field used for other things by virtual
    // stubs)
    call->gtInlineCandidateInfo = nullptr;
+    call->gtCallMoreFlags &= ~GTF_CALL_M_LATE_DEVIRT;


Maybe we should rename this flag? GTF_CALL_M_HAS_LATE_DEVIRT_INFO perhaps? I introduced it but the name could better describe what it means.
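
To make the intent of clearing the flag concrete, here is a minimal sketch. It assumes the flag simply records "this call still carries late-devirt info", so that clearing it once the info has been consumed makes a second attempt a no-op. `ToyCall`, `tryLateDevirt`, `exampleLateDevirt` and the flag values are hypothetical; only the `&= ~FLAG` idiom mirrors the diff above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bits for illustration; the real values live in the JIT headers.
constexpr std::uint32_t TOY_CALL_HAS_LATE_DEVIRT_INFO = 0x1;
constexpr std::uint32_t TOY_CALL_OTHER_FLAG           = 0x2;

// Toy stand-in for a call node; this is NOT the real GenTreeCall.
struct ToyCall
{
    std::uint32_t moreFlags      = 0;
    int           devirtAttempts = 0;
};

// Attempts late devirtualization at most once per call: the flag is cleared
// when the info is consumed, so later visits see nothing left to do.
bool tryLateDevirt(ToyCall& call)
{
    if ((call.moreFlags & TOY_CALL_HAS_LATE_DEVIRT_INFO) == 0)
    {
        return false; // no late-devirt info (or it was already consumed)
    }
    call.devirtAttempts++;
    call.moreFlags &= ~TOY_CALL_HAS_LATE_DEVIRT_INFO; // same idiom as in the diff
    return true;
}

// Usage: visiting the same call twice only processes it once.
void exampleLateDevirt()
{
    ToyCall call;
    call.moreFlags = TOY_CALL_HAS_LATE_DEVIRT_INFO | TOY_CALL_OTHER_FLAG;

    assert(tryLateDevirt(call));
    assert(!tryLateDevirt(call));
    assert(call.devirtAttempts == 1);
    assert((call.moreFlags & TOY_CALL_OTHER_FLAG) != 0); // unrelated bits are preserved
}
```

Read this way, a name like GTF_CALL_M_HAS_LATE_DEVIRT_INFO describes the bit as "info is present", which matches the rename suggested here.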

jakobbotsch · 2022-05-10T20:43:33Z

> I am curious about what's behind some of the larger diffs here, so please do take a look
>
>          172 (15.45 % of base) : 21274.dasm - RequestUtilities:AddHeader(HttpRequestMessage,String,StringValues)

The difference here is that we do another inline because of:

-Inline candidate callsite is boring.  Multiplier increased to 2.3.
+Inline candidate callsite is warm.  Multiplier increased to 3.

This happens because we now do the following substitution:

+Replacing the return expression placeholder [000164] with [000812]
+               [000164] --C--------                         ▌  RET_EXPR  ref   (inl return expr [000812])
+
+Inserting the inline return expression
+               [000812] ---XG------                         ▌  FIELD     ref    _headers
+               [000811] -----------                         └──▌  LCL_VAR   ref    V13 tmp6         
+
 Expanding INLINE_CANDIDATE in statement STMT00050 in BB105:
 STMT00050 ( ??? ... ??? )
                [000167] I-C-G------                         ▌  CALL nullcheck int    HttpHeaders.TryAddWithoutValidation (exactContextHnd=0x00007FFC32571731)
-               [000164] --C-------- this                    ├──▌  RET_EXPR  ref   (inl return expr [000812])
+               [000812] ---XG------ this                    ├──▌  FIELD     ref    _headers
+               [000811] -----------                         │  └──▌  LCL_VAR   ref    V13 tmp6         
                [000165] ----------- arg1                    ├──▌  LCL_VAR   ref    V01 arg1         
                [000166] ----------- arg2                    └──▌  LCL_VAR   ref    V03 loc0

And that RET_EXPR node has BBF_PROF_WEIGHT which we propagate in fgUpdateInlineReturnExpressionPlaceHolder.
If I understand correctly, the same thing would have happened if the RET_EXPR was in the statement before the inline candidate, so this doesn't seem bad to me.
I'll spot check some more diffs.

jakobbotsch · 2022-05-10T20:47:45Z

The above may also explain parts of the TP diff, since if we do more inlines we expect to spend more time.

jakobbotsch · 2022-05-10T21:10:57Z

In asm.benchmarks.run.windows.x64.checked 8113 we have

-; 0 inlinees with PGO data; 59 single block inlinees; 39 inlinees without PGO data
+; 0 inlinees with PGO data; 69 single block inlinees; 48 inlinees without PGO data

The cause is that we create a couple fewer locals, which means in the new version we do one fewer lva table grow (judging by the 98 -> 148 table sizes in the dump below, each grow increases the capacity by roughly 1.5x). Then we end up with a bunch of extra inlines due to the extra space in the lva table:

-Caller has 148 locals.  Multiplier decreased to 9.6668.
+Caller has 98 locals.  Multiplier decreased to 10.2186.
 calleeNativeSizeEstimate=1122
 callsiteNativeSizeEstimate=115
-benefit multiplier=9.6668
+benefit multiplier=10.2186
-threshold=1111
+threshold=1175
-Native estimate for function size exceeds threshold for inlining 112.2 > 111.1 (multiplier = 9.6668)
+Native estimate for function size is within threshold for inlining 112.2 <= 117.5 (multiplier = 10.2186)

Seems strange that we are using the capacity here:

runtime/src/coreclr/jit/inlinepolicy.cpp

Lines 1708 to 1714 in 96db556

    
           if (m_RootCompiler->lvaTableCnt > 64) 
        
           { 
        
               // E.g. MaxLocalsToTrack = 1024 and lvaTableCnt = 512 -> multiplier *= 0.5; 
        
               const double lclFullness = min(1.0, (double)m_RootCompiler->lvaTableCnt / JitConfig.JitMaxLocalsToTrack()); 
        
               multiplier *= (1.0 - lclFullness); 
        
               JITDUMP("\nCaller has %d locals.  Multiplier decreased to %g.", m_RootCompiler->lvaTableCnt, multiplier); 
        
           }

Should this be using lvaCount instead?
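
The numbers in the dump above can be reproduced with a small sketch of the arithmetic. Several things here are assumptions rather than facts from the JIT source: a `JitMaxLocalsToTrack` value of 1024 (the value used in the code comment's example), a pre-scaling multiplier of 11.3 (inferred by working backwards from the dump), and a threshold computed as the truncated product of the call site size estimate and the multiplier (which matches the printed 1111 and 1175). All function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Assumed value of JitMaxLocalsToTrack, taken from the example in the code comment.
constexpr double kMaxLocalsToTrack = 1024.0;

// Mirrors the quoted snippet: scale the multiplier down by how "full" the
// locals table is. 'localsMetric' is whatever is fed in: capacity or count.
double scaleByLocalFullness(double multiplier, unsigned localsMetric)
{
    if (localsMetric > 64)
    {
        const double fullness = std::min(1.0, localsMetric / kMaxLocalsToTrack);
        multiplier *= (1.0 - fullness);
    }
    return multiplier;
}

// Assumed threshold rule: truncate(call site size estimate * multiplier).
int inlineThreshold(int callsiteNativeSizeEstimate, double multiplier)
{
    return static_cast<int>(callsiteNativeSizeEstimate * multiplier);
}

// The inline is accepted when the callee estimate does not exceed the threshold.
bool withinThreshold(int calleeNativeSizeEstimate, int threshold)
{
    return calleeNativeSizeEstimate <= threshold;
}

// Usage: one fewer table grow (98 instead of 148) flips the decision.
void exampleThreshold()
{
    const double baseMultiplier = 11.3; // inferred from the dump, not from the source

    const int before = inlineThreshold(115, scaleByLocalFullness(baseMultiplier, 148));
    const int after  = inlineThreshold(115, scaleByLocalFullness(baseMultiplier, 98));

    assert(before == 1111 && !withinThreshold(1122, before));
    assert(after == 1175 && withinThreshold(1122, after));
}
```

Because the table size only changes when the table grows, a difference of a couple of locals can move the multiplier in one large step, which is why feeding the capacity into this formula looks odd compared with the number of locals actually in use.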

AndyAyersMS · 2022-05-10T22:57:54Z

> Should this be using lvaCount instead?

I think this is something @EgorBo added... yeah, it seems odd to check capacity and not how much is actually in use.

jakobbotsch · 2022-05-11T09:24:53Z

In aspnet.benchmarks.run.windows.x64.checked 8840 we also have

-; 0 inlinees with PGO data; 243 single block inlinees; 63 inlinees without PGO data
+; 0 inlinees with PGO data; 243 single block inlinees; 67 inlinees without PGO data

Cause is the same as above, we end up with fewer locals and decide to inline more because of it.

In 10502 (System.Text.Json.JsonSerializer:ReadFromSpan) we end up with one fewer statement after an inline because we fold it early and realize it's a constant:

 INLINER: during 'fgInline' result 'success' reason 'aggressive inline attribute' for 'System.Text.Json.JsonSerializer:ReadFromSpan(System.ReadOnlySpan`1[Char],System.Text.Json.Serialization.Metadata.JsonTypeInfo):MicroBenchmarks.Serializers.SimpleStructWithProperties' calling 'System.Runtime.CompilerServices.Unsafe:SizeOf():int'
 INLINER: during 'fgInline' result 'success' reason 'aggressive inline attribute'
+
+Replacing the return expression placeholder [000347] with [000391]
+               [000347] --C--------                         ▌  RET_EXPR  int   (inl return expr [000391])
+
+Inserting the inline return expression
+               [000391] -----------                         ▌  CNS_INT   int    1
+
+
+Folding long operator with constant nodes into a constant:
+               [000348] --C--------                         ▌  CAST      long <- int
+               [000391] -----------                         └──▌  CNS_INT   int    1
+Bashed to long constant:
+               [000348] -----------                         ▌  CNS_INT   long   1
+
+Folding binary operator with a constant operand:
+               [000349] --C--------                         ▌  MUL       long  
+               [000346] -----------                         ├──▌  LCL_VAR   long   V27 tmp20        
+               [000348] -----------                         └──▌  CNS_INT   long   1
+Transformed into:
+               [000346] -----------                         ▌  LCL_VAR   long   V27 tmp20        
 Expanding INLINE_CANDIDATE in statement STMT00072 in BB50:
 STMT00072 ( INL17 @ ??? ... ??? ) <- INLRT @ 0x06A[E-]
                [000350] I-C-G------                         ▌  CALL      void   System.SpanHelpers.ClearWithoutReferences (exactContextHnd=0x00007FFF1E6F83B1)
                [000343] ----------- arg0                    ├──▌  LCL_VAR   byref  V26 tmp19        
-               [000349] --C-------- arg1                    └──▌  MUL       long  
-               [000346] -----------                            ├──▌  LCL_VAR   long   V27 tmp20        
-               [000348] --C--------                            └──▌  CAST      long <- int
-               [000347] --C--------                               └──▌  RET_EXPR  int   (inl return expr [000391])
+               [000346] ----------- arg1                    └──▌  LCL_VAR   long   V27 tmp20

This leads to one fewer statement being added:

------------ Statements (and blocks) added due to the inlining of call [000350] -----------
-
-Arguments setup:
-STMT00084 ( INL17 @ ??? ... ??? ) <- INLRT @ 0x06A[E-]
-               [000410] -AC--------                         ▌  ASG       long  
-               [000409] D------N---                         ├──▌  LCL_VAR   long   V32 tmp25        
-               [000349] --C--------                         └──▌  MUL       long  
-               [000346] -----------                            ├──▌  LCL_VAR   long   V27 tmp20        
-               [000348] --C--------                            └──▌  CAST      long <- int
-               [000347] --C--------                               └──▌  RET_EXPR  int   (inl return expr [000391])

which finally means we clone a finally that brings the extra code:

-Finally in EH#0 has 16 statements, limit is 15; skipping.
+EH#0 is a candidate for finally cloning: 13 blocks, 15 statements
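
The folding in the ReadFromSpan dump can be mimicked on a toy expression tree: once the placeholder has been replaced by the constant 1, the cast folds to a constant and the multiplication by 1 disappears. This is only a sketch of the two folds visible in the dump; `Expr`, `fold`, `exampleFold` and the helper constructors are hypothetical and far simpler than the JIT's own folding.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Toy expression node; this is NOT the real GenTree.
struct Expr
{
    enum Kind { Const, Local, Cast, Mul };
    Kind                  kind  = Const;
    long long             value = 0; // Const: the constant; Local: the local number
    std::unique_ptr<Expr> lhs;       // Cast: operand; Mul: left operand
    std::unique_ptr<Expr> rhs;       // Mul: right operand
};

std::unique_ptr<Expr> constant(long long v)
{
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Const;
    e->value = v;
    return e;
}

std::unique_ptr<Expr> local(int num)
{
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Local;
    e->value = num;
    return e;
}

std::unique_ptr<Expr> cast(std::unique_ptr<Expr> op)
{
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Cast;
    e->lhs = std::move(op);
    return e;
}

std::unique_ptr<Expr> mul(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
{
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Mul;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Bottom-up folding of the two patterns seen in the dump:
//   CAST(long <- int constant)  => long constant with the same value
//   MUL(x, 1) or MUL(1, x)      => x
std::unique_ptr<Expr> fold(std::unique_ptr<Expr> e)
{
    if (e->lhs) e->lhs = fold(std::move(e->lhs));
    if (e->rhs) e->rhs = fold(std::move(e->rhs));

    if (e->kind == Expr::Cast && e->lhs->kind == Expr::Const)
    {
        return constant(e->lhs->value); // a widening cast keeps the value
    }
    if (e->kind == Expr::Mul)
    {
        if (e->rhs->kind == Expr::Const && e->rhs->value == 1) return std::move(e->lhs);
        if (e->lhs->kind == Expr::Const && e->lhs->value == 1) return std::move(e->rhs);
    }
    return e;
}

// Usage: MUL(V27, CAST(1)) collapses to just the local V27.
void exampleFold()
{
    auto folded = fold(mul(local(27), cast(constant(1))));
    assert(folded->kind == Expr::Local && folded->value == 27);
}
```

With the argument reduced to a plain local, no temporary needs to be assigned for it, which is consistent with the argument setup statement disappearing and the finally's statement count dropping from 16 to 15 in the dump.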

8810 is another case of more inlines:

-; 0 inlinees with PGO data; 33 single block inlinees; 30 inlinees without PGO data
+; 0 inlinees with PGO data; 42 single block inlinees; 34 inlinees without PGO data

In this case we manage to fold some arguments to constants, which results in us folding some branches during import and overall using less budget, so we run out of budget later than before.

From what I've seen so far, I think the diffs are good, so I will stop checking more cases.

jakobbotsch · 2022-05-11T19:21:10Z

Timeouts are known. Can you please take a look at the above diffs and approve if it sounds fine @AndyAyersMS?

jakobbotsch · 2022-05-12T21:58:29Z

Ping @AndyAyersMS

AndyAyersMS · 2022-05-13T00:37:02Z

Sorry, got caught up in other things. Looking now.

AndyAyersMS

LGTM. Thanks for looking at diffs.

jakobbotsch added 3 commits May 10, 2022 11:56

Substitute GT_RET_EXPR in inline candidate arguments before inlining

ea46012

Clean up usages of gtRetExprVal

857c50d

Small cleanup

28b116f

ghost assigned jakobbotsch May 10, 2022

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 10, 2022

jakobbotsch commented May 10, 2022

Fix a comment

f837f19

jakobbotsch mentioned this pull request May 10, 2022

Allow the inliner to substitute for small arguments #69068

Merged

Do late devirt on args too

41e0af2

This was referenced May 10, 2022

Host Test failures in rolling CI #69114

Closed

libraries-jitstress failure in System.Diagnostics.Tests.StackFrameTests.Ctor_FNeedFileInfo #69154

Closed

jakobbotsch requested a review from AndyAyersMS May 10, 2022 19:45

AndyAyersMS reviewed May 10, 2022

Rename GTF_CALL_M_LATE_DEVIRT -> GTF_CALL_M_HAS_LATE_DEVIRT_INFO

ea9e136

jakobbotsch mentioned this pull request May 12, 2022

Inliner uses capacity of locals table instead of count #69280

Open

AndyAyersMS approved these changes May 13, 2022

jakobbotsch merged commit 4ecd060 into dotnet:main May 13, 2022

jakobbotsch deleted the substitute-ret-exprs-in-inline-candidates branch May 13, 2022 07:38

JulieLeeMSFT mentioned this pull request Jun 3, 2022

What's new in .NET 7 Preview 5 [WIP] dotnet/core#7441

Closed

ghost locked as resolved and limited conversation to collaborators Jun 12, 2022
