Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

JIT: run type equality optimizations earlier #14244

Merged
merged 5 commits into from
Oct 3, 2017

Conversation

AndyAyersMS
Copy link
Member

Refactor the type equality optimizations embedded in morph so that they can
also be invoked earlier, in particular, during importation. This allows the
importer to short-circut importing blocks that are provably unreachable. For
instance in generic methods with type specialization (via equality) the jit
will now selectively import just the applicable regions of the method. In some
cases this means the jit may avoid importing EH regions and so remove EH from
a specialized method all together.

Also, generalize the handle compare pattern to look through the handle indirs
that can arise when prejitting.

Running the type opts early also improves jit throughput a little.

@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Sep 28, 2017

Part of #10731. Also helps partially clean up the patterns introduced in #14178 in AwaitUnsafeOnCompleted and reduces the number of InterfaceIsCheckWorkaround instantiations created when prejitting.

FYI jit-diffs does not accurately reconcile method count diffs when there are multiple methods with the same name. The after case has fewer methods. It would be nice if the jit dasm would start reporting the class and method instantiation parameters for generic methods to disambiguate these cases.

@dotnet/jit-contrib PTAL
cc @stephentoub

Jit-diffs results:

Total bytes of diff: -2000 (-0.02 % of base)
    diff is an improvement.
Total byte diff includes 0 bytes from reconciling methods
        Base had    0 unique methods,        0 unique bytes   (NOT ACCURATE)
        Diff had    0 unique methods,        0 unique bytes
Top file improvements by size (bytes):
       -1043 : System.Linq.Parallel.dasm (-0.17 % of base)
        -957 : System.Private.CoreLib.dasm (-0.03 % of base)
2 total files with size differences (2 improved, 0 regressed), 77 unchanged.
Top method regessions by size (bytes):
          16 : System.Private.CoreLib.dasm - CLRIPropertyValueImpl:CoerceScalarValue(int,ref):long (2 methods)
          15 : System.Private.CoreLib.dasm - CLRIPropertyValueImpl:CoerceScalarValue(int,ref):int (2 methods)
           8 : System.Private.CoreLib.dasm - CLRIPropertyValueImpl:CoerceScalarValue(int,ref):double
           8 : System.Private.CoreLib.dasm - CLRIPropertyValueImpl:CoerceScalarValue(int,ref):float
           5 : System.Private.CoreLib.dasm - CLRIPropertyValueImpl:CoerceScalarValue(int,ref):char
Top method improvements by size (bytes):
       -1043 : System.Linq.Parallel.dasm - Util:GetDefaultComparer():ref (12 methods)
        -516 : System.Private.CoreLib.dasm - InterfaceIsCheckWorkaround`1:.cctor() (7 methods)
        -486 : System.Private.CoreLib.dasm - AsyncTaskMethodBuilder`1:AwaitUnsafeOnCompleted(byref,byref):this (15 methods)
         -12 : System.Private.CoreLib.dasm - <<ReadAsync>g__FinishReadAsync42_0>d:MoveNext():this
          -2 : System.Private.CoreLib.dasm - AsyncTaskMethodBuilder`1:GetTaskForResult(struct):ref
14 total methods with size differences (7 improved, 7 regressed), 66931 unchanged.

System.Linq.Parallel's UtilGetDefaultComparer (which is heavily optimized by the above) is a candidate for revision. The comment there says:

// Unlike the X86 JIT, null checks on value types aren't optimized away in the X64 JIT compiler.
// That means using the GenericComparer<K> infrastructure results in boxing value
// types. This degrades performance all over the place.

which is no longer accurate.

Code sample from #14177 now shows unreachable EH region is gone:

G_M49492_IG01:
       56                   push     rsi
       4883EC20             sub      rsp, 32
       8BF1                 mov      esi, ecx

G_M49492_IG02:
       E83427A25E           call     System.Globalization.NumberFormatInfo:get_CurrentInfo():ref
       4C8BC0               mov      r8, rax
       8BCE                 mov      ecx, esi
       33D2                 xor      rdx, rdx
       E8C88BA85F           call     System.Number:FormatInt32(int,ref,ref):ref
       488BC8               mov      rcx, rax
       E838FCFFFF           call     System.Console:WriteLine(ref)
       90                   nop

G_M49492_IG03:
       4883C420             add      rsp, 32
       5E                   pop      rsi
       C3                   ret

@pgavlin
Copy link

pgavlin commented Sep 28, 2017

Running the type opts early also improves jit throughput a little.

Out of curiosity, how much is a little?

@AndyAyersMS
Copy link
Member Author

A bit under 1% for System.Private.CoreLib for instructions retired (18.61 billion vs 18.75). Didn't look at anything else since R2R tends to skew things.

@pgavlin
Copy link

pgavlin commented Sep 28, 2017

Nice!

@AndyAyersMS
Copy link
Member Author

Hmm, didn't launch all the CI legs. Let's retry..

@dotnet-bot test this please

@AndyAyersMS
Copy link
Member Author

@dotnet/dnceng I'm guessing you know this but CoreCLR CI currently isn't launching all the test legs.

// If we see indirs, tunnel through to see if there are compile time handles.
if ((op1ClassFromHandle->gtOper == GT_IND) && (op2ClassFromHandle->gtOper == GT_IND))
{
// Indirs should be non-faulting.... verify
Copy link

@briansull briansull Sep 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you want to check for GTF_IND_NONFAULTING flags here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did, yeah. Will add a check.

// Indirs should be non-faulting.... verify
GenTree* op1HandleLiteral = op1ClassFromHandle->gtOp.gtOp1;
GenTree* op2HandleLiteral = op2ClassFromHandle->gtOp.gtOp1;

Copy link

@briansull briansull Sep 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also a mask of bits use to mark handles:
GTF_ICON_HDL_MASK
Both constants should probably have the same bit set.
Probably this one: GTF_ICON_CLASS_HDL

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, will check this too.

@AndyAyersMS
Copy link
Member Author

@briansull turns out we were not marking the embedded handle indirs as non-faulting.Good catch.

I also was going to mark them as invariant, but when doing so, CSE got carried away in places. Can look at this separately someday.

Just for the record, here is the jit-diffs for that run...

*** Results from also marking embedded handle indirs as INVARIANT ***
(Note: Lower is better)
Total bytes of diff: 2175 (0.02 % of base)
    diff is a regression.
Total byte diff includes 0 bytes from reconciling methods
        Base had    0 unique methods,        0 unique bytes
        Diff had    0 unique methods,        0 unique bytes
Top file regressions by size (bytes):
        2517 : Microsoft.CodeAnalysis.CSharp.dasm (0.12 % of base)
        2297 : Microsoft.CodeAnalysis.VisualBasic.dasm (0.10 % of base)
         201 : Microsoft.CSharp.dasm (0.06 % of base)
         101 : System.Net.Http.dasm (0.03 % of base)
          48 : System.Net.Security.dasm (0.04 % of base)
Top file improvements by size (bytes):
       -1043 : System.Linq.Parallel.dasm (-0.17 % of base)
        -957 : System.Private.CoreLib.dasm (-0.03 % of base)
        -521 : System.Linq.Expressions.dasm (-0.09 % of base)
        -255 : Microsoft.VisualBasic.dasm (-0.16 % of base)
        -168 : Microsoft.CodeAnalysis.dasm (-0.02 % of base)
32 total files with size differences (18 improved, 14 regressed), 47 unchanged.
Top method regessions by size (bytes):
         430 : Microsoft.CodeAnalysis.CSharp.dasm - SourceMemberContainerTypeSymbol:AddNonTypeMembers(ref,struct,ref):this
         374 : Microsoft.CodeAnalysis.CSharp.dasm - Imports:FromSyntax(ref,ref,ref,bool):ref
         244 : Microsoft.CodeAnalysis.CSharp.dasm - Binder:BindQuery(ref,ref):ref:this
         198 : Microsoft.CodeAnalysis.VisualBasic.dasm - SourceMemberContainerTypeSymbol:MakeExplicitInterfaceImplementationMap(ref):ref:this
         189 : Microsoft.CodeAnalysis.CSharp.dasm - MethodTypeInferrer:Fix(int,byref):bool:this
Top method improvements by size (bytes):
       -1043 : System.Linq.Parallel.dasm - Util:GetDefaultComparer():ref (12 methods)
        -516 : System.Private.CoreLib.dasm - InterfaceIsCheckWorkaround`1:.cctor() (7 methods)
        -486 : System.Private.CoreLib.dasm - AsyncTaskMethodBuilder`1:AwaitUnsafeOnCompleted(byref,byref):this (15 methods)
        -121 : Microsoft.CodeAnalysis.VisualBasic.dasm - SourceNamedTypeSymbol:EarlyDecodeWellKnownAttribute(byref):ref:this
        -102 : System.Linq.Expressions.dasm - MetaDynamic:BuildCallMethodWithResult(ref,ref,ref,ref,ref):ref:this
1577 total methods with size differences (915 improved, 662 regressed), 65368 unchanged.

@redknightlois
Copy link

redknightlois commented Sep 29, 2017

AsyncTaskMethodBuilder`1:AwaitUnsafeOnCompleted(byref,byref):this this method is a general finding profile runs in most of the codebases that I had analyzed recently. +1

@AndyAyersMS
Copy link
Member Author

@redknightlois #14178 was the big change in AwaitUnsafeOnCompleted -- this is just a relatively minor cleanup on top. Here's a sample diff from this change. The prolog gunk in the before version is an artifact of the jit importing all of the IL for the method before it realizes it can optimize the type tests.

;;; BEFORE
; Lcl frame size = 232
G_M30671_IG01:
       push     rdi
       push     rsi
       sub      rsp, 232
       mov      rsi, rcx
       lea      rdi, [rsp+C8H]
       mov      ecx, 8
       xor      rax, rax
       rep stosd 
       mov      rcx, rsi
       mov      rsi, rdx
G_M30671_IG02:
       mov      rdx, r8
       call     AsyncTaskMethodBuilder`1:GetStateMachineBox(byref):ref:this
       mov      rdx, rax
       movzx    r8, byte  ptr [rsi+8]
       mov      rcx, gword ptr [rsi]
       call     TaskAwaiter:UnsafeOnCompletedInternal(ref,ref,bool)
G_M30671_IG03:
       nop      
G_M30671_IG04:
       add      rsp, 232
       pop      rsi
       pop      rdi
       ret

;;; AFTER
; Lcl frame size = 32
G_M30613_IG01:
       push     rsi
       sub      rsp, 32
       mov      rsi, rdx
G_M30613_IG02:
       mov      rdx, r8
       call     AsyncTaskMethodBuilder`1:GetStateMachineBox(byref):ref:this
       mov      rdx, rax
       movzx    r8, byte  ptr [rsi+8]
       mov      rcx, gword ptr [rsi]
       call     TaskAwaiter:UnsafeOnCompletedInternal(ref,ref,bool)
       nop      
G_M30613_IG03:
       add      rsp, 32
       pop      rsi
       ret 

@AndyAyersMS
Copy link
Member Author

@briansull this is ready to go, can you take another look?

Copy link

@briansull briansull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good with a few comments

// We must have a handle on one side or the other here to optimize,
// otherwise we can't be sure that optimizing is sound.
const bool op1IsFromHandle = op1Kind == TPK_Handle;
const bool op2IsFromHandle = op2Kind == TPK_Handle;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically all comparisons are enclosed in parenthesis.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks.


#ifdef LEGACY_BACKEND
// Fetch object method table from the object itself
GenTree* const objMT = gtNewOperNode(GT_IND, TYP_I_IMPL, opOther->gtCall.gtCallObjp);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am surprised to see an indef LEGACY_BACKEND here. I would have expected that we didn't do the optimization with legacy backend.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What makes the LEGACY_BACKEND path different here anyway?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a holdover from how the code was before, when it was in morph.

Not sure how things got to the state they were just before my changes, but here's a guess.

Some time in the past, calls and intrinsics were done differently for the legacy backend and this optimization was supported (or if not, somebody went to a lot of trouble to make it look like it was).

Then at some point, somebody unified the way calls and intriniscs were handled, but did not update this optimization. So the pattern matches that guard this optimization stopped matching for the legacy backend, and the transformation part specific to the legacy backend went stale.

Now that the importer paths are unified it would be simplest to just remove the ifdef'd code and have this entire optimization be common code.

But I can also just disable this opt for the legacy backend.

@@ -15099,25 +15392,24 @@ void Compiler::gtCheckQuirkAddrExposedLclVar(GenTreePtr tree, GenTreeStack* pare
}

//------------------------------------------------------------------------
// gtCanOptimizeTypeEquality: check if tree is a suitable input for a type
// gtCanOptimizeTypeEquality: check if tree is a suitable input for type

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to rename this method as "Can" typically implies that it returns a bool. Maybe gtOptimizableTypeEqualityKind() instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about gtGetTypeProducerKind ?

// 2. typeof(...) == obj.GetType()
return fgMorphTree(compare);
}
return fgMorphTree(optTree);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently Github is saying that you have a merge conflict to resolve in morph.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, will need to rebase and fix.

Refactor the type equality optimizations embedded in morph so that they can
also be invoked earlier, in particular, during importation. This allows the
importer to short-circut importing blocks that are provably unreachable. For
instance in generic methods with type specialization (via equality) the jit
will now selectively import just the applicable regions of the method. In some
cases this means the jit may avoid importing EH regions and so remove EH from
a specialized method all together.

Also, generalize the handle compare pattern to look through the handle indirs
that can arise when prejitting.

Running the type opts early also improves jit throughput a little.
Mark the embedded handle indirs as nonfaulting. Would also like to mark
them as invariant, but CSE gets overly enthusiastic and we see bad diffs.

Also, spruce up a few comments.
`gtOptimizeEnumHasFlag` returns null on failure, not the original tree.
So check for this.

A `refanytype` creates runtime handle to type helper call with an indir
off of the refany, this isn't marked as nonfaulting. Perhaps it should
be, but even if so we wouldn't optimize further. So check for faulting
indirs instead of asserting.

Also I am pretty sure the helper properties for the handle to type helpers
could be improved: the MAYBENULL helper isn't even in the table and the
normal will never return null. But will defer changing this for now.
Removed legacy special-casing, verified legacy path works fine with
the intrinsics and opts fully enabled.
@AndyAyersMS
Copy link
Member Author

@briansull I removed all the legacy special-casing since it doesn't seem to be needed anymore.

@AndyAyersMS
Copy link
Member Author

Ah, well there's the reason, probably Type.GetType() is not supported as an intrinsic in legacy. I'll go back and exclude this as a recognized legacy intrinsic.

The OSX failure is a CI issue:
'JNLP4-connect connection from 131.107.19.213/131.107.19.213:59045' is disconnected.

Copy link

@briansull briansull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks Good

@AndyAyersMS
Copy link
Member Author

More random-seeming CI issues in this leg. Will retry.

11:04:55 D:\j\w\perf_perflab_---1c2b0529>powershell wget https://dist.nuget.org/win-x86-commandline/latest/nuget.exe -OutFile "D:\j\w\perf_perflab_---1c2b0529\nuget.exe" 
11:04:57 wget : Unable to read data from the transport connection: An existing 
11:04:57 connection was forcibly closed by the remote host.
11:04:57 At line:1 char:1
11:04:57 + wget https://dist.nuget.org/win-x86-commandline/latest/nuget.exe -OutFile 

@dotnet-bot retest Windows_NT x86 full_opt ryujit CoreCLR Perf Tests Correctness

@AndyAyersMS
Copy link
Member Author

@dotnet/dnceng OSX legs seem particularly flaky lately:

image

@AndyAyersMS
Copy link
Member Author

@dotnet-bot restest OSX10.12 x64 Checked Build and Test
Guessing this won't help

@AndyAyersMS AndyAyersMS merged commit aadf2b0 into dotnet:master Oct 3, 2017
@AndyAyersMS AndyAyersMS deleted the EarlyTypeOpt branch October 3, 2017 20:29
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants