-
Notifications
You must be signed in to change notification settings - Fork 2.7k
JIT: run type equality optimizations earlier #14244
Conversation
Part of #10731. Also helps partially clean up the patterns introduced in #14178 in FYI jit-diffs does not accurately reconcile method count diffs when there are multiple methods with the same name. The after case has fewer methods. It would be nice if the jit dasm would start reporting the class and method instantiation parameters for generic methods to disambiguate these cases. @dotnet/jit-contrib PTAL Jit-diffs results:
// Unlike the X86 JIT, null checks on value types aren't optimized away in the X64 JIT compiler.
// That means using the GenericComparer<K> infrastructure results in boxing value
// types. This degrades performance all over the place. which is no longer accurate. Code sample from #14177 now shows unreachable EH region is gone: G_M49492_IG01:
56 push rsi
4883EC20 sub rsp, 32
8BF1 mov esi, ecx
G_M49492_IG02:
E83427A25E call System.Globalization.NumberFormatInfo:get_CurrentInfo():ref
4C8BC0 mov r8, rax
8BCE mov ecx, esi
33D2 xor rdx, rdx
E8C88BA85F call System.Number:FormatInt32(int,ref,ref):ref
488BC8 mov rcx, rax
E838FCFFFF call System.Console:WriteLine(ref)
90 nop
G_M49492_IG03:
4883C420 add rsp, 32
5E pop rsi
C3 ret |
Out of curiosity, how much is a little? |
A bit under 1% for System.Private.CoreLib for instructions retired (18.61 billion vs 18.75). Didn't look at anything else since R2R tends to skew things. |
Nice! |
Hmm, didn't launch all the CI legs. Let's retry.. @dotnet-bot test this please |
@dotnet/dnceng I'm guessing you know this but CoreCLR CI currently isn't launching all the test legs. |
src/jit/gentree.cpp
Outdated
// If we see indirs, tunnel through to see if there are compile time handles. | ||
if ((op1ClassFromHandle->gtOper == GT_IND) && (op2ClassFromHandle->gtOper == GT_IND)) | ||
{ | ||
// Indirs should be non-faulting.... verify |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you want to check for GTF_IND_NONFAULTING
flags here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did, yeah. Will add a check.
src/jit/gentree.cpp
Outdated
// Indirs should be non-faulting.... verify | ||
GenTree* op1HandleLiteral = op1ClassFromHandle->gtOp.gtOp1; | ||
GenTree* op2HandleLiteral = op2ClassFromHandle->gtOp.gtOp1; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also a mask of bits use to mark handles:
GTF_ICON_HDL_MASK
Both constants should probably have the same bit set.
Probably this one: GTF_ICON_CLASS_HDL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, will check this too.
@briansull turns out we were not marking the embedded handle indirs as non-faulting.Good catch. I also was going to mark them as invariant, but when doing so, CSE got carried away in places. Can look at this separately someday. Just for the record, here is the jit-diffs for that run...
|
|
@redknightlois #14178 was the big change in ;;; BEFORE
; Lcl frame size = 232
G_M30671_IG01:
push rdi
push rsi
sub rsp, 232
mov rsi, rcx
lea rdi, [rsp+C8H]
mov ecx, 8
xor rax, rax
rep stosd
mov rcx, rsi
mov rsi, rdx
G_M30671_IG02:
mov rdx, r8
call AsyncTaskMethodBuilder`1:GetStateMachineBox(byref):ref:this
mov rdx, rax
movzx r8, byte ptr [rsi+8]
mov rcx, gword ptr [rsi]
call TaskAwaiter:UnsafeOnCompletedInternal(ref,ref,bool)
G_M30671_IG03:
nop
G_M30671_IG04:
add rsp, 232
pop rsi
pop rdi
ret
;;; AFTER
; Lcl frame size = 32
G_M30613_IG01:
push rsi
sub rsp, 32
mov rsi, rdx
G_M30613_IG02:
mov rdx, r8
call AsyncTaskMethodBuilder`1:GetStateMachineBox(byref):ref:this
mov rdx, rax
movzx r8, byte ptr [rsi+8]
mov rcx, gword ptr [rsi]
call TaskAwaiter:UnsafeOnCompletedInternal(ref,ref,bool)
nop
G_M30613_IG03:
add rsp, 32
pop rsi
ret |
@briansull this is ready to go, can you take another look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good with a few comments
src/jit/gentree.cpp
Outdated
// We must have a handle on one side or the other here to optimize, | ||
// otherwise we can't be sure that optimizing is sound. | ||
const bool op1IsFromHandle = op1Kind == TPK_Handle; | ||
const bool op2IsFromHandle = op2Kind == TPK_Handle; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically all comparisons are enclosed in parenthesis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks.
src/jit/gentree.cpp
Outdated
|
||
#ifdef LEGACY_BACKEND | ||
// Fetch object method table from the object itself | ||
GenTree* const objMT = gtNewOperNode(GT_IND, TYP_I_IMPL, opOther->gtCall.gtCallObjp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am surprised to see an indef LEGACY_BACKEND here. I would have expected that we didn't do the optimization with legacy backend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What makes the LEGACY_BACKEND path different here anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a holdover from how the code was before, when it was in morph.
Not sure how things got to the state they were just before my changes, but here's a guess.
Some time in the past, calls and intrinsics were done differently for the legacy backend and this optimization was supported (or if not, somebody went to a lot of trouble to make it look like it was).
Then at some point, somebody unified the way calls and intriniscs were handled, but did not update this optimization. So the pattern matches that guard this optimization stopped matching for the legacy backend, and the transformation part specific to the legacy backend went stale.
Now that the importer paths are unified it would be simplest to just remove the ifdef'd code and have this entire optimization be common code.
But I can also just disable this opt for the legacy backend.
src/jit/gentree.cpp
Outdated
@@ -15099,25 +15392,24 @@ void Compiler::gtCheckQuirkAddrExposedLclVar(GenTreePtr tree, GenTreeStack* pare | |||
} | |||
|
|||
//------------------------------------------------------------------------ | |||
// gtCanOptimizeTypeEquality: check if tree is a suitable input for a type | |||
// gtCanOptimizeTypeEquality: check if tree is a suitable input for type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may want to rename this method as "Can" typically implies that it returns a bool. Maybe gtOptimizableTypeEqualityKind() instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about gtGetTypeProducerKind
?
// 2. typeof(...) == obj.GetType() | ||
return fgMorphTree(compare); | ||
} | ||
return fgMorphTree(optTree); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently Github is saying that you have a merge conflict to resolve in morph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, will need to rebase and fix.
Refactor the type equality optimizations embedded in morph so that they can also be invoked earlier, in particular, during importation. This allows the importer to short-circut importing blocks that are provably unreachable. For instance in generic methods with type specialization (via equality) the jit will now selectively import just the applicable regions of the method. In some cases this means the jit may avoid importing EH regions and so remove EH from a specialized method all together. Also, generalize the handle compare pattern to look through the handle indirs that can arise when prejitting. Running the type opts early also improves jit throughput a little.
Mark the embedded handle indirs as nonfaulting. Would also like to mark them as invariant, but CSE gets overly enthusiastic and we see bad diffs. Also, spruce up a few comments.
`gtOptimizeEnumHasFlag` returns null on failure, not the original tree. So check for this. A `refanytype` creates runtime handle to type helper call with an indir off of the refany, this isn't marked as nonfaulting. Perhaps it should be, but even if so we wouldn't optimize further. So check for faulting indirs instead of asserting. Also I am pretty sure the helper properties for the handle to type helpers could be improved: the MAYBENULL helper isn't even in the table and the normal will never return null. But will defer changing this for now.
Removed legacy special-casing, verified legacy path works fine with the intrinsics and opts fully enabled.
5feb7cd
to
f4b39a2
Compare
@briansull I removed all the legacy special-casing since it doesn't seem to be needed anymore. |
Ah, well there's the reason, probably Type.GetType() is not supported as an intrinsic in legacy. I'll go back and exclude this as a recognized legacy intrinsic. The OSX failure is a CI issue: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks Good
More random-seeming CI issues in this leg. Will retry.
@dotnet-bot retest Windows_NT x86 full_opt ryujit CoreCLR Perf Tests Correctness |
@dotnet-bot restest OSX10.12 x64 Checked Build and Test |
Refactor the type equality optimizations embedded in morph so that they can
also be invoked earlier, in particular, during importation. This allows the
importer to short-circut importing blocks that are provably unreachable. For
instance in generic methods with type specialization (via equality) the jit
will now selectively import just the applicable regions of the method. In some
cases this means the jit may avoid importing EH regions and so remove EH from
a specialized method all together.
Also, generalize the handle compare pattern to look through the handle indirs
that can arise when prejitting.
Running the type opts early also improves jit throughput a little.