Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Optimize type casts #14420

Merged
merged 3 commits into from
Oct 13, 2017
Merged

Optimize type casts #14420

merged 3 commits into from
Oct 13, 2017

Conversation

AndyAyersMS
Copy link
Member

Work in progress. PR to get some broader testing.

@AndyAyersMS AndyAyersMS added the * NO MERGE * The PR is not ready for merge yet (see discussion for detailed reasons) label Oct 11, 2017
@AndyAyersMS
Copy link
Member Author

@dotnet-bot test Windows_NT x64 corefx_baseline

{
ref TaskAwaiter ta = ref Unsafe.As<TAwaiter, TaskAwaiter>(ref awaiter); // relies on TaskAwaiter/TaskAwaiter<T> having the same layout
ref TaskAwaiter ta = ref Unsafe.As<TAwaiter, TaskAwaiter>(ref awaiter);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably still needs the comment ?

 // relies on TaskAwaiter/TaskAwaiter<T> having the same layout

As warning/info on the Unsafe.As cast?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restored.

typeof(TAwaiter) == typeof(TaskAwaiter<byte>) ||
typeof(TAwaiter) == typeof(TaskAwaiter<int>) ||
typeof(TAwaiter) == typeof(TaskAwaiter<long>))
if (awaiter is ITaskAwaiter)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WOO HOO!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up wrapping these new is tests in a value class guard, otherwise cases where the arguments were byref-to-ref classes still ran through these interface tests -- take a look and make sure the new logic is reasonable.

ref TaskAwaiter ta = ref Unsafe.As<TAwaiter, TaskAwaiter>(ref awaiter);
TaskAwaiter.UnsafeOnCompletedInternal(ta.m_task, box, continueOnCapturedContext: true);
}
else if (InterfaceIsCheckWorkaround<TAwaiter>.IsIConfiguredTaskAwaiter)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The InterfaceIsCheckWorkaround class could then also be deleted.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's gone now.

TypeHandle fromHnd = (TypeHandle) fromClass;
TypeHandle toHnd = (TypeHandle) toClass;

#ifdef FEATURE_COMINTEROP

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: => Don't try to optimize

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix, thanks.

else
#endif // FEATURE_COMINTEROP

// If the types are not shared, we can check directly.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the types are not shared => if we don't have shared generic types

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You also might want to introduce two bools here:
bool toTypeIsShared = toType.IsCanonicalSubtype()
bool fromTypeIsShared = ...


JIT_TO_EE_TRANSITION();

TypeHandle fromHnd = (TypeHandle) fromClass;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to rename:

fromClass => fromHnd
toClass => toHnd

And

fromHnd => fromType
toHnd => toType

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As handle implies that we are treating the local as an opaque thing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Runtime code doesn't seem super-consistent here; I'll see if Jan has a preference.

@AndyAyersMS
Copy link
Member Author

Preliminary jit-diff impact below. Unlike many changes this change alters what methods are prejitted, as a number of value type instantiations are no longer needed. Hence the "reconciling methods" impact is fairly large. Beyond that there is still another ~13K reduction in code size in 300 methods that are prejitted in both cases. Still looking at diffs and there are some that need more scrutiny.

Total bytes of diff: -35438 (-0.26 % of base)
    diff is an improvement.

Total byte diff includes -22824 bytes from reconciling methods
        Base had  209 unique methods,    22824 unique bytes
        Diff had    0 unique methods,        0 unique bytes

Top file improvements by size (bytes):
      -32922 : System.Private.CoreLib.dasm (-0.93 % of base)
       -1226 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.05 % of base)
       -1114 : Microsoft.CSharp.dasm (-0.36 % of base)
        -151 : Microsoft.CodeAnalysis.CSharp.dasm (-0.01 % of base)
         -18 : System.Linq.Expressions.dasm (0.00 % of base)

6 total files with size differences (6 improved, 0 regressed), 73 unchanged.

Top method regessions by size (bytes):
          31 : Microsoft.CodeAnalysis.VisualBasic.dasm - Microsoft.CodeAnalysis.VisualBasic.Syntax.InternalSyntax.Parser:ParseParameterSpecifiers(byref):struct:this
           4 : System.Private.CoreLib.dasm - System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[Int32][System.Int32]:AwaitUnsafeOnCompleted(byref,byref):this

Top method improvements by size (bytes):
        -840 : System.Private.CoreLib.dasm - System.Threading.Tasks.TaskFactory`1[Byte][System.Byte]:FromAsyncImpl(ref,ref,ref,int,ref):ref
        -840 : System.Private.CoreLib.dasm - System.Threading.Tasks.TaskFactory`1[Int64][System.Int64]:FromAsyncImpl(ref,ref,ref,int,ref):ref
        -808 : System.Private.CoreLib.dasm - System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[__Canon][System.__Canon]:AwaitUnsafeOnCompleted(byref,byref):this (3 methods)
        -724 : System.Private.CoreLib.dasm - System.Threading.Tasks.TaskFactory`1[Byte][System.Byte]:FromAsyncImpl(ref,ref,ref,ref,int):ref
        -724 : System.Private.CoreLib.dasm - System.Threading.Tasks.TaskFactory`1[Int64][System.Int64]:FromAsyncImpl(ref,ref,ref,ref,int):ref

504 total methods with size differences (502 improved, 2 regressed), 78910 unchanged.

@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Oct 11, 2017

Some examples of diffs (will edit/update as I catalog these)

If we're boxing a byref, we can clean up most of the box, but we leave what is effectively a null check, which is often unnecessary as the byref is dereferenced shortly thereafter. So we end up with

-; Total bytes of code 39, prolog size 5 for method System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[Boolean][System.Boolean]:AwaitUnsafeOnCompleted(byref,byref):this
+; Total bytes of code 43, prolog size 5 for method System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[Boolean][System.Boolean]:AwaitUnsafeOnCompleted(byref,byref):this
-; Lcl frame size = 32
+; Lcl frame size = 48
 G_M10573_IG01:
        push     rsi
-       sub      rsp, 32
+       sub      rsp, 48
        mov      rsi, rdx
 G_M10573_IG02:
        mov      rdx, r8
        call     System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[Boolean][System.Boolean]:GetStateMachineBox(byref):ref:this
        mov      rdx, rax
+       movsx    r8, byte  ptr [rsi]
        movzx    r8, byte  ptr [rsi+8]
        mov      rcx, gword ptr [rsi]
        call     System.Runtime.CompilerServices.TaskAwaiter:UnsafeOnCompletedInternal(ref,ref,bool)
        nop
 G_M10573_IG03:
-       add      rsp, 32
+       add      rsp, 48
        pop      rsi
        ret

In the before case we were using type equality to specialize and so did not introduce a box with an early dereference of the byref. Now we have one and can't get rid of it. Think we'll have to live with this one for now.

Odd though that the frame size is larger; nothing is touching the extra frame space. I'll see if I can track this part down.

@AndyAyersMS
Copy link
Member Author

Here's a good diff example, after version is basically the same as the after version above (with the same extra null check and a new wayward store of rdx into the stack) but the frame is even larger:

-; Total bytes of code 427, prolog size 19 for method System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[Boolean][System.Boolean]:AwaitUnsafeOnCompleted(byref,byref):this
+; Total bytes of code 48, prolog size 10 for method System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[Boolean][System.Boolean]:AwaitUnsafeOnCompleted(byref,byref):this
-; Lcl frame size = 168
-G_M10575_IG01:
-       push     rdi
+; Lcl frame size = 96
+G_M10573_IG01:
        push     rsi
-       push     rbp
-       push     rbx
-       sub      rsp, 168
-       mov      qword ptr [rsp+A0H], rdx
-       mov      rsi, rdx
-       mov      rdi, r8
-G_M10575_IG02:
+       sub      rsp, 96
+       mov      qword ptr [rsp+58H], rdx
+       mov      rsi, r8
+G_M10573_IG02:
        mov      rdx, r9
        call     System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[Boolean][System.Boolean]:GetStateMachineBox(byref):ref:this
-       mov      rbx, rax
-       mov      rbp, qword ptr [rsi+56]
-       mov      r8, qword ptr [rbp]
-       mov      rcx, r8
-       and      ecx, 1
-       mov      rdx, r8
-       test     ecx, ecx
-       je       SHORT G_M10575_IG03
-       mov      rdx, qword ptr [rdx-1]
-G_M10575_IG03:
-       lea      rax, [(reloc)]
-       cmp      rdx, rax
-       je       G_M10575_IG10
-       mov      rdx, r8
-       test     ecx, ecx
-       je       SHORT G_M10575_IG04
-       mov      rdx, qword ptr [rdx-1]
-G_M10575_IG04:
-       lea      rax, [(reloc)]
-       cmp      rdx, rax
-       je       SHORT G_M10575_IG10
-       mov      rdx, r8
-       test     ecx, ecx
-       je       SHORT G_M10575_IG05
-       mov      rdx, qword ptr [rdx-1]
-G_M10575_IG05:
-       lea      rax, [(reloc)]
-       cmp      rdx, rax
-       je       SHORT G_M10575_IG10
-       mov      rdx, r8
-       test     ecx, ecx
-       je       SHORT G_M10575_IG06
-       mov      rdx, qword ptr [rdx-1]
-G_M10575_IG06:
-       lea      rax, [(reloc)]
-       cmp      rdx, rax
-       je       SHORT G_M10575_IG10
-       mov      rdx, r8
-       test     ecx, ecx
-       je       SHORT G_M10575_IG07
-       mov      rdx, qword ptr [rdx-1]
-G_M10575_IG07:
-       lea      rax, [(reloc)]
-       cmp      rdx, rax
-       je       SHORT G_M10575_IG10
-       mov      rdx, r8
-       test     ecx, ecx
-       je       SHORT G_M10575_IG08
-       mov      rdx, qword ptr [rdx-1]
-G_M10575_IG08:
-       lea      rax, [(reloc)]
-       cmp      rdx, rax
-       je       SHORT G_M10575_IG10
-       test     ecx, ecx
-       je       SHORT G_M10575_IG09
-       mov      r8, qword ptr [r8-1]
-G_M10575_IG09:
-       lea      rcx, [(reloc)]
-       cmp      r8, rcx
-       jne      SHORT G_M10575_IG12
-G_M10575_IG10:
-       movzx    r8, byte  ptr [rdi+8]
-       mov      rcx, gword ptr [rdi]
-       mov      rdx, rbx
+       mov      rdx, rax
+       movsx    r8, byte  ptr [rsi]
+       movzx    r8, byte  ptr [rsi+8]
+       mov      rcx, gword ptr [rsi]
        call     System.Runtime.CompilerServices.TaskAwaiter:UnsafeOnCompletedInternal(ref,ref,bool)
        nop      
-G_M10575_IG11:
-       add      rsp, 168
-       pop      rbx
-       pop      rbp
+G_M10573_IG03:
+       add      rsp, 96
        pop      rsi
-       pop      rdi
        ret      
-G_M10575_IG12:
-       mov      rcx, qword ptr [rbp+24]
-       test     rcx, rcx
-       jne      SHORT G_M10575_IG13
-       mov      rcx, rsi
-       lea      rdx, [(reloc)]
-       call     CORINFO_HELP_RUNTIMEHANDLE_METHOD
-       mov      rcx, rax
-G_M10575_IG13:
-       call     CORINFO_HELP_GETGENERICS_NONGCSTATIC_BASE
-       cmp      byte  ptr [rax+8], 0
-       je       SHORT G_M10575_IG15
-       mov      rcx, gword ptr [rdi]
-       mov      rdx, rbx
-       mov      r8d, 1
-       call     System.Runtime.CompilerServices.TaskAwaiter:UnsafeOnCompletedInternal(ref,ref,bool)
-       nop      
-G_M10575_IG14:
-       add      rsp, 168
-       pop      rbx
-       pop      rbp
-       pop      rsi
-       pop      rdi
-       ret      
-G_M10575_IG15:
-       mov      rcx, qword ptr [rbp+24]
-       test     rcx, rcx
-       jne      SHORT G_M10575_IG16
-       mov      rcx, rsi
-       lea      rdx, [(reloc)]
-       call     CORINFO_HELP_RUNTIMEHANDLE_METHOD
-       mov      rcx, rax
-G_M10575_IG16:
-       call     CORINFO_HELP_GETGENERICS_NONGCSTATIC_BASE
-       cmp      byte  ptr [rax+9], 0
-       je       SHORT G_M10575_IG18
-       movzx    r8, byte  ptr [rdi+8]
-       mov      rcx, gword ptr [rdi]
-       mov      rdx, rbx
-       call     System.Runtime.CompilerServices.TaskAwaiter:UnsafeOnCompletedInternal(ref,ref,bool)
-       nop      
-G_M10575_IG17:
-       add      rsp, 168
-       pop      rbx
-       pop      rbp
-       pop      rsi
-       pop      rdi
-       ret      
-G_M10575_IG18:
-       mov      rcx, qword ptr [rbp+32]
-       test     rcx, rcx
-       jne      SHORT G_M10575_IG19
-       mov      rcx, rsi
-       lea      rdx, [(reloc)]
-       call     CORINFO_HELP_RUNTIMEHANDLE_METHOD
-       mov      rcx, rax
-G_M10575_IG19:
-       mov      rdx, rdi
-       mov      r8, rbx
-       call     System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[Boolean][System.Boolean]:AwaitArbitraryAwaiterUnsafeOnCompleted(byref,ref)
-       nop      
-G_M10575_IG20:
-       add      rsp, 168
-       pop      rbx
-       pop      rbp
-       pop      rsi
-       pop      rdi
-       ret      

@benaadams
Copy link
Member

Calculates frame size on pre-optimized code? https://github.com/dotnet/coreclr/issues/9066

@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Oct 11, 2017

Probably a bigger issue is that the jit does not track types for byrefs to reference types, so we end up doing interface checks where the old code could do equality checks.

-; Total bytes of code 226, prolog size 19 for method System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[VoidTaskResult][System.Threading.Tasks.VoidTaskResult]:AwaitUnsafeOnCompleted(byref,byref):this
-; Lcl frame size = 248
+; Total bytes of code 178, prolog size 13 for method System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[VoidTaskResult][System.Threading.Tasks.VoidTaskResult]:AwaitUnsafeOnCompleted(byref,byref):this
+; Lcl frame size = 120
 G_M63121_IG01:
        push     rdi
        push     rsi
        push     rbp
        push     rbx
-       sub      rsp, 248
-       mov      qword ptr [rsp+F0H], rdx
+       sub      rsp, 120
+       mov      qword ptr [rsp+70H], rdx
        mov      rsi, rdx
        mov      rdi, r8
 G_M63121_IG02:
        mov      rdx, r9
        call     System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[VoidTaskResult][System.Threading.Tasks.VoidTaskResult]:GetStateMachineBox(byref):ref:this
        mov      rbx, rax
-       mov      rbp, qword ptr [rsi+56]
-       mov      rcx, qword ptr [rbp+16]
-       test     rcx, rcx
-       jne      SHORT G_M63121_IG03
-       mov      rcx, rsi
-       lea      rdx, [(reloc)]
-       call     CORINFO_HELP_RUNTIMEHANDLE_METHOD
-       mov      rcx, rax
-G_M63121_IG03:
-       call     CORINFO_HELP_GETGENERICS_NONGCSTATIC_BASE
-       cmp      byte  ptr [rax+8], 0
-       je       SHORT G_M63121_IG05
-       mov      rcx, gword ptr [rdi]
+       mov      rbp, gword ptr [rdi]
+       mov      rdx, rbp
+       lea      rcx, [(reloc)]
+       call     CORINFO_HELP_ISINSTANCEOFINTERFACE
+       test     rax, rax
+       je       SHORT G_M63121_IG04
+       mov      rcx, rbp
        mov      rdx, rbx
        mov      r8d, 1
        call     System.Runtime.CompilerServices.TaskAwaiter:UnsafeOnCompletedInternal(ref,ref,bool)
        nop      
-G_M63121_IG04:
-       add      rsp, 248
+G_M63121_IG03:
+       add      rsp, 120
        pop      rbx
        pop      rbp
        pop      rsi
        pop      rdi
        ret      
-G_M63121_IG05:
-       mov      rcx, qword ptr [rbp+16]
-       test     rcx, rcx
-       jne      SHORT G_M63121_IG06
-       mov      rcx, rsi
-       lea      rdx, [(reloc)]
-       call     CORINFO_HELP_RUNTIMEHANDLE_METHOD
-       mov      rcx, rax
-G_M63121_IG06:
-       call     CORINFO_HELP_GETGENERICS_NONGCSTATIC_BASE
-       cmp      byte  ptr [rax+9], 0
-       je       SHORT G_M63121_IG08
+G_M63121_IG04:
+       mov      rdx, rbp
+       lea      rcx, [(reloc)]
+       call     CORINFO_HELP_ISINSTANCEOFINTERFACE
+       test     rax, rax
+       je       SHORT G_M63121_IG06
        movzx    r8, byte  ptr [rdi+8]
-       mov      rcx, gword ptr [rdi]
+       mov      rcx, rbp
        mov      rdx, rbx
        call     System.Runtime.CompilerServices.TaskAwaiter:UnsafeOnCompletedInternal(ref,ref,bool)
        nop      
-G_M63121_IG07:
-       add      rsp, 248
+G_M63121_IG05:
+       add      rsp, 120
        pop      rbx
        pop      rbp
        pop      rsi
        pop      rdi
        ret      
-G_M63121_IG08:
-       mov      rcx, qword ptr [rbp+24]
+G_M63121_IG06:
+       mov      rcx, qword ptr [rsi+56]
+       mov      rcx, qword ptr [rcx+16]
        test     rcx, rcx
-       jne      SHORT G_M63121_IG09
+       jne      SHORT G_M63121_IG07
        mov      rcx, rsi
        lea      rdx, [(reloc)]
        call     CORINFO_HELP_RUNTIMEHANDLE_METHOD
        mov      rcx, rax
-G_M63121_IG09:
+G_M63121_IG07:
        mov      rdx, rdi
        mov      r8, rbx
        call     System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[VoidTaskResult][System.Threading.Tasks.VoidTaskResult]:AwaitArbitraryAwaiterUnsafeOnCompleted(byref,ref)
        nop      
-G_M63121_IG10:
-       add      rsp, 248
+G_M63121_IG08:
+       add      rsp, 120
        pop      rbx
        pop      rbp
        pop      rsi
        pop      rdi
        ret     

I think it is should not be that hard to extend type tracking to byrefs of reference types, at least for locals and args and temps, and hopefully end up in a better place. We already have a "byref" bit in the locals table and room there for a class handle. But that is perhaps best done as a prerequisite change.

[Edit] in this case we have a byref to canon, so there's no value add in tracking that, as we will not be able to resolve the casts. Instead we can guard the interface checks so they only apply to value classes.

@AndyAyersMS
Copy link
Member Author

Think this is getting close, the extra "value class" guard in AwaitUnsafeOnCompleted has removed the regressions from the cases where the arguments are byrefs to ref classes.

Jit-diffs now shows:

Total bytes of diff: -35609 (-0.27 % of base)
    diff is an improvement.

Total byte diff includes -22978 bytes from reconciling methods
        Base had  210 unique methods,    22978 unique bytes
        Diff had    0 unique methods,        0 unique bytes

Top file improvements by size (bytes):
      -33093 : System.Private.CoreLib.dasm (-0.93 % of base)
       -1226 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.05 % of base)
       -1114 : Microsoft.CSharp.dasm (-0.36 % of base)
        -151 : Microsoft.CodeAnalysis.CSharp.dasm (-0.01 % of base)
         -18 : System.Linq.Expressions.dasm (0.00 % of base)

6 total files with size differences (6 improved, 0 regressed), 73 unchanged.

Top method regessions by size (bytes):
         281 : System.Private.CoreLib.dasm - System.Runtime.CompilerServices.AsyncValueTaskMethodBuilder`1[Int32][System.Int32]:SetResult(int):this
          31 : Microsoft.CodeAnalysis.VisualBasic.dasm - Microsoft.CodeAnalysis.VisualBasic.Syntax.InternalSyntax.Parser:ParseParameterSpecifiers(byref):struct:this
           4 : System.Private.CoreLib.dasm - System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[Int32][System.Int32]:AwaitUnsafeOnCompleted(byref,byref):this

Top method improvements by size (bytes):
        -908 : System.Private.CoreLib.dasm - System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[__Canon][System.__Canon]:AwaitUnsafeOnCompleted(byref,byref):this (3 methods)
        -840 : System.Private.CoreLib.dasm - System.Threading.Tasks.TaskFactory`1[Byte][System.Byte]:FromAsyncImpl(ref,ref,ref,int,ref):ref
        -840 : System.Private.CoreLib.dasm - System.Threading.Tasks.TaskFactory`1[Int64][System.Int64]:FromAsyncImpl(ref,ref,ref,int,ref):ref
        -724 : System.Private.CoreLib.dasm - System.Threading.Tasks.TaskFactory`1[Byte][System.Byte]:FromAsyncImpl(ref,ref,ref,ref,int):ref
        -724 : System.Private.CoreLib.dasm - System.Threading.Tasks.TaskFactory`1[Int64][System.Int64]:FromAsyncImpl(ref,ref,ref,ref,int):ref

506 total methods with size differences (503 improved, 3 regressed), 78907 unchanged.

The big regression in SetResult(int):this is the result of an ordering change during crossgen. In the before case we compile a callee of this method first, and determine that the callee was never inlinable; in the after case we compile the caller first and attempt to inline the callee, and succeed. This inline drags in some aggressive inlines and so we see a big jump in code size.

This points out a shortcoming in the logic used by the "never inline" classifier in prejitting -- it should be setting the bar high enough that no matter what order the methods are seen when prejitting, the classifier will not end up blocking any inlines -- it should just help the jit reject failing inlines faster. Clearly we're not there yet, but this is the first time I've run across this. Tracking it via #14441.

@AndyAyersMS
Copy link
Member Author

Bad example from above fixed with latest changes:

-; Total bytes of code 226, prolog size 19 for method System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[VoidTaskResult][System.Threading.Tasks.VoidTaskResult]:AwaitUnsafeOnCompleted(byref,byref):this
-; Lcl frame size = 248
+; Total bytes of code 80, prolog size 12 for method System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[VoidTaskResult][System.Threading.Tasks.VoidTaskResult]:AwaitUnsafeOnCompleted(byref,byref):this
+; Lcl frame size = 96
 G_M63121_IG01:
        push     rdi
        push     rsi
-       push     rbp
        push     rbx
-       sub      rsp, 248
-       mov      qword ptr [rsp+F0H], rdx
+       sub      rsp, 96
+       mov      qword ptr [rsp+58H], rdx
        mov      rsi, rdx
        mov      rdi, r8
 G_M63121_IG02:
        mov      rdx, r9
        call     System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[VoidTaskResult][System.Threading.Tasks.VoidTaskResult]:GetStateMachineBox(byref):ref:this
        mov      rbx, rax
-       mov      rbp, qword ptr [rsi+56]
-       mov      rcx, qword ptr [rbp+16]
+       mov      rcx, qword ptr [rsi+56]
+       mov      rcx, qword ptr [rcx+16]
        test     rcx, rcx
        jne      SHORT G_M63121_IG03
        mov      rcx, rsi
@@ -22,62 +21,14 @@ G_M63121_IG02:
        call     CORINFO_HELP_RUNTIMEHANDLE_METHOD
        mov      rcx, rax
 G_M63121_IG03:
-       call     CORINFO_HELP_GETGENERICS_NONGCSTATIC_BASE
-       cmp      byte  ptr [rax+8], 0
-       je       SHORT G_M63121_IG05
-       mov      rcx, gword ptr [rdi]
-       mov      rdx, rbx
-       mov      r8d, 1
-       call     System.Runtime.CompilerServices.TaskAwaiter:UnsafeOnCompletedInternal(ref,ref,bool)
-       nop      
-G_M63121_IG04:
-       add      rsp, 248
-       pop      rbx
-       pop      rbp
-       pop      rsi
-       pop      rdi
-       ret      
-G_M63121_IG05:
-       mov      rcx, qword ptr [rbp+16]
-       test     rcx, rcx
-       jne      SHORT G_M63121_IG06
-       mov      rcx, rsi
-       lea      rdx, [(reloc)]
-       call     CORINFO_HELP_RUNTIMEHANDLE_METHOD
-       mov      rcx, rax
-G_M63121_IG06:
-       call     CORINFO_HELP_GETGENERICS_NONGCSTATIC_BASE
-       cmp      byte  ptr [rax+9], 0
-       je       SHORT G_M63121_IG08
-       movzx    r8, byte  ptr [rdi+8]
-       mov      rcx, gword ptr [rdi]
-       mov      rdx, rbx
-       call     System.Runtime.CompilerServices.TaskAwaiter:UnsafeOnCompletedInternal(ref,ref,bool)
-       nop      
-G_M63121_IG07:
-       add      rsp, 248
-       pop      rbx
-       pop      rbp
-       pop      rsi
-       pop      rdi
-       ret      
-G_M63121_IG08:
-       mov      rcx, qword ptr [rbp+24]
-       test     rcx, rcx
-       jne      SHORT G_M63121_IG09
-       mov      rcx, rsi
-       lea      rdx, [(reloc)]
-       call     CORINFO_HELP_RUNTIMEHANDLE_METHOD
-       mov      rcx, rax
-G_M63121_IG09:
        mov      rdx, rdi
        mov      r8, rbx
        call     System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[VoidTaskResult][System.Threading.Tasks.VoidTaskResult]:AwaitArbitraryAwaiterUnsafeOnCompleted(byref,ref)
        nop      
-G_M63121_IG10:
-       add      rsp, 248
+G_M63121_IG04:
+       add      rsp, 96
        pop      rbx
-       pop      rbp
        pop      rsi
        pop      rdi
        ret      

@AndyAyersMS
Copy link
Member Author

Failures in CI seem unrelated -- arm timeouts and x86 perf post processing data format issues. May retry tomorrow or let it happen organically after PR feedback.

Changes have also passed desktop DDRs (save for SPMI which is still on the old guid)

Removing the do not merge and submitting for official review.

@dotnet/jit-contrib PTAL
@jkotas PTAL at runtime side

@AndyAyersMS AndyAyersMS removed the * NO MERGE * The PR is not ready for merge yet (see discussion for detailed reasons) label Oct 12, 2017
@AndyAyersMS
Copy link
Member Author

Also @stephentoub PTAL at changes to the async method builder code.

typeof(TAwaiter) == typeof(TaskAwaiter<byte>) ||
typeof(TAwaiter) == typeof(TaskAwaiter<int>) ||
typeof(TAwaiter) == typeof(TaskAwaiter<long>))
if (null != (object)default(TAwaiter))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any benefit in extending this guard to also cover the branches for ValueTaskAwaiter and ConfiguredValueTaskAwaiter? Those are both structs as well. Or maybe it doesn't matter because the JIT is already good at the explicit type comparisons used there?

Copy link
Member

@stephentoub stephentoub Oct 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I don't think as written this is correct. This condition will effectively be true for any non-nullable struct, right? In which case, this change causes the below ValueTaskAwaiter optimizations to be skipped.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think you're right. I should inline the value class checks instead, something like:

            if ((null != (object)default(TAwaiter)) && (awaiter is ITaskAwaiter))
            {
                ref TaskAwaiter ta = ref Unsafe.As<TAwaiter, TaskAwaiter>(ref awaiter); // relies on TaskAwaiter/TaskAwaiter<T> having the same layout
                TaskAwaiter.UnsafeOnCompletedInternal(ta.m_task, box, continueOnCapturedContext: true);
            }
            else if ((null != (object)default(TAwaiter)) && (awaiter is IConfiguredTaskAwaiter))
            {
                ref ConfiguredTaskAwaitable.ConfiguredTaskAwaiter ta = ref Unsafe.As<TAwaiter, ConfiguredTaskAwaitable.ConfiguredTaskAwaiter>(ref awaiter);
                TaskAwaiter.UnsafeOnCompletedInternal(ta.m_task, box, ta.m_continueOnCapturedContext);
            }

@AndyAyersMS
Copy link
Member Author

@dotnet-bot test Windows_NT x64 corefx_baseline

typeof(TAwaiter) == typeof(TaskAwaiter<byte>) ||
typeof(TAwaiter) == typeof(TaskAwaiter<int>) ||
typeof(TAwaiter) == typeof(TaskAwaiter<long>))
if ((null != (object)default(TAwaiter)) && (awaiter is ITaskAwaiter))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth a comment indicating why the null check is necessary? Or you'll be following up with subsequent changes to remove that, too?

Copy link
Member Author

@AndyAyersMS AndyAyersMS Oct 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The jit needs this check or it will leave casts around in some instantations. I will add a comment.

Copy link
Member

@stephentoub stephentoub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long the boxing is gone / the resulting asm looks good to you, the AsyncMethodBuilder.cs changes look good to me.

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkotas PTAL at runtime side

LGTM

@AndyAyersMS
Copy link
Member Author

@stephentoub here are method sizes for various instantations of AwaitUnsafeOnCompleted that get prejitted. Unfortunately the jit dumps don't show the byref types so you have to guess at that part a bit. Many cases are considerably smaller, a few are a couple bytes larger because of the redundant null check issue I mentioned up above.

The value class check is needed for the cases where the byref args are &object, these create shared method instances and the jit can't tell from there whether __Canon implements the interfaces. So without that upfront check the jit would leave interface casts behind.

Old New Instance
270 93 <__Canon><System.__Canon>(byref,byref):this
473 93 <__Canon><System.__Canon>(byref,byref):this
473 122 <__Canon><System.__Canon>(byref,byref):this
39 43 <System.Boolean>(byref,byref):this
427 48 <System.Boolean>(byref,byref):this
39 43 <System.Threading.Tasks.VoidTaskResult>(byref,byref):this
226 43 <System.Threading.Tasks.VoidTaskResult>(byref,byref):this
39 43 <System.Int32>(byref,byref):this
39 43 <System.Threading.Tasks.VoidTaskResult>(byref,byref):this
39 43 <System.Threading.Tasks.VoidTaskResult>(byref,byref):this
39 44 <System.Threading.Tasks.VoidTaskResult>(byref,byref):this
40 49 <System.Threading.Tasks.VoidTaskResult>(byref,byref):this
89 80 <System.Threading.Tasks.VoidTaskResult>(byref,byref):this
270 93 <System.Threading.Tasks.VoidTaskResult>(byref,byref):this
428 122 <System.Threading.Tasks.VoidTaskResult>(byref,byref):this

@AndyAyersMS
Copy link
Member Author

@dotnet/jit-contrib anybody up for approving this? I will squash down when I merge.

Implement the jit interface compareTypesForEquality method
to handle casts from known types to known types, and from
shared types to certain interface types.

Call this method in the jit for castclass and isinst, using
`gtGetClassHandle` to obtain the from type. Optimize sucessful
casts and unsuccessful isinsts when the from type is known
exactly.

Undo part of the type-equality based optimization/workaround
in the AsyncMethodBuilder code that was introduced in dotnet#14178
in favor of interface checks. There is more here that can
be done here before this issue is entirely closed and I will
look at this subsequently.

This optimization allows the jit to remove boxes that are
used solely to feed type casts, and so closes #12877.
@AndyAyersMS
Copy link
Member Author

Squashed and updated commit text.

@dotnet-bot test Windows_NT x64 corefx_baseline

@AndyAyersMS
Copy link
Member Author

@sandreenko can you review? In particular the changes to the jit.


op2 = impTokenToHandle(&resolvedToken, nullptr, FALSE);
if (op2 == nullptr)
{ // compDonotInline()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the comment compDonotInline() mean here?
Should it be deleted or tested?
It is tested later on on line 14400

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can change it to an assert that compilation failed. assert(compDonotInline());


op2 = impTokenToHandle(&resolvedToken, nullptr, FALSE);
if (op2 == nullptr)
{ // compDonotInline()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again we have a comment compDonotInline()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's address those in a subsequent PR, as there are a bunch more of them scattered about.

Copy link

@sandreenko sandreenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Jit part looks good.

// impOptimizeCastClassOrIsInst: attempt to resolve a cast when jitting
//
// Arguments:
// op1 -- value to cast

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Arguments:
//    <argument1-name> - Description of argument 1
//    <argument2-name> - Description of argument 2

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed (though I happen to like -- better).

@@ -10203,6 +10296,7 @@ void Compiler::impImportBlockCode(BasicBlock* block)
var_types lclTyp, ovflType = TYP_UNKNOWN;
GenTreePtr op1 = DUMMY_INIT(NULL);
GenTreePtr op2 = DUMMY_INIT(NULL);
GenTree* optTree = nullptr;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to create new scopes under CEE_ISINST and CASTCLASS: and declare optTree there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will do.

@AndyAyersMS
Copy link
Member Author

Updated to address PR feedback. Also added a few targeted test cases I'd worked up.

@AndyAyersMS AndyAyersMS merged commit 98181d4 into dotnet:master Oct 13, 2017
@AndyAyersMS AndyAyersMS deleted the OptimizeTypeCasts branch October 13, 2017 22:55
@benaadams
Copy link
Member

🎉

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants