HFA/Vector calling convention representation in the Jit on arm64. #37341

sandreenko · 2020-06-03T09:51:31Z

Hello, I could use some help with a new model for our return/call SIMD* typing that I am implementing, but first a few examples of what is happening now.

1 example:

    [MethodImpl(MethodImplOptions.NoInlining)]
    static Vector4 ReturnVector4()
    {
        return new Vector4(1);
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static Vector4 ReturnVector4UsingCall()
    {
        return ReturnVector4();
    }

IL for ReturnVector4UsingCall is very simple: call ReturnVector4; ret,
IR would be ASG(LCL_VAR, call); return LCL_VAR;
The complexity is that Arm64 supports both vector and HFA calling conventions, in this case
Vector4 is an HFA value, so we have to return it as v0.s[0], v1.s[0], v2.s[0], v3.s[0].
Now let's see how we import this call and with which type:

create it as TYP_STRUCT, using callRetTyp = JITtype2varType(calliSig.retType) in impImportCall;
change it to TYP_SIMD16 in impImportCall: callRetTyp = impNormStructType(actualMethodRetTypeSigClass); call->gtType = callRetTyp;;
change it back to TYP_STRUCT in impAssignStructPtr: src->gtType = genActualType(returnType); and that is the final value of the type.
a fun side-effect: even if call result is not used we are still creating ASG(LCL_VAR, call), change call type to struct and only later delete the ASG leaving call with the correct struct type.

Note for !compDoOldStructRetyping(): I don't do 2. and 3., so create as TYP_STRUCT and keep it.

and the return in this case is STRUCT, so we end up with nice IR:

***** BB01
STMT00000 (IL 0x000...0x005)
N005 ( 15,  4) [000003] -ACXG---R---              *  ASG       simd16 (copy)
N004 (  1,  1) [000001] D------N----              +--*  LCL_VAR   simd16<System.Numerics.Vector4> V01 tmp1         d:1
N003 ( 15,  4) [000000] --CXG-------              \--*  CALL r2r_ind struct TestHFAandHVA.ReturnVector4,NA,NA,NA
N002 (  1,  1) [000006] ------------ arg0 in x11     \--*  CNS_INT(h) long   0x29e89a04b90 ftn REG x11

***** BB01
STMT00001 (IL   ???...  ???)
N002 (  2,  2) [000005] ------------              *  RETURN    struct
N001 (  1,  1) [000004] -------N----              \--*  LCL_VAR   simd16<System.Numerics.Vector4> V01 tmp1         u:1 (last use)

Note/todo/fun fact: if we did not set LCL_VAR type as SIMD16 and keep it as a struct, then copy prop would optimize it as:

N002 (  2,  2) [000005] ------------              *  RETURN    struct
N003 ( 15,  4) [000000] --CXG-------              \--*  CALL r2r_ind struct TestHFAandHVA.ReturnVector4,NA,NA,NA
N002 (  1,  1) [000006] ------------ arg0 in x11     \--*  CNS_INT(h) long   0x29e89a04b90 ftn REG x11

Summary 1: in HFA case we type call and return as TYP_STRUCT with some confusing transformations in the middle.

2 example:

    [MethodImpl(MethodImplOptions.NoInlining)]
    static Vector<int> ReturnVectorInt()
    {
        return new Vector<int>();
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static Vector<int> ReturnVectorIntUsingCall()
    {
        return ReturnVectorInt();
    }

create it as TYP_STRUCT, using callRetTyp = JITtype2varType(calliSig.retType) in impImportCall;
change it to TYP_SIMD16 in impImportCall: callRetTyp = impNormStructType(actualMethodRetTypeSigClass); call->gtType = callRetTyp;;
keep it as TYP_SIMD16 in impAssignStructPtr: src->gtType = genActualType(returnType);.
and IR looks good:

***** BB01
STMT00000 (IL 0x000...0x005)
N005 ( 15,  4) [000003] -ACXG---R---              *  ASG       simd16 (copy)
N004 (  1,  1) [000001] D------N----              +--*  LCL_VAR   simd16<System.Numerics.Vector4> V01 tmp1         d:1
N003 ( 15,  4) [000000] --CXG-------              \--*  CALL r2r_ind struct TestHFAandHVA.ReturnVector4,NA,NA,NA
N002 (  1,  1) [000006] ------------ arg0 in x11     \--*  CNS_INT(h) long   0x29e89a04b90 ftn REG x11

***** BB01
STMT00001 (IL   ???...  ???)
N002 (  2,  2) [000005] ------------              *  RETURN    struct
N001 (  1,  1) [000004] -------N----              \--*  LCL_VAR   simd16<System.Numerics.Vector4> V01 tmp1         u:1 (last use)

Summary 1,2: based on these 2 examples we could think that TYP_SIMD16 on a call or a return means passed in a single vector register and it will have TYP_STRUCT when it is an HFA,
and TYP_STRUCT can be assigned to TYP_SIMD16, but...

3 example:

    struct A
    {
        bool a;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static Vector<A> ReturnVectorNotKnown()
    {
        return new Vector<A>();
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static Vector<A> ReturnVectorNotKnownUsingCall()
    {
        return ReturnVectorNotKnown();
    }

guess which type Jit will use for it before you read the answer :-)

The IR after importation will be:

               [000001] --C-G-------              *  RETURN    simd16
               [000000] --C-G-------              \--*  CALL r2r_ind struct TestHFAandHVA.ReturnVectorNotKnown

because for the return TYPE we ask VM and for call type we use getBaseTypeAndSizeOfSIMDType that can only parse known primitive types, so we get a nice mistyping out of nowhere,
does not look like a problem so far, JIT can handle it using morph::fgFixupStructReturn that sets call type to simd16.

3.1. example:

add a temp local var to the last example:

    [MethodImpl(MethodImplOptions.NoInlining)]
    static Vector<A> ReturnVectorNotKnownUsingCallAndTemp()
    {
        var a = ReturnVectorNotKnown();
        return a;
    }

and we have IR that we want right after importation, thanks to impAssignStructPtr from the first example:

***** BB01
STMT00000 (IL 0x000...0x005)
               [000003] -AC-G-------              *  ASG       simd16 (copy)
               [000001] D------N----              +--*  LCL_FLD   simd16 V00 loc0         [+0]
               [000000] --C-G-------              \--*  CALL r2r_ind simd16 TestHFAandHVA.ReturnVectorNotKnown

***** BB01
STMT00001 (IL 0x006...0x007)
               [000005] ------------              *  RETURN    simd16
               [000004] ------------              \--*  LCL_FLD   simd16 V00 loc0         [+0]

but V00 is created as STRUCT, so can't put it in a register, sad:

Generating: N009 ( 15,  4) [000000] --CXG-------         t0 = *  CALL r2r_ind simd16 TestHFAandHVA.ReturnVectorNotKnown REG d0 $140
IN0003:                           ldr     x0, [x11]
Call: GCvars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}
IN0004:                           blr     x0
                                                              /--*  t0     simd16 
Generating: N011 ( 19,  9) [000003] DA-XG-------              *  STORE_LCL_FLD simd16 V00 loc0         d:2[+0] NA REG NA
IN0005:                           str     q0, [fp,#16]
                            Live vars: {} => {V00}
Added IP mapping: 0x0006 STACK_EMPTY (G_M38418_IG02,ins#5,ofs#20)
Generating: N013 (???,???) [000010] ------------                 IL_OFFSET void   IL offset: 0x6 REG NA
Generating: N015 (  3,  4) [000004] ------------         t4 =    LCL_FLD   simd16 V00 loc0         u:2[+0] d16 (last use) REG d16 $141
IN0006:                           ldr     q16, [fp,#16]
                            Live vars: {V00} => {}
                                                              /--*  t4     simd16 
Generating: N017 (  4,  5) [000005] ------------              *  RETURN    simd16 REG NA $142
IN0007:                           mov     v0.16b, v16.16b

Note for !compDoOldStructRetyping(): we do not want all these retyping to happens in random places, so we want types not to change after we create them during importation until they reach lowering.
Question: but what type should we use in the last example? TYP_STRUCT works much better, because then we don't need access LCL_VAR as LCL_FLD, they have exactly the same types and Jit knows that!
For now, I am stick with TYP_STRUCT in all cases for all call types, keep RETURN as VM sees them, but it cases asserts that I can't avoid without implementing #11413, because we start getting IND SIMD16(ADDR byref(call STRUCT) for such calls and ADDR(call) is not a valid IR (we sometimes create them, but we are lucky in those examples and I am not lucky in mine).

Summary 1, 2, 3: do not try to guess Jit TYP looking at C# code.

4 example:

    [MethodImpl(MethodImplOptions.NoInlining)]
    static Vector<T> ReturnVectorTWithMerge<T>(int v, T init1, T init2, T init3, T init4) where T : struct
    {
        if (v == 0)
        {
            return new Vector<T>();
        }
        else if (v == 1)
        {
            return new Vector<T>(init1);
        }
        else if (v == 2)
        {
            return new Vector<T>(init2);
        }
        else if (v == 3)
        {
            return new Vector<T>(init3);
        }
        else
        {
            return new Vector<T>(init4);
        }
    }
    
    struct A
    {
        bool a;
    }
    
    ReturnVectorTWithMerge<A>(int v, a, b, c, d);

so we know that return->gtType == TYP_SIMD and call types would be TYP_STRUCT, and we know that it is working fine somehow and after morph, we change call types to TYP_SIMD and it is great.

but here comes my favorite thing: return merging, we create a 1 local var where we put all return results and it is happening before global morph, during PHASE Morph - Add internal blocks,
can you guess the type of this LCL_VAR?

lvaGrabTemp returning 12 (V12 tmp5) called for Single return block return value.
SIMD Candidate Type System.Numerics.Vector`1[System.__Canon]
  Unknown SIMD Vector<T>

mergeReturns statement tree [000071] added to genReturnBB BB10 [0009]
               [000071] ------------              *  RETURN    struct
               [000070] -------N----              \--*  LCL_VAR   struct<System.Numerics.Vector`1[__Canon], 16> V12 tmp5

and return knows it is a struct somehow... But maybe morph will fix it like it fixes calls? Nop... it will bail out with an assert that you can easily repro in the current master, see #37247:

Assert failure(PID 198612 [0x000307d4], Thread: 221228 [0x3602c]): Assertion failed '!"Incompatible types for gtNewTempAssign"' in 'TestHFAandHVA:ReturnVectorTWithMerge(int,System.__Canon,System.__Canon,System.__Canon,System.__Canon):System.Numerics.Vector`1[__Canon]' during 'Morph - Global' (IL size 54)

    File: F:\git\runtime\src\coreclr\src\jit\gentree.cpp Line: 15159
    Image: F:\git\runtime\artifacts\bin\coreclr\Windows_NT.arm64.Checked\x64\crossgen.exe

when we try to do ASG(LCL_VAR struct our merge lclVar, LCL_FLD SIMD16 from our calls

It doesn't nowadays lead to a bad codegen in release, because lower has a handling for it under compDoOldStructRetyping() == false and we ignore asserts
that is actually compDoOldStructRetyping() == false and do the right thing of setting RETURN TYP back to SIMD16, I do not have an older version or runtime to check what was happening before compDoOldStructRetyping.

Summary 1, 2, 3, 4: with compDoOldStructRetyping == true the old system is very unpredictable and fragile, with failures in simple cases.
compDoOldStructRetyping == false that I am trying to support on arm64 has the same difficulties and I would like to hear @CarolEidt , @tannergooding , @dotnet/jit-contrib opinions about types that we should choose in each case. I have tried many options and none of them was good enough.

I have started working on #11413, so I could keep calls as TYP_STRUCT always, ignoring SIMD and avoiding creating IND(ADDR(CALL) when we assign their results to LCL_VAR/FLD SIMD*, what do you think?

The text was updated successfully, but these errors were encountered:

tannergooding · 2020-06-03T17:04:51Z

Is there any reason we can't keep it as TYP_SIMD and look up the class handle to determine the actual type? I would imagine that would be the most useful and least likely way to break long-term...

That is, we already have issues today with GT_HWINTRINSIC and GT_SIMD nodes lying about their type and that preventing CSE or other optimizations.
@CarolEidt added the ClassLayout* m_layout field (https://github.com/dotnet/runtime/blob/master/src/coreclr/src/jit/gentree.h#L4735) which should allow us to start alleviating this.

It's possible this should actually be the COFINO_METHOD_HANDLE instead, like we have for GT_INTRINSIC: https://github.com/dotnet/runtime/blob/master/src/coreclr/src/jit/gentree.h#L4707, as that would allow us to rewrite the intrinsics as user calls in rationalization/lowering when one of the operands is non-constant (see HW intrinsics: Poor codegen for operations requiring imm8 when the value is calculated to constant during JIT #11062 and related)

There are also issues like #904, which talk about broadening the TYP_SIMD support to some user-defined types as many of the optimizations are more broadly applicable to any struct { float x, float y }; and not just to Vector2/3/4.

Given all this, it seems like the most covering solution would be to just keep the node as TYP_SIMD where possible and to ensure that the corresponding class handle is tracked so we can always determine if it is Vector2/3/4, Vector<T>, Vector64<T>, Vector128<T>, Vector256<T> or even potentially some user-defined struct Vector4 { float x, y, z, w; } and therefore no exactly how it should be passed, returned, or optimized when inlined, stack spilled, etc.

CarolEidt · 2020-06-03T17:24:03Z

@sandreenko - thanks for the thorough write-up! Overall, I think the right direction to pursue is toward keeping TYP_STRUCT in the front-end of the JIT, and introducing TYP_SIMD types in Lowering. Now that we have (some) optimization support for TYP_STRUCT, and as we move to having correct class handles for all struct values, it's not as useful to have the SIMD types in the front-end. In fact, it may be that they're not even really needed in the backend, but that may be an investigation for another day.

A few more detailed comments:

In your example 3.1, I'm not worried that an invalid vector isn't enregistered - we should never see that in real code anyway. We just want to make sure that it doesn't generate incorrect code.
I'm not sure how we wind up with IND SIMD16(ADDR byref(call STRUCT) but we should definitely avoid creating that. We used to create IND SIMD16(ADDR byref(GT_SIMD)) but I'm pretty sure we've eliminated that weirdness.
Not entirely related, but on the theme of ensuring that we have valid class handles everywhere, I've started looking into getting the ABI right for vector args & returns on x64/ux and vector returns on x64/windows. In that process, I'd like to ensure that all struct values have valid class handles and that we can readily access a shared descriptor for how a struct is returned (or passed if it's passed in registers). I think the ClassLayout is probably the right place for that, and that could clean things up a bit.

I have started working on #11413, so I could keep calls as TYP_STRUCT always, ignoring SIMD and avoiding creating IND(ADDR(CALL) when we assign their results to LCL_VAR/FLD SIMD*, what do you think?

I think that's definitely the right approach.

sandreenko · 2020-06-03T22:16:50Z

Thank you, @tannergooding and @CarolEidt !

I have started working on #11413, so I could keep calls as TYP_STRUCT always, ignoring SIMD and avoiding creating IND(ADDR(CALL) when we assign their results to LCL_VAR/FLD SIMD*, what do you think?

I think that's definitely the right approach.

I will do that then, it means in all cases 1-3 call will have TYP_STRUCT until lowering.

Given all this, it seems like the most covering solution would be to just keep the node as TYP_SIMD where possible and to ensure that the corresponding class handle is tracked so we can always determine if it is Vector2/3/4, Vector, Vector64, Vector128, Vector256 or even potentially some user-defined struct Vector4 { float x, y, z, w; } and therefore no exactly how it should be passed, returned, or optimized when inlined, stack spilled, etc.

It requires more work and benefits are not clear, there are problems with it even on VM side (see #35144). Another example is #36868, if you change compMethodReturnsNativeScalarType to return true for SIMD16 on arm64 when it is returned as Vector you will get asserts in struct promotion about missing struct handle for them. It could be not hard to fix, but after a few hours, I was not able to clean all of these asserts so postponed that change.

keep the node as TYP_SIMD where possible

we don't have an existing function to tell us if it is a SIMD or not. As I have shown Vector<A> is not currently a SIMD16, I do not see what advantages we will get if we recognize them as such.

tannergooding · 2020-06-03T23:42:21Z

Thanks for the explanation @sandreenko!

Just as a note, however:

As I have shown Vector is not currently a SIMD16, I do not see what advantages we will get if we recognize them as such.

Vector<A> also should not be a TYP_SIMD16. We only support a limited number of (currently 10) base types for the Vector<T>, Vector64<T>, Vector128<T>, and Vector256<T> types.
Everything else, in any real usage, will cause an exception to be thrown.
Likewise, they should be blocked from P/Invokes and various other scenarios (and I believe mostly are already) so that we can freely expand them to support new types in the future (such as Half and potentially nint/nuint) without it being breaking.

CarolEidt · 2020-06-04T00:05:58Z

Vector<A> also should not be a TYP_SIMD16

While that's true, it must be handled in a way that doesn't cause the JIT to crash, or otherwise generate code that would cause the program to give the wrong exception. That's actually another reason that leaving the type-to-machine mapping until Lowering might simplify things. For the "front end" we just need to ensure that we enable "free" copies between value types that are equivalent, and that we have the right information to determine whether struct promotion is profitable.

Closes dotnet#37341, closes dotnet#37880.

* Reenable GitHub_26491. Closes #13355 * Reenable crossgen2 tests failing with old retyping/ They were fixed both with and without retyping. Closes #37883. * Reenable HVA merge cases. Closes #37341, closes #37880. * Reenable GitHub_35821. Closes #36206, closes #36418. The issue was fixed by #37499. * Delete extra lines that are no longer needed. #37506 was fixed in #38241. * delete a throwing init.

sandreenko added design-discussion Ongoing discussion about design without consensus area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Jun 3, 2020

Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Jun 3, 2020

sandreenko removed the untriaged New issue has not been triaged by the area owner label Jun 3, 2020

sandreenko added this to the 5.0 milestone Jun 3, 2020

sandreenko self-assigned this Jun 4, 2020

sandreenko mentioned this issue Jun 5, 2020

No retyping arm/arm64. #36866

Merged

sandreenko mentioned this issue Jun 15, 2020

Test failure: JIT\\Directed\\StructABI\\structreturn\\structreturn.cmd #37880

Closed

sandreenko pushed a commit to sandreenko/runtime that referenced this issue Jul 7, 2020

Reenable HVA merge cases.

b4faef4

Closes dotnet#37341, closes dotnet#37880.

sandreenko mentioned this issue Jul 7, 2020

Reenable fixed tests after noRetyping and similar changes. #38855

Merged

sandreenko closed this as completed in #38855 Jul 8, 2020

ghost locked as resolved and limited conversation to collaborators Dec 8, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HFA/Vector calling convention representation in the Jit on arm64. #37341

HFA/Vector calling convention representation in the Jit on arm64. #37341

sandreenko commented Jun 3, 2020

tannergooding commented Jun 3, 2020

CarolEidt commented Jun 3, 2020

sandreenko commented Jun 3, 2020

tannergooding commented Jun 3, 2020

CarolEidt commented Jun 4, 2020

HFA/Vector calling convention representation in the Jit on arm64. #37341

HFA/Vector calling convention representation in the Jit on arm64. #37341

Comments

sandreenko commented Jun 3, 2020

1 example:

2 example:

3 example:

3.1. example:

4 example:

tannergooding commented Jun 3, 2020

CarolEidt commented Jun 3, 2020

sandreenko commented Jun 3, 2020

tannergooding commented Jun 3, 2020

CarolEidt commented Jun 4, 2020