Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Transform multi-reg args to FIELD_LIST in physical promotion #104370

Merged

Conversation

jakobbotsch
Copy link
Member

@jakobbotsch jakobbotsch commented Jul 3, 2024

This allows promoted fields to stay in registers when they are passed as
multi-reg arguments.

Example linux-x64 diff with DOTNET_JitStressModeNames=STRESS_NO_OLD_PROMOTION:

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static void Test(int[] arr)
{
    Foo(arr);
}

Base:

; Assembly listing for method Program:Test(int[]) (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Unix
; FullOpts code
; optimized code
; rbp based frame
; partially interruptible
; No PGO data
; 1 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; invoked as altjit
; Final local variable assignments
;
;  V00 arg0         [V00,T01] (  5,  4   )     ref  ->  rdi         class-hnd single-def <int[]>
;# V01 OutArgs      [V01    ] (  1,  1   )  struct ( 0) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;* V02 tmp1         [V02    ] (  0,  0   )  struct (16) zero-ref    do-not-enreg[S] "spilled call-like call argument" <System.Span`1[int]>
;  V03 tmp2         [V03,T00] (  4,  8   )  struct (16) [rbp-0x20]  do-not-enreg[SFA] multireg-arg must-init ld-addr-op "NewObj constructor temp" <System.Span`1[int]>
;  V04 tmp3         [V04,T02] (  3,  1.50)   byref  ->  rbx         "V03.[000..008)"
;  V05 tmp4         [V05,T03] (  3,  1.50)     int  ->  r15         "V03.[008..012)"
;
; Lcl frame size = 16

G_M55702_IG01:  ;; offset=0x0000
       push     rbp
       push     r15
       push     rbx
       sub      rsp, 16
       lea      rbp, [rsp+0x20]
       xor      eax, eax
       mov      qword ptr [rbp-0x20], rax
						;; size=19 bbWeight=1 PerfScore 5.00
G_M55702_IG02:  ;; offset=0x0013
       test     rdi, rdi
       je       SHORT G_M55702_IG06
						;; size=5 bbWeight=1 PerfScore 1.25
G_M55702_IG03:  ;; offset=0x0018
       lea      rbx, bword ptr [rdi+0x10]
       mov      r15d, dword ptr [rdi+0x08]
						;; size=8 bbWeight=0.50 PerfScore 1.25
G_M55702_IG04:  ;; offset=0x0020
       mov      bword ptr [rbp-0x20], rbx
       mov      dword ptr [rbp-0x18], r15d
       mov      rdi, bword ptr [rbp-0x20]
       mov      rsi, qword ptr [rbp-0x18]
       mov      rax, 0x7FF97F8FC648      ; code for Program:Foo(System.Span`1[int])
       call     [rax]Program:Foo(System.Span`1[int])
       nop      
						;; size=29 bbWeight=1 PerfScore 7.50
G_M55702_IG05:  ;; offset=0x003D
       add      rsp, 16
       pop      rbx
       pop      r15
       pop      rbp
       ret      
						;; size=9 bbWeight=1 PerfScore 2.75
G_M55702_IG06:  ;; offset=0x0046
       xor      rbx, rbx
       xor      r15d, r15d
       jmp      SHORT G_M55702_IG04
						;; size=7 bbWeight=0 PerfScore 0.00

; Total bytes of code 77, prolog size 19, PerfScore 17.75, instruction count 26, allocated bytes for code 77 (MethodHash=340b2669) for method Program:Test(int[]) (FullOpts)
; ============================================================

Diff:

; Assembly listing for method Program:Test(int[]) (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Unix
; FullOpts code
; optimized code
; rbp based frame
; partially interruptible
; No PGO data
; 1 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; invoked as altjit
; Final local variable assignments
;
;  V00 arg0         [V00,T00] (  5,  4   )     ref  ->  rdi         class-hnd single-def <int[]>
;# V01 OutArgs      [V01    ] (  1,  1   )  struct ( 0) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;* V02 tmp1         [V02    ] (  0,  0   )  struct (16) zero-ref    do-not-enreg[S] "spilled call-like call argument" <System.Span`1[int]>
;* V03 tmp2         [V03    ] (  0,  0   )  struct (16) zero-ref    do-not-enreg[SF] ld-addr-op "NewObj constructor temp" <System.Span`1[int]>
;  V04 tmp3         [V04,T01] (  3,  1.50)   byref  ->  rax         "V03.[000..008)"
;  V05 tmp4         [V05,T02] (  3,  1.50)     int  ->  rsi         "V03.[008..012)"
;
; Lcl frame size = 0

G_M55702_IG01:  ;; offset=0x0000
       push     rbp
       mov      rbp, rsp
						;; size=4 bbWeight=1 PerfScore 1.25
G_M55702_IG02:  ;; offset=0x0004
       test     rdi, rdi
       je       SHORT G_M55702_IG06
						;; size=5 bbWeight=1 PerfScore 1.25
G_M55702_IG03:  ;; offset=0x0009
       lea      rax, bword ptr [rdi+0x10]
       mov      esi, dword ptr [rdi+0x08]
						;; size=7 bbWeight=0.50 PerfScore 1.25
G_M55702_IG04:  ;; offset=0x0010
       mov      rdi, rax
       mov      rax, 0x7FF97F91C648      ; code for Program:Foo(System.Span`1[int])
       call     [rax]Program:Foo(System.Span`1[int])
       nop      
						;; size=16 bbWeight=1 PerfScore 3.75
G_M55702_IG05:  ;; offset=0x0020
       pop      rbp
       ret      
						;; size=2 bbWeight=1 PerfScore 1.50
G_M55702_IG06:  ;; offset=0x0022
       xor      rax, rax
       xor      esi, esi
       jmp      SHORT G_M55702_IG04
						;; size=6 bbWeight=0 PerfScore 0.00

; Total bytes of code 40, prolog size 4, PerfScore 9.00, instruction count 15, allocated bytes for code 40 (MethodHash=340b2669) for method Program:Test(int[]) (FullOpts)
; ============================================================

I don't expect super large diffs with old promotion enabled because most structs that are candidates to be multi-register arguments are handled just fine by old promotion. However, this moves us closer to being able to remove old promotion.

This allows promoted fields to stay in registers when they are passed as
multi-reg arguments.
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 3, 2024
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@jakobbotsch
Copy link
Member Author

As a note: I think long term we would benefit from making FIELD_LIST a first class TYP_STRUCT node that is allowed as the source of stores and other places that may take TYP_STRUCT nodes. Currently there is an invariant after morph that FIELD_LIST appears only for call arguments and that it matches the underlying ABI (i.e. one field for every register and so on). I think a good direction would be to move call ABI handling into the backend and then make it lowering's responsibility to get FIELD_LIST into the right form. We would then allow any shape of FIELD_LIST to construct a struct value until lowering.

For a more uniform representation we would also allow FIELD_LIST for single-register struct values and single-register return values. It would avoid the unfortunate retyping around returns/call arguments that we have today.

jakobbotsch added a commit to jakobbotsch/runtime that referenced this pull request Jul 4, 2024
Before this change the JIT has two places where it may introduce temps
during call args morphing:
1. Directly during `fgMorphArgs` as part of making struct copies for
   some struct args, like implicit byref args and other struct args
   requiring to be of `LCL_VAR` shape
2. During `EvalArgsToTemps` as part of making sure evaluation order is
   maintained while we get the call into the right shape with registers

To make this work we have `CallArg::m_tmpNum` that communicates the
local from 1 to 2; the responsibility of creating the actual `LCL_VAR`
node to put in the late argument list was left to `EvalArgsToTemps`
while `fgMorphArgs` would just change the early setup node to a store
into the temporary.

This PR changes it so that 1 directly introduces the right late node
when it creates its temporary. That is, it puts the call argument into
the right shape immediately and avoids relying on `EvalArgsToTemps` to
create the local node in the late list.

The benefit of that is that we no longer need to communicate the
temporary as part of every `CallArg`. However, the motivation here is
something else: as part of dotnet#104370 we may have arguments that need
multiple temporaries to copy, so having things working in this way
introduces complications for that.
Before this change the JIT has two places where it may introduce temps
during call args morphing:
1. Directly during `fgMorphArgs` as part of making struct copies for
   some struct args, like implicit byref args and other struct args
   requiring to be of `LCL_VAR` shape
2. During `EvalArgsToTemps` as part of making sure evaluation order is
   maintained while we get the call into the right shape with registers

To make this work we have `CallArg::m_tmpNum` that communicates the
local from 1 to 2; the responsibility of creating the actual `LCL_VAR`
node to put in the late argument list was left to `EvalArgsToTemps`
while `fgMorphArgs` would just change the early setup node to a store
into the temporary.

This PR changes it so that 1 directly introduces the right late node
when it creates its temporary. That is, it puts the call argument into
the right shape immediately and avoids relying on `EvalArgsToTemps` to
create the local node in the late list.

The benefit of that is that we no longer need to communicate the
temporary as part of every `CallArg`. However, the motivation here is
something else: as part of dotnet#104370 we may have arguments that need
multiple temporaries to copy, so having things working in this way
introduces complications for that.
@jakobbotsch
Copy link
Member Author

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, runtime-jit-experimental

Copy link

Azure Pipelines successfully started running 3 pipeline(s).

jakobbotsch added a commit that referenced this pull request Jul 10, 2024
Before this change the JIT has two places where it may introduce temps
during call args morphing:
1. Directly during `fgMorphArgs` as part of making struct copies for
   some struct args, like implicit byref args and other struct args
   requiring to be of `LCL_VAR` shape
2. During `EvalArgsToTemps` as part of making sure evaluation order is
   maintained while we get the call into the right shape with registers

To make this work we have `CallArg::m_tmpNum` that communicates the
local from 1 to 2; the responsibility of creating the actual `LCL_VAR`
node to put in the late argument list was left to `EvalArgsToTemps`
while `fgMorphArgs` would just change the early setup node to a store
into the temporary.

This PR changes it so that 1 directly introduces the right late node
when it creates its temporary. That is, it puts the call argument into
the right shape immediately and avoids relying on `EvalArgsToTemps` to
create the local node in the late list.

The benefit of that is that we no longer need to communicate the
temporary as part of every `CallArg`. However, the motivation here is
something else: as part of #104370 we may have arguments that need
multiple temporaries to copy, so having things working in this way
introduces complications for that.
@jakobbotsch
Copy link
Member Author

jakobbotsch commented Jul 11, 2024

The current win-arm64 failures look like bad codegen by MSVC for some JIT code in fgMorphArgs. I'm trying to reduce the repro case and open an issue. MSVC seems to be reloading a 4-byte value with a 1-byte load.

Edit: here's a small repro: https://godbolt.org/z/PGd5d6K9d
Notice the tbnz w8, #0x1F (wrong) for MSVC 19.40 compared to tbnz w8, #7 (right) for the latest version. So looks fixed already.

@jakobbotsch
Copy link
Member Author

/azp run runtime-coreclr superpmi-diffs, runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, runtime-jit-experimental

Copy link

Azure Pipelines successfully started running 4 pipeline(s).

@jakobbotsch jakobbotsch marked this pull request as ready for review July 12, 2024 15:00
@jakobbotsch
Copy link
Member Author

cc @dotnet/jit-contrib PTAL @EgorBo

@jakobbotsch jakobbotsch requested a review from EgorBo July 12, 2024 15:00
@jakobbotsch
Copy link
Member Author

Ping @EgorBo

Copy link
Member

@EgorBo EgorBo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jakobbotsch jakobbotsch merged commit 767e416 into dotnet:main Jul 19, 2024
139 of 146 checks passed
@jakobbotsch jakobbotsch deleted the physical-promotion-field-list-args branch July 19, 2024 20:04
@github-actions github-actions bot locked and limited conversation to collaborators Aug 19, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants