-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unify set-reg-to-imm functions and optimize with LEA at runtime on x64 #60228
Conversation
This change * Unifies the genSetRegToImm and instGen_Set_Reg_To_Imm functions, removing the former. genSetRegToImm seems to be a variant that did not support relocations/handles, while the latter supported both. We now exclusively use the emitAttr to determine this. This was only a change in xarch since on ARM architectures the former function just forwarded to the latter. * Generates lea more often for handle GT_CNS_INT nodes on x64. Previously the logic generated only lea when a relocation was *required*, which is only the case for prejit. In particular, all VSD calls load an indirection cell that can use lea instead of the longer mov encoding during runtime JIT, which results in a nice code size decrease. * Changes ARM64 and ARM to handle byref-typed GT_CNS_INT nodes. This was fixed for xarch a few years back in 52a8890 but that change was not made to ARM architectures.
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsWhile working on #59602 I noticed that we did not use
|
No code size diffs on ARM64, but bullet 3 above means we see some GC info diffs: libraries.crossgen2.Linux.arm64.checked.mch:
Detail diffs
For example: ; gcr arg pop 0
adrp x11, [HIGH RELOC #0xd1ffab1e] // static handle
add x11, x11, [LOW RELOC #0xd1ffab1e]
+ ; byrRegs +[x11]
lsl x0, x0, #1
add x0, x0, x11
; byrRegs +[x0]
ldrsh w0, [x0]
; byrRegs -[x0]
ldr w11, [fp,#36] // [V57 tmp46]
+ ; byrRegs -[x11]
add w1, w0, w11
str w1, [fp,#40] // [V56 tmp45]
mov w0, w1 G_M47710_IG02: ; gcrefRegs=80000 {x19}, byrefRegs=0000 {}, byref
adrp x20, [HIGH RELOC #0xd1ffab1e] // static handle
add x20, x20, [LOW RELOC #0xd1ffab1e]
+ ; byrRegs +[x20]
adrp x11, [HIGH RELOC #0xd1ffab1e] // function address
add x11, x11, [LOW RELOC #0xd1ffab1e]
ldr x0, [x11]
blr x0
- ; byrRegs +[x0 x20]
+ ; byrRegs +[x0]
; gcr arg pop 0
ldr x0, [x0,#0xd1ffab1e]
; gcrRegs +[x0] For x64 we see diffs in non-crossgen2 collections only, as expected. The summary is: aspnet.run.windows.x64.checked.mch:
Detail diffs
coreclr_tests.pmi.windows.x64.checked.mch:
Detail diffs
libraries.pmi.windows.x64.checked.mch:
Detail diffs
libraries_tests.pmi.windows.x64.checked.mch:
Detail diffs
Some example diffs: G_M11776_IG03: ; gcrefRegs=00000040 {rsi}, byrefRegs=00000000 {}, byref
mov rcx, gword ptr [rsi+32]
; gcrRegs +[rcx]
- mov r11, 0xD1FFAB1E
+ lea r11, [(reloc)]
call [hackishModuleName:hackishMethodName()]
; gcrRegs -[rcx] +[rax]
; gcr arg pop 0
@@ -81,11 +81,11 @@ G_M11776_IG03: ; gcrefRegs=00000040 {rsi}, byrefRegs=00000000 {}, byref
call CORINFO_HELP_ASSIGN_REF
; gcrRegs -[rax rdx]
; byrRegs -[rcx]
- ;; bbWeight=0.50 PerfScore 8.00
+ ;; bbWeight=0.50 PerfScore 8.12 G_M40268_IG02: ; gcrefRegs=00000002 {rcx}, byrefRegs=00000000 {}, byref
; gcrRegs +[rcx]
mov rcx, gword ptr [rcx+8]
- mov r11, 0xD1FFAB1E
- ;; bbWeight=1 PerfScore 2.25
+ lea r11, [(reloc)]
+ ;; bbWeight=1 PerfScore 2.50
G_M40268_IG03: ; , epilog, nogc, extend
tail.jmp [hackishModuleName:hackishMethodName()]
; gcr arg pop 0 |
While this is x64 only, EA_PTRSIZE seems more correct.
PTAL @dotnet/jit-contrib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
windows/x64 improvement dotnet/perf-autofiling-issues#1907 |
another windows/x64 improvement - dotnet/perf-autofiling-issues#1934 |
While working on #59602 I noticed that we did not use
lea
for indirection cells at runtime in x64.This change
removing the former. genSetRegToImm seems to be a variant that did not
support relocations/handles, while the latter supported both. We now
exclusively use the emitAttr to determine this. This was only a change
in xarch since on ARM architectures the former function just forwarded
to the latter.
Previously the logic generated only lea when a relocation was
required, which is only the case for prejit. In particular, all VSD
calls load an indirection cell that can use lea instead of the longer
mov encoding during runtime JIT, which results in a nice code size
decrease.
fixed for xarch a few years back in
52a8890 but that change was not made
to ARM architectures.