-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: some enhancements to redundant branch opts #60732
Conversation
Handle cases where the dominating compare is the reverse of the compare we're trying to optimize. For example, if `(x > y)` dominates `(y <= x)` we may be able to optimize the dominated compare. Addresses aspects of dotnet#48115.
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsHandle cases where the dominating compare is the reverse of the compare Addresses aspects of #48115.
|
cc @dotnet/jit-contrib Small number of regressions, mostly RA/CSE differences. aspnet.run.windows.x64.checked.mch:
Detail diffs
benchmarks.run.windows.x64.checked.mch:
Detail diffs
coreclr_tests.pmi.windows.x64.checked.mch:
Detail diffs
libraries.crossgen2.windows.x64.checked.mch:
Detail diffs
libraries.pmi.windows.x64.checked.mch:
Detail diffs
libraries_tests.pmi.windows.x64.checked.mch:
Detail diffs
Typical diff ; Assembly listing for method System.Data.SqlClient.TdsParser:set_CurrentTransaction(System.Data.SqlClient.SqlInternalTransaction):this
; ...
; V00 this [V00,T01] ( 4, 3.50) ref -> rcx this class-hnd single-def
; V01 arg1 [V01,T00] ( 5, 3.50) ref -> rdx class-hnd single-def
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [rsp+00H] "OutgoingArgSpace"
-; V03 cse0 [V03,T02] ( 3, 2.50) ref -> rax "CSE - aggressive"
;
; Lcl frame size = 0
@@ -17,23 +16,19 @@ G_M27428_IG01: ; gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, nog
;; bbWeight=1 PerfScore 0.00
G_M27428_IG02: ; gcrefRegs=00000006 {rcx rdx}, byrefRegs=00000000 {}, byref, isz
; gcrRegs +[rcx rdx]
- mov rax, gword ptr [rcx+48]
- ; gcrRegs +[rax]
- test rax, rax
+ cmp gword ptr [rcx+48], 0
jne SHORT G_M27428_IG04
- ;; bbWeight=1 PerfScore 3.25
-G_M27428_IG03: ; gcrefRegs=00000007 {rax rcx rdx}, byrefRegs=00000000 {}, byref, isz
+ ;; bbWeight=1 PerfScore 4.00
+G_M27428_IG03: ; gcrefRegs=00000006 {rcx rdx}, byrefRegs=00000000 {}, byref, isz
test rdx, rdx
jne SHORT G_M27428_IG05
- ;; bbWeight=0.50 PerfScore 0.62
-G_M27428_IG04: ; gcrefRegs=00000007 {rax rcx rdx}, byrefRegs=00000000 {}, byref, isz
- test rax, rax
- je SHORT G_M27428_IG06
+ jmp SHORT G_M27428_IG06
+ ;; bbWeight=0.50 PerfScore 1.62
+G_M27428_IG04: ; gcrefRegs=00000006 {rcx rdx}, byrefRegs=00000000 {}, byref, isz
test rdx, rdx
jne SHORT G_M27428_IG06
- ;; bbWeight=0.50 PerfScore 1.25
+ ;; bbWeight=0.50 PerfScore 0.62
G_M27428_IG05: ; gcrefRegs=00000006 {rcx rdx}, byrefRegs=00000000 {}, byref
- ; gcrRegs -[rax]
lea rcx, bword ptr [rcx+48]
; gcrRegs -[rcx]
; byrRegs +[rcx]
@@ -48,7 +43,7 @@ G_M27428_IG07: ; , epilog, nogc, extend
ret
;; bbWeight=1 PerfScore 1.00
-; Total bytes of code 35, prolog size 0, PerfScore 10.63, instruction count 13, allocated bytes for code 35 (MethodHash=d2cb94db) for method System.Data.SqlClient.TdsParser:set_CurrentTransaction(System.Data.SqlClient.SqlInternalTransaction):this
+; Total bytes of code 30, prolog size 0, PerfScore 11.25, instruction count 11, allocated bytes for code 30 (MethodHash=d2cb94db) for method System.Data.SqlClient.TdsParser:set_CurrentTransaction(System.Data.SqlClient.SqlInternalTransaction):this One interesting regression, in
|
Nice! What about this case:
both conditions are absolutely the same, just the order of operands is different. also, I assume this opt won't work for |
Not if sure all these cases are mapped to one standard form -- so probably worth looking at in more depth.
Good point, we should exclude fp compares here. Let's see if we get any failures from this in existing testing. |
A note to readers: this is because we do not model floating-pont |
GCC build seems to be previously broken:
See #60225 (comment) |
@EgorBo I am also seeing the case you noted above (see below, messed-up example edited out) |
[edit: removed obsolete / incorrect info] |
Unfortunately it seems this would require "further work", as it trips up the fragile range check elimination. |
Above summary of adding support for swapped compares wasn't quite implemented properly. Updated diff of diffs is -131 total methods with Code Size differences (128 improved, 3 regressed), 14 unchanged.
-193 total methods with Code Size differences (171 improved, 22 regressed), 3 unchanged.
-421 total methods with Code Size differences (418 improved, 3 regressed), 5 unchanged.
-363 total methods with Code Size differences (343 improved, 20 regressed), 8 unchanged.
-1197 total methods with Code Size differences (1112 improved, 85 regressed), 45 unchanged.
-1241 total methods with Code Size differences (1156 improved, 85 regressed), 35 unchanged.
+134 total methods with Code Size differences (131 improved, 3 regressed), 14 unchanged.
+210 total methods with Code Size differences (185 improved, 25 regressed), 5 unchanged.
+385 total methods with Code Size differences (381 improved, 4 regressed), 5 unchanged.
+538 total methods with Code Size differences (519 improved, 19 regressed), 8 unchanged.
+1455 total methods with Code Size differences (1368 improved, 87 regressed), 53 unchanged.
+1281 total methods with Code Size differences (1196 improved, 85 regressed), 39 unchanged. One interesting example: We now produce the following odd looking bit of code... G_M030_IG01: ; gcVars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref, nogc <-- Prolog IG
mov rax, rcx
; gcrRegs +[rax]
;; bbWeight=1 PerfScore 0.25
G_M030_IG02: ; gcrefRegs=00000001 {rax}, byrefRegs=00000000 {}, byref, isz
cmp dword ptr [rax+24], 0
jg SHORT G_M030_IG04
;; bbWeight=1 PerfScore 4.00
G_M030_IG03: ; gcrefRegs=00000001 {rax}, byrefRegs=00000000 {}, byref, epilog, nogc
ret
;; bbWeight=0.50 PerfScore 0.50
G_M030_IG04: ; gcrefRegs=00000001 {rax}, byrefRegs=00000000 {}, byref, epilog, nogc
ret |
Updated diff summary aspnet.run.windows.x64.checked.mch:
Detail diffs
benchmarks.run.windows.x64.checked.mch:
Detail diffs
coreclr_tests.pmi.windows.x64.checked.mch:
Detail diffs
libraries.crossgen2.windows.x64.checked.mch:
Detail diffs
libraries.pmi.windows.x64.checked.mch:
Detail diffs
libraries_tests.pmi.windows.x64.checked.mch:
Detail diffs
|
@dotnet/jit-contrib ping |
I must be missing something. In your example above, if I pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks good, but the discussion of the F
example seems wrong.
src/coreclr/jit/valuenum.cpp
Outdated
// | ||
// Arguments: | ||
// vn - vn to base things on | ||
// vnk - whether the new vn should swap, reverse, or both |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// vnk - whether the new vn should swap, reverse, or both | |
// vrk - whether the new vn should swap, reverse, or both |
src/coreclr/jit/valuenum.cpp
Outdated
// vn for reversed/swapped comparsion, or NoVN. | ||
// | ||
// Note: | ||
// If "vn" corresponds to (x > y), the resulting VN correponds to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// If "vn" corresponds to (x > y), the resulting VN correponds to | |
// If "vn" corresponds to (x > y), the resulting VN corresponds to: |
// Note we could also infer the tree relop's value from similar relops higher in the dom tree. | ||
// For example, (x >= 0) dominating (x > 0), or (x < 0) dominating (x > 0). | ||
// | ||
// That is left as a future enhancement. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment obsolete now? (Or in need of updating to match the new code)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll update it -- there are still cases we don't get, like the first one listed.
Sorry about that. My example above was indeed messed up.... here's a corrected one that matches Egor's snippet static int F3(int x, int y)
{
int r = 0;
if (x > y)
{
r += 1;
if (y >= x) r += 2;
}
return r;
} ;;; base
; Assembly listing for method X:F3(int,int):int
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 4, 3.50) int -> rcx single-def
; V01 arg1 [V01,T01] ( 4, 3.50) int -> rdx single-def
; V02 loc0 [V02,T02] ( 4, 3 ) int -> rax
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [rsp+00H] "OutgoingArgSpace"
;
; Lcl frame size = 0
G_M37129_IG01: ;; offset=0000H
;; bbWeight=1 PerfScore 0.00
G_M37129_IG02: ;; offset=0000H
33C0 xor eax, eax
3BCA cmp ecx, edx
7E0E jle SHORT G_M37129_IG04
;; bbWeight=1 PerfScore 1.50
G_M37129_IG03: ;; offset=0006H
B801000000 mov eax, 1
3BD1 cmp edx, ecx
7C05 jl SHORT G_M37129_IG04
B803000000 mov eax, 3
;; bbWeight=0.50 PerfScore 0.88
G_M37129_IG04: ;; offset=0014H
C3 ret
;; bbWeight=1 PerfScore 1.00
; Total bytes of code 21, prolog size 0, PerfScore 5.48, instruction count 8, allocated bytes for code 21 (MethodHash=ddcb6ef6) for method X:F3(int,int):int
;;; diff
; Assembly listing for method X:F3(int,int):int
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 3, 3 ) int -> rcx single-def
; V01 arg1 [V01,T01] ( 3, 3 ) int -> rdx single-def
; V02 loc0 [V02,T02] ( 3, 2.50) int -> rax
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [rsp+00H] "OutgoingArgSpace"
;
; Lcl frame size = 0
G_M37129_IG01: ;; offset=0000H
;; bbWeight=1 PerfScore 0.00
G_M37129_IG02: ;; offset=0000H
33C0 xor eax, eax
3BCA cmp ecx, edx
7E05 jle SHORT G_M37129_IG04
;; bbWeight=1 PerfScore 1.50
G_M37129_IG03: ;; offset=0006H
B801000000 mov eax, 1
;; bbWeight=0.50 PerfScore 0.12
G_M37129_IG04: ;; offset=000BH
C3 ret
;; bbWeight=1 PerfScore 1.00
; Total bytes of code 12, prolog size 0, PerfScore 3.83, instruction count 5, allocated bytes for code 12 (MethodHash=ddcb6ef6) for method X:F3(int,int):int |
Handle cases where the dominating compare is the reverse of the compare
we're trying to optimize. For example, if
(x > y)
dominates(y <= x)
we may be able to optimize the dominated compare.
Addresses aspects of #48115.