JIT: Faster comparison against Vector128<>.Zero #63829
Comments
Tagging subscribers to this area: @JulieLeeMSFT
Issue Details
bool IsZero(Vector128<byte> v) => v == Vector128<byte>.Zero;
Currently is suboptimal on ARM64:
while I'd expect it to emit something like this:
so if one of the args of == is _Zero it needs to use umaxv, it should be done here: runtime/src/coreclr/jit/lowerarmarch.cpp Line 864 in 386f871
👀 looking, interested
@EgorBo what tool is that?
The code is slightly better after #62933
; Assembly listing for method Program:<<Main>$>g__IsZero|0_0(System.Runtime.Intrinsics.Vector128`1[Byte]):bool
; Emitting BLENDED_CODE for generic ARM64 CPU - MacOS
; optimized code
; fp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16) single-def
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+00H] "OutgoingArgSpace"
;
; Lcl frame size = 0
G_M25750_IG01: ;; offset=0000H
00000000 stp fp, lr, [sp,#-16]!
00000000 mov fp, sp
;; bbWeight=1 PerfScore 1.50
G_M25750_IG02: ;; offset=0008H
00000000 cmeq v16.16b, v0.16b, #0
00000000 uminv b16, v16.16b
00000000 umov w0, v16.b[0]
00000000 cmp w0, #0
00000000 cset x0, ne
;; bbWeight=1 PerfScore 6.00
G_M25750_IG03: ;; offset=001CH
00000000 ldp fp, lr, [sp],#16
00000000 ret lr
;; bbWeight=1 PerfScore 2.00
; Total bytes of code 36, prolog size 8, PerfScore 13.10, instruction count 9, allocated bytes for code 36 (MethodHash=80c69b69) for method Program:<<Main>$>g__IsZero|0_0(System.Runtime.Intrinsics.Vector128`1[Byte]):bool
; ============================================================
cc @TIHan
A good first issue for those who are interested in SIMD and ARM64. Should improve #63285
The codegen for v == Vector128<byte>.Zero is currently suboptimal on ARM64:
while I'd expect it to emit something like this:
so if one of the args of == is _Zero, it needs to use umaxv. It should be done here: runtime/src/coreclr/jit/lowerarmarch.cpp, line 864 in 386f871
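To make the suggestion concrete, here is a small scalar model of the two reductions. It is only a sketch, not JIT code: the `Lanes` type and both function names are hypothetical, and the `umaxv` variant assumes the reduction is simply followed by a compare against zero, since the exact expected sequence is not shown above. The point is that "every lane equals zero" can be answered either by building a per-lane mask and taking its minimum (cmeq + uminv) or by taking the maximum lane directly (umaxv), which saves the cmeq.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar stand-in for a Vector128<byte>: 16 unsigned byte lanes.
using Lanes = std::array<std::uint8_t, 16>;

// Models the listed sequence: cmeq (per-lane compare with 0), uminv
// (horizontal unsigned minimum), then "cmp w0, #0 ; cset ne".
bool IsZeroViaCmeqUminv(const Lanes& v) {
    Lanes mask{};
    for (std::size_t i = 0; i < v.size(); ++i) {
        mask[i] = (v[i] == 0) ? 0xFF : 0x00;  // all-ones where the lane is zero
    }
    const std::uint8_t minLane = *std::min_element(mask.begin(), mask.end());
    return minLane != 0;  // non-zero minimum means every lane matched
}

// Models the suggested form (assumption): umaxv (horizontal unsigned maximum)
// applied to the input itself, followed by a compare against zero.
bool IsZeroViaUmaxv(const Lanes& v) {
    const std::uint8_t maxLane = *std::max_element(v.begin(), v.end());
    return maxLane == 0;  // the largest lane is zero only if all lanes are zero
}
```

Both functions agree for every input, because the largest unsigned lane is zero exactly when each lane is zero.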