Updating a few BitConverter APIs to be intrinsic #71567
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Issue Details
This helps clean up and improve codegen for this core method to help with a few generic math related scenarios.
A local
The majority of wins are simply cases like:
- vmovaps xmm1, xmm7
- vmovd rcx, xmm1
+ vmovd rcx, xmm7
There are many cases where we do less inlining and therefore create fewer local variable assignments and such (this is one of the smaller diffs; others have 6 or more local variables removed):
-; 0 inlinees with PGO data; 2 single block inlinees; 1 inlinees without PGO data
+; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
; V00 this [V00,T00] ( 3, 3 ) byref -> rcx this single-def
;* V01 loc0 [V01 ] ( 0, 0 ) int -> zero-ref
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [rsp+00H] "OutgoingArgSpace"
-;* V03 tmp1 [V03 ] ( 0, 0 ) float -> zero-ref "Inlining Arg"
-;* V04 tmp2 [V04 ] ( 0, 0 ) int -> zero-ref "Inline return value spill temp"
-;* V05 tmp3 [V05 ] ( 0, 0 ) float -> zero-ref ld-addr-op "Inlining Arg"
-;* V06 tmp4 [V06 ] ( 0, 0 ) simd16 -> zero-ref "Inline stloc first use temp"
;
; Lcl frame size = 0
In user and test code there are also places where
Regressions look to largely be cases where we optimize more/differently, such as by unrolling or cloning loops. There does look to be one case where we might be missing a containment check and I'll see if I can fix that.
VN support needs a bit of restructuring.
Will this fix #11413?
return *((long*)&value);
}
[Intrinsic]
public static unsafe long DoubleToInt64Bits(double value) => *((long*)&value);
Does the removal of the SSE2 path have any negative impact when using mono?
Possibly for Mono JIT. Mono LLVM correctly handles this as a bitcast already.
Mono JIT doesn't support those at all. LLVM will be 100% fine without it 🙂
I'm curious about the general approach of making these intrinsics. Is this just working around other JIT limitations today?
Co-authored-by: SingleAccretion <62474226+SingleAccretion@users.noreply.github.com>
In this case, it is working around a JIT limitation in recognizing, dealing with, and optimizing the "trivial" pattern of […]

However, there are many cases, even in C/C++ and other native compilers, where intrinsics are provided even though some general pattern is recognized and supported. Even in the scenario where this was being handled by other intrinsics, the diff I linked above shows examples of where this introduced a larger number of locals and forced the JIT to do "more work" to accomplish the same thing. This means more time spent compiling, more risk that we run into the locals, CSE, and other limits in the JIT, and more. So having this "core" function be intrinsic anyway helps the JIT even where it otherwise handled the existing pattern well.

This is "core" because it's a piece of functionality many runtimes and libraries provide, and getting the underlying bits for a floating-point value is "central" to implementing and dealing with floating-point types compliantly. These being intrinsic means that we can get better codegen and throughput for most
… normal value now
This should be ready for review now. Will post updated SPMI diffs after CI completes.
SPMI says 0 diffs. Manually running
CC @dotnet/jit-contrib for review.
The failure is #71684, which was already resolved.
This helps clean up and improve codegen for this core method to help with a few generic math related scenarios.
It does so by importing BitConverter.DoubleToInt64Bits, BitConverter.Int32BitsToSingle, BitConverter.Int64BitsToDouble, and BitConverter.SingleToInt32Bits as intrinsic GT_BITCAST operations and adding minimal support (such as value numbering) for these operations in the front end.
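For readers less familiar with these APIs, the following is a minimal sketch of what the four methods do from the caller's side: they reinterpret the bits of a floating-point value as an integer (or the reverse) without any numeric conversion. The class name BitcastSketch and its helpers are hypothetical and exist only for illustration; they are not part of this PR or of the runtime.

```csharp
using System;

// Hypothetical illustration class; not part of the PR.
public static class BitcastSketch
{
    // Reinterprets a double through a byte round-trip. The result should match
    // BitConverter.DoubleToInt64Bits, which performs the same reinterpretation
    // directly (and which this PR imports as a bitcast).
    public static long DoubleBitsViaBytes(double value)
        => BitConverter.ToInt64(BitConverter.GetBytes(value), 0);

    // The sign of an IEEE 754 double is its top bit, so the reinterpreted
    // 64-bit integer is negative exactly when the sign bit is set
    // (this includes -0.0 and NaNs with the sign bit set).
    public static bool IsNegative(double value)
        => BitConverter.DoubleToInt64Bits(value) < 0;

    // Round-trips a float through its 32-bit integer representation and
    // checks that no bits were changed along the way.
    public static bool RoundTrips(float value)
    {
        int bits = BitConverter.SingleToInt32Bits(value);
        float back = BitConverter.Int32BitsToSingle(bits);
        return BitConverter.SingleToInt32Bits(back) == bits;
    }
}
```

Helpers shaped like IsNegative are the kind of generic math building block the description has in mind: each one is a single reinterpretation followed by an integer operation, so treating the reinterpretation as one bitcast keeps the surrounding code simple for the JIT.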