-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix ARM64 unsigned div by const perf regression #57372
Conversation
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsThis fixes #54166. I added a new Diff: https://www.diffchecker.com/9mkVf4se
|
// Generate code to get the 64-bits of a 32*32 bit multiplication result | ||
void CodeGen::genCodeForMulWide(GenTreeOp* treeNode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this a GT_MUL_LONG
? We only use for 32 bit targets now, but it should be straightforward to use it on ARM64 too.
cc @dotnet/jit-contrib. Can somebody more familiar with arm64 codegen PTAL. It would be nice to verify this fixes the regression, so likely one of us will need to be on tap to run the benchmarks. Anyone want to sign up to shepherd this fix?
I like the sound of this. Is it something you can work up reasonably quickly so we can assess how complicated it would be and whether it looks like a more attractive alternative? |
@echesakovMSFT PTAL. |
I agree that having
would be better that introducing an IR node that only gonna be used for a few cases like this one. I will run the affected benchmarks on Arm64 device on Monday and post results here. |
Heh, I had a prototype for #47490 where I also just removed casts and changed types like this:
to:
and recognized umull/smull in the codegen for it |
We already have all the necessary logic to recognize such trees in morph, but it may need some refactoring to be usable in LIR (I do not think we want it in morph for ARM64, I think we want it in lowering). However, if you want the easy way out, there is a "dumbed down" debug-only (but still correct) version that will work on LIR as-is here (modulo the actual flag check - it can just be inlined for the 32 bit-only |
This fixes #54166. I added a new
GT_MULWIDE
node for this, but then later realized that we should actually try to fix theGT_MUL
arm64 codegen for such cases where both operands are 32-bit as a general solution that would benefit for exampleSystem.Decimal
also (it does a lot of such multiplications). What do you think @AndyAyersMS?Diff: https://www.diffchecker.com/9mkVf4se
Diff with baseline (no #52893): https://www.diffchecker.com/x2SRZJvW