Implement GT_MUL_LONG for ARM64 #57926
Conversation
Tagging subscribers to this area: @JulieLeeMSFT
This fixes #47490, right?

Yep, will link it once the CI is green.

cc @echesakovMSFT
Force-pushed from 4882cde to 1b38307.
Contributes to #35853
- Move it to a GenTree method so that it can be reused in lowering. No diffs except for a couple (~5 across all SPMI collections) on ARM due to the new operand-swapping behavior.
- Move it out of the `#if !defined(TARGET_64BIT)` block. Also some miscellaneous moving of comments around. Note: this commit breaks magic division.
- To use the new `GT_MUL_LONG` support, plus some tweaks to make it conform to the coding guide. Credit to @pentp for the substantive changes.
Force-pushed from 1b38307 to b04ce9e, then from b04ce9e to 5336b72.
The changes look good (with some comments).
@SingleAccretion Can you please share the diffs on Arm64?
Do the diffs in the PR's description suffice?

Sorry, I meant an example of assembly diffs (not the summary).
I see. There are two types of diffs with this change. The first is a straight removal of two casts, with a suitable multiplication instruction change:

```diff
- mov w0, w0
- mov w1, w1
- mul x22, x0, x1
+ umull x22, w0, w1
```

```diff
- sxtw x2, w3
- mov x3, #60
- mul x2, x2, x3
+ mov w2, #60
+ smull x2, w3, w2
```

The second is the same, but with the added benefit of the multiplication being made non-overflowing:

```diff
- mov w1, w2
- mov x0, #4
- umulh x5, x1, x0
- mul x1, x1, x0
- cmp x5, #0
- bne G_M31955_IG28
+ mov w1, #4
+ umull x1, w2, w1 // (ulong)w2 * (ulong)4 cannot overflow.
```
Great diffs! Thank you for the contribution @SingleAccretion!
cc @dotnet/jit-contrib
Improvement on arm64: dotnet/perf-autofiling-issues#1817
This change refactors a bunch of logic so that it can be used on ARM64 to recognize and emit `smull`/`umull` multiplies.

The changes, in order:

1. Refactor the recognition logic so that it is idempotent. This commit is zero-diff for 32-bit targets, as expected, except for a few small ones due to the new logic swapping operands before morphing to helper calls (and thus arguments to said helpers being assigned different registers).
2. Support for `GT_MUL_LONG` in ARM64 codegen and LSRA. This commit breaks magic division because it deletes support for 32x32->64 multiplies from `genCodeForBinary`.
3. Fixes for magic division, based on "Fix ARM64 unsigned div by const perf regression" #57372, plus some refactoring to make the code less `#ifdef`-y and conforming to the code guide. This commit makes the change a zero-diff one for ARM64.
4. Finally, implement the recognition in lowering and associated refactoring, from which the bulk of the diffs come.

Said diffs are quite nice: `win-arm64`, `linux-arm64`.

Fixes #47490.