Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement GT_MUL_LONG for ARM64 #57926

Merged
merged 6 commits into from
Oct 5, 2021

Conversation

SingleAccretion
Copy link
Contributor

@SingleAccretion SingleAccretion commented Aug 23, 2021

This change refactors a bunch of logic so that it can be used on ARM64 to recognize and emit smull/umull multiplies.

The changes, in order:

  1. Refactor the recognition logic so that it is idempotent. This commit is zero-diffs for 32 bit targets, as expected, except for a few small ones due to the new logic swapping operands before morphing to helper calls (and thus arguments to said helpers being assigned different registers).

  2. Support for GT_MUL_LONG in ARM64 codegen and LSRA. This commit breaks magic division because it deletes support for 32x32->64 multiplies from genCodeForBinary.

  3. Fixes for magic division, based on Fix ARM64 unsigned div by const perf regression #57372 plus some refactoring to make the code less #ifdefy and conforming to the code guide. This commit makes the change a zero-diff one for ARM64.

  4. Finally, implement the recognition in lowering and associated refactoring, from which the bulk of the diffs come from.

Said diffs are quite nice: win-arm64, linux-arm64.

Fixes #47490.

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Aug 23, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Aug 23, 2021
@ghost
Copy link

ghost commented Aug 23, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

This change refactors a bunch of logic so that it can be used on ARM64 to recognize and emit smull/umull multiplies.

The changes, in order:

  1. Refactor the recognition logic so that it is idempotent. This commit is zero-diffs for 32 bit targets, as expected, except for a few small ones due to the new logic swapping operands before morphing to helper calls (and thus arguments to said helpers being assigned different registers).

  2. Support for GT_MUL_LONG in ARM64 codegen and LSRA. This commit breaks magic division because it deletes support for 32x32->64 multiplies from genCodeForBinary.

  3. Fixes for magic division, based on Fix ARM64 unsigned div by const perf regression #57372 plus some refactoring to make the code less #ifdefy and conforming to the style guide. This commit makes the change a zero-diff one for ARM64.

  4. Finally, implement the recognition in lowering and associated refactoring, from which the bulk of the diffs come from.

The diffs are quite good: win-arm64.

Author: SingleAccretion
Assignees: -
Labels:

area-CodeGen-coreclr, community-contribution

Milestone: -

@SingleAccretion SingleAccretion changed the title Implement GT_MUL_LONG for arm64 Implement GT_MUL_LONG for ARM64 Aug 23, 2021
@pentp
Copy link
Contributor

pentp commented Aug 23, 2021

This fixes #47490, right?

@SingleAccretion
Copy link
Contributor Author

Yep, will link it once the CI is green.

@SingleAccretion SingleAccretion marked this pull request as ready for review August 23, 2021 12:36
@SingleAccretion
Copy link
Contributor Author

cc @echesakovMSFT

@pentp
Copy link
Contributor

pentp commented Sep 8, 2021

Contributes to #35853

Move it to a GenTree method so that it can be reused in lowering.

No diffs except for a couple (~5 across all SPMI collection) on
ARM due to the new operand swapping behavior.
Move it out of the "#if !defined(TARGET_64BIT)" block.

Also some miscellaneous moving of comments around.
Note: this commit breaks magic division.
To use the new GT_MUL_LONG support.
Plus some tweaks to make it conform to the coding guide.

Credit to @pentp for the substantive changes.
Copy link
Contributor

@echesakov echesakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good (with some comments).

@SingleAccretion Can you please share the diffs on Arm64?

src/coreclr/jit/codegen.h Outdated Show resolved Hide resolved
src/coreclr/jit/gentree.h Outdated Show resolved Hide resolved
@SingleAccretion
Copy link
Contributor Author

Can you please share the diffs on Arm64?

Do the diffs in the PR's description suffice?

@echesakov
Copy link
Contributor

Do the diffs in the PR's description suffice?

Sorry, I meant an example of assembly diffs (not the summary).

@SingleAccretion
Copy link
Contributor Author

Sorry, I meant an example of assembly diffs (not the summary).

I see. There are two types of diffs with this change. The first is a straight removal of two casts, with a sutiable multiplication instruction change:

-            mov     w0, w0
-            mov     w1, w1
-            mul     x22, x0, x1
+            umull   x22, w0, w1

-            sxtw    x2, w3
-            mov     x3, #60
-            mul     x2, x2, x3
+            mov     w2, #60
+            smull   x2, w3, w2

The second is the same, but with an added benefit of the multiplication being made non-overflowing:

-            mov     w1, w2
-            mov     x0, #4
-            umulh   x5, x1, x0
-            mul     x1, x1, x0
-            cmp     x5, #0
-            bne     G_M31955_IG28
+            mov     w1, #4
+            umull   x1, w2, w1 // (ulong)w2 * (ulong)4 cannot overflow.

Copy link
Contributor

@echesakov echesakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great diffs! Thank you for the contribution @SingleAccretion!

cc @dotnet/jit-contrib

@echesakov echesakov merged commit 220b746 into dotnet:main Oct 5, 2021
@SingleAccretion SingleAccretion deleted the Long-Mul-For-Arm64 branch October 6, 2021 13:38
@kunalspathak
Copy link
Member

Improvement on arm64: dotnet/perf-autofiling-issues#1817

@ghost ghost locked as resolved and limited conversation to collaborators Nov 13, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Arm64] Use smull/umull for computing 64-bit result of multiplication of two 32-bit ints/uint
5 participants