Implement `GT_MUL_LONG` for ARM64 #57926

SingleAccretion · 2021-08-23T10:01:16Z

This change refactors a bunch of logic so that it can be used on ARM64 to recognize and emit smull/umull multiplies.

The changes, in order:

1. Refactor the recognition logic so that it is idempotent. This commit is zero-diff for 32-bit targets, as expected, except for a few small diffs caused by the new logic swapping operands before morphing to helper calls (so the arguments to those helpers are assigned different registers).
2. Add support for `GT_MUL_LONG` to ARM64 codegen and LSRA. This commit breaks magic division because it deletes support for 32x32->64 multiplies from `genCodeForBinary`.
3. Fix magic division, based on "Fix ARM64 unsigned div by const perf regression" (#57372), plus some refactoring to make the code less #ifdefy and conform to the code guide. With this commit, the change so far is zero-diff for ARM64.
4. Finally, implement the recognition in lowering, with the associated refactoring; the bulk of the diffs comes from this commit.
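For readers unfamiliar with what "recognition" means here, the following is a minimal sketch on a toy IR, not the code from this PR. `Node`, `Oper`, `Type`, `Is64BitMulOf32BitOperands` and `TryConvertToMulLong` are all hypothetical names invented for illustration; the real logic lives on `GenTree` and also has to deal with constants and overflow flags, which are omitted. The sketch only shows the shape being matched (a 64-bit multiply of two values widened from 32 bits with the same signedness) and why a conversion written this way can be run more than once safely.

```cpp
#include <cassert>

// Hypothetical, heavily simplified IR. None of these names are the real
// GenTree API; they only illustrate the shape being matched.
enum class Oper { Local, Cast, Mul, MulLong };
enum class Type { Int, Long };

struct Node
{
    Oper  oper;
    Type  type;
    bool  isUnsigned; // Cast: zero-extend; MulLong: unsigned multiply
    Node* op1;
    Node* op2;
};

// True for a cast that widens a 32-bit value to 64 bits.
bool IsWideningCast(const Node* node)
{
    return (node != nullptr) && (node->oper == Oper::Cast) && (node->type == Type::Long) &&
           (node->op1 != nullptr) && (node->op1->type == Type::Int);
}

// True for a 64-bit multiply whose operands are both widened from 32 bits
// with the same signedness. An already converted node is accepted as well,
// so the answer is the same before and after conversion.
bool Is64BitMulOf32BitOperands(const Node* node)
{
    assert(node != nullptr);
    const bool isMul = (node->oper == Oper::Mul) || (node->oper == Oper::MulLong);
    if (!isMul || (node->type != Type::Long))
    {
        return false;
    }
    if (!IsWideningCast(node->op1) || !IsWideningCast(node->op2))
    {
        return false;
    }
    return node->op1->isUnsigned == node->op2->isUnsigned;
}

// Retypes a matching multiply as MulLong. Running it a second time leaves
// the node unchanged, which is the idempotency property.
bool TryConvertToMulLong(Node* node)
{
    if (!Is64BitMulOf32BitOperands(node))
    {
        return false;
    }
    node->oper       = Oper::MulLong;
    node->isUnsigned = node->op1->isUnsigned;
    return true;
}
```

Because the check accepts both the original and the converted operator, a later phase (lowering, in this PR's terms) can call the same routine on a tree that an earlier phase already processed without changing the result.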

Said diffs are quite nice: win-arm64, linux-arm64.

Fixes #47490.

ghost · 2021-08-23T10:01:23Z

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Author:	SingleAccretion
Assignees:	-
Labels:	`area-CodeGen-coreclr`, `community-contribution`
Milestone:	-

pentp · 2021-08-23T10:06:11Z

This fixes #47490, right?

SingleAccretion · 2021-08-23T10:07:11Z

Yep, will link it once the CI is green.

src/coreclr/jit/lower.cpp

SingleAccretion · 2021-08-23T12:39:19Z

cc @echesakovMSFT

pentp · 2021-09-08T09:17:05Z

Contributes to #35853

echesakov

The changes look good (with some comments).

@SingleAccretion Can you please share the diffs on Arm64?

src/coreclr/jit/codegen.h

src/coreclr/jit/gentree.h

SingleAccretion · 2021-10-05T14:53:58Z

> Can you please share the diffs on Arm64?

Do the diffs in the PR's description suffice?

echesakov · 2021-10-05T15:38:05Z

> Do the diffs in the PR's description suffice?

Sorry, I meant an example of assembly diffs (not the summary).

SingleAccretion · 2021-10-05T16:04:38Z

> Sorry, I meant an example of assembly diffs (not the summary).

I see. There are two types of diffs with this change. The first is a straight removal of two casts, with a suitable multiplication instruction change:

```diff
-            mov     w0, w0
-            mov     w1, w1
-            mul     x22, x0, x1
+            umull   x22, w0, w1
```

```diff
-            sxtw    x2, w3
-            mov     x3, #60
-            mul     x2, x2, x3
+            mov     w2, #60
+            smull   x2, w3, w2
```
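As a rough illustration of the source-level shape behind these two diffs, here is a small C++ sketch. It is written in C++ rather than the C# the JIT actually compiles, and the function names `WidenMulUnsigned` and `WidenMulSigned` are made up for this example. The point is only that "widen both 32-bit operands, then multiply in 64 bits" is exactly what `umull` (zero-extending) and `smull` (sign-extending) compute in one instruction.

```cpp
#include <cstdint>

// Unsigned 32x32->64 multiply: both operands are zero-extended to 64 bits
// and then multiplied. This is the value `umull xd, wn, wm` produces.
constexpr uint64_t WidenMulUnsigned(uint32_t a, uint32_t b)
{
    return static_cast<uint64_t>(a) * static_cast<uint64_t>(b);
}

// Signed 32x32->64 multiply: both operands are sign-extended to 64 bits
// and then multiplied. This is the value `smull xd, wn, wm` produces.
// The product of two 32-bit signed values always fits in 64 bits.
constexpr int64_t WidenMulSigned(int32_t a, int32_t b)
{
    return static_cast<int64_t>(a) * static_cast<int64_t>(b);
}
```

The second diff corresponds to something like `WidenMulSigned(x, 60)`: the constant no longer needs to be materialized as a 64-bit value, and the separate `sxtw` disappears.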

The second is the same, but with an added benefit of the multiplication being made non-overflowing:

```diff
-            mov     w1, w2
-            mov     x0, #4
-            umulh   x5, x1, x0
-            mul     x1, x1, x0
-            cmp     x5, #0
-            bne     G_M31955_IG28
+            mov     w1, #4
+            umull   x1, w2, w1 // (ulong)w2 * (ulong)4 cannot overflow.
```
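The reason the overflow check can be dropped is plain arithmetic: the largest product of two zero-extended 32-bit values is (2^32 - 1)^2 = 2^64 - 2^33 + 1, which is below 2^64. The sketch below makes that concrete. `MulHigh64` is a portable stand-in, written for this example, for the upper 64 bits that `umulh` returns; `MulOverflows` mirrors the `umulh` plus `cmp` plus `bne` sequence in the removed lines. Neither is code from the JIT.

```cpp
#include <cstdint>

// Upper 64 bits of the full 128-bit product of two unsigned 64-bit values,
// computed from 32-bit halves (the quantity `umulh` returns).
uint64_t MulHigh64(uint64_t a, uint64_t b)
{
    const uint64_t mask = 0xFFFFFFFFull;
    const uint64_t aLo = a & mask, aHi = a >> 32;
    const uint64_t bLo = b & mask, bHi = b >> 32;

    const uint64_t loLo = aLo * bLo;
    const uint64_t hiLo = aHi * bLo;
    const uint64_t loHi = aLo * bHi;
    const uint64_t hiHi = aHi * bHi;

    // Carry out of the middle 32-bit column.
    const uint64_t mid = (loLo >> 32) + (hiLo & mask) + (loHi & mask);

    return hiHi + (hiLo >> 32) + (loHi >> 32) + (mid >> 32);
}

// The overflow test performed by the removed instructions: the unsigned
// 64-bit multiply overflows exactly when the upper half is non-zero.
bool MulOverflows(uint64_t a, uint64_t b)
{
    return MulHigh64(a, b) != 0;
}
```

When both inputs are zero-extended from 32 bits, `aHi` and `bHi` are zero, so every term contributing to the upper half vanishes and `MulOverflows` is false for all such inputs.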

echesakov

Great diffs! Thank you for the contribution @SingleAccretion!

cc @dotnet/jit-contrib

kunalspathak · 2021-10-14T15:20:39Z

Improvement on arm64: dotnet/perf-autofiling-issues#1817

ghost added the community-contribution label Aug 23, 2021

dotnet-issue-labeler bot added the area-CodeGen-coreclr label Aug 23, 2021

SingleAccretion changed the title ~~Implement GT_MUL_LONG for arm64~~ Implement GT_MUL_LONG for ARM64 Aug 23, 2021

pentp reviewed Aug 23, 2021

src/coreclr/jit/lower.cpp

SingleAccretion marked this pull request as ready for review August 23, 2021 12:36

JulieLeeMSFT requested a review from echesakov August 23, 2021 19:37

JulieLeeMSFT assigned SingleAccretion Aug 23, 2021

JulieLeeMSFT added this to the 7.0.0 milestone Aug 23, 2021

SingleAccretion force-pushed the Long-Mul-For-Arm64 branch from 4882cde to 1b38307 Compare August 26, 2021 21:50

pentp mentioned this pull request Aug 27, 2021

Implement 128-bit Multiply intrinsic for x86/x64 #58263

Open

SingleAccretion added 4 commits October 1, 2021 22:30

Refactor LONG_MUL recognition

ff6057a

Move it to a GenTree method so that it can be reused in lowering. No diffs except for a couple (~5 across all SPMI collection) on ARM due to the new operand swapping behavior.

Fix the definition for MUL_LONG

47f3e0b

Move it out of the "#if !defined(TARGET_64BIT)" block. Also some miscellaneous moving of comments around.

Support MUL_LONG for ARM64 in codegen and LSRA

3fcc830

Note: this commit breaks magic division.

Fix magic division

537f632

To use the new GT_MUL_LONG support. Plus some tweaks to make it conform to the coding guide. Credit to @pentp for the substantive changes.
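For context on why magic division depends on a 32x32->64 multiply, here is a minimal sketch of the general technique for one fixed divisor. It is not the JIT's implementation: `DivideBy3` is a name made up for this example, and the constant `0xAAAAAAAB` with shift 33 is the well-known pair for unsigned 32-bit division by 3, not something taken from this PR. The JIT picks the multiplier and shift per divisor; this only shows the multiply that ends up as a widening multiply.

```cpp
#include <cstdint>

// Unsigned 32-bit division by 3 without a divide instruction.
// 0xAAAAAAAB equals (2^33 + 1) / 3, so the widened product shifted right
// by 33 bits gives floor(x / 3) for every 32-bit x.
constexpr uint32_t DivideBy3(uint32_t x)
{
    // The multiply is a 32x32->64 one: both inputs fit in 32 bits and the
    // full 64-bit product is needed before the shift.
    return static_cast<uint32_t>((static_cast<uint64_t>(x) * 0xAAAAAAABull) >> 33);
}
```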

SingleAccretion force-pushed the Long-Mul-For-Arm64 branch from 1b38307 to b04ce9e Compare October 1, 2021 19:40

Recognize MUL_LONG in lowering for ARM64

5336b72

SingleAccretion force-pushed the Long-Mul-For-Arm64 branch from b04ce9e to 5336b72 Compare October 1, 2021 20:09

JulieLeeMSFT assigned echesakov Oct 4, 2021

echesakov reviewed Oct 5, 2021

src/coreclr/jit/codegen.h Outdated

src/coreclr/jit/gentree.h Outdated

Fix #ifdef's

8a7932e

echesakov approved these changes Oct 5, 2021

echesakov merged commit 220b746 into dotnet:main Oct 5, 2021

SingleAccretion deleted the Long-Mul-For-Arm64 branch October 6, 2021 13:38

ghost locked as resolved and limited conversation to collaborators Nov 13, 2021
