
[AArch64] Avoid NEON ORR when NEON and SVE are unavailable #93940

Merged

Conversation

sdesmalen-arm (Collaborator)

For streaming-compatible functions with only +sme, we can't use
a NEON ORR (aliased as 'mov') for copies of Q-registers, so
we need to use a spill/fill instead.

This also fixes the fill, which should use the post-incrementing
addressing mode.
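
For illustration, a minimal sketch of the two addressing modes involved (registers and offsets chosen to match the copy sequence): a pre-indexed access updates the base register before the memory access, while a post-indexed access updates it after, which is what a fill that pops its own spill slot requires.

    str q1, [sp, #-16]!  // pre-index: sp -= 16, then store q1 to [sp]
    ldr q0, [sp], #16    // post-index: load q0 from [sp], then sp += 16

The previous pre-indexed fill, ldr q0, [sp, #16]!, would instead read from sp+16 (one slot above the spilled value) before writing back sp.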


llvmbot commented May 31, 2024

@llvm/pr-subscribers-backend-aarch64

Author: Sander de Smalen (sdesmalen-arm)

Changes

For streaming-compatible functions with only +sme, we can't use
a NEON ORR (aliased as 'mov') for copies of Q-registers, so
we need to use a spill/fill instead.

This also fixes the fill, which should use the post-incrementing
addressing mode.


Full diff: https://github.com/llvm/llvm-project/pull/93940.diff

4 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.cpp (+2-2)
  • (modified) llvm/test/CodeGen/AArch64/arm64-reg-copy-noneon.ll (+16-9)
  • (modified) llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-masked-load.ll (+8-4)
  • (modified) llvm/test/CodeGen/AArch64/sve-streaming-mode-test-register-mov.ll (+2-1)
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
index aa0b7c93f8661..ce4b7e68fd625 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
@@ -4659,7 +4659,7 @@ void AArch64InstrInfo::copyPhysReg(MachineBasicBlock &MBB,
           .addReg(AArch64::Z0 + (DestReg - AArch64::Q0), RegState::Define)
           .addReg(AArch64::Z0 + (SrcReg - AArch64::Q0))
           .addReg(AArch64::Z0 + (SrcReg - AArch64::Q0));
-    else if (Subtarget.hasNEON())
+    else if (Subtarget.isNeonAvailable())
       BuildMI(MBB, I, DL, get(AArch64::ORRv16i8), DestReg)
           .addReg(SrcReg)
           .addReg(SrcReg, getKillRegState(KillSrc));
@@ -4669,7 +4669,7 @@ void AArch64InstrInfo::copyPhysReg(MachineBasicBlock &MBB,
           .addReg(SrcReg, getKillRegState(KillSrc))
           .addReg(AArch64::SP)
           .addImm(-16);
-      BuildMI(MBB, I, DL, get(AArch64::LDRQpre))
+      BuildMI(MBB, I, DL, get(AArch64::LDRQpost))
           .addReg(AArch64::SP, RegState::Define)
           .addReg(DestReg, RegState::Define)
           .addReg(AArch64::SP)
diff --git a/llvm/test/CodeGen/AArch64/arm64-reg-copy-noneon.ll b/llvm/test/CodeGen/AArch64/arm64-reg-copy-noneon.ll
index 29255ef187c1c..69cd295c309d1 100644
--- a/llvm/test/CodeGen/AArch64/arm64-reg-copy-noneon.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-reg-copy-noneon.ll
@@ -1,20 +1,27 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
 ; RUN: llc -mtriple=arm64-none-linux-gnu -mattr=-neon < %s | FileCheck %s
 
 define float @copy_FPR32(float %a, float %b) {
-;CHECK-LABEL: copy_FPR32:
-;CHECK: fmov s0, s1
+; CHECK-LABEL: copy_FPR32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    fmov s0, s1
+; CHECK-NEXT:    ret
   ret float %b;
 }
-  
+
 define double @copy_FPR64(double %a, double %b) {
-;CHECK-LABEL: copy_FPR64:
-;CHECK: fmov d0, d1
+; CHECK-LABEL: copy_FPR64:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    fmov d0, d1
+; CHECK-NEXT:    ret
   ret double %b;
 }
-  
+
 define fp128 @copy_FPR128(fp128 %a, fp128 %b) {
-;CHECK-LABEL: copy_FPR128:
-;CHECK: str	q1, [sp, #-16]!
-;CHECK-NEXT: ldr	q0, [sp, #16]!
+; CHECK-LABEL: copy_FPR128:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    str q1, [sp, #-16]!
+; CHECK-NEXT:    ldr q0, [sp], #16
+; CHECK-NEXT:    ret
   ret fp128 %b;
 }
diff --git a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-masked-load.ll b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-masked-load.ll
index be335c697707d..a689a539b0082 100644
--- a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-masked-load.ll
+++ b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-masked-load.ll
@@ -980,7 +980,8 @@ define <32 x i8> @masked_load_v32i8(ptr %src, <32 x i1> %mask) {
 ; NONEON-NOSVE-NEXT:    tbnz w8, #1, .LBB3_3
 ; NONEON-NOSVE-NEXT:    b .LBB3_4
 ; NONEON-NOSVE-NEXT:  .LBB3_2:
-; NONEON-NOSVE-NEXT:    mov v0.16b, v1.16b
+; NONEON-NOSVE-NEXT:    str q1, [sp, #-16]!
+; NONEON-NOSVE-NEXT:    ldr q0, [sp], #16
 ; NONEON-NOSVE-NEXT:    tbz w8, #1, .LBB3_4
 ; NONEON-NOSVE-NEXT:  .LBB3_3: // %cond.load1
 ; NONEON-NOSVE-NEXT:    ldrb w10, [x0, #1]
@@ -2095,7 +2096,8 @@ define <16 x half> @masked_load_v16f16(ptr %src, <16 x i1> %mask) {
 ; NONEON-NOSVE-NEXT:    tbnz w8, #1, .LBB7_3
 ; NONEON-NOSVE-NEXT:    b .LBB7_4
 ; NONEON-NOSVE-NEXT:  .LBB7_2:
-; NONEON-NOSVE-NEXT:    mov v0.16b, v1.16b
+; NONEON-NOSVE-NEXT:    str q1, [sp, #-16]!
+; NONEON-NOSVE-NEXT:    ldr q0, [sp], #16
 ; NONEON-NOSVE-NEXT:    tbz w8, #1, .LBB7_4
 ; NONEON-NOSVE-NEXT:  .LBB7_3: // %cond.load1
 ; NONEON-NOSVE-NEXT:    ldr h2, [x0, #2]
@@ -2616,7 +2618,8 @@ define <8 x float> @masked_load_v8f32(ptr %src, <8 x i1> %mask) {
 ; NONEON-NOSVE-NEXT:    tbnz w8, #1, .LBB10_3
 ; NONEON-NOSVE-NEXT:    b .LBB10_4
 ; NONEON-NOSVE-NEXT:  .LBB10_2:
-; NONEON-NOSVE-NEXT:    mov v0.16b, v1.16b
+; NONEON-NOSVE-NEXT:    str q1, [sp, #-16]!
+; NONEON-NOSVE-NEXT:    ldr q0, [sp], #16
 ; NONEON-NOSVE-NEXT:    tbz w8, #1, .LBB10_4
 ; NONEON-NOSVE-NEXT:  .LBB10_3: // %cond.load1
 ; NONEON-NOSVE-NEXT:    ldr s2, [x0, #4]
@@ -2839,7 +2842,8 @@ define <4 x double> @masked_load_v4f64(ptr %src, <4 x i1> %mask) {
 ; NONEON-NOSVE-NEXT:    tbnz w8, #1, .LBB12_3
 ; NONEON-NOSVE-NEXT:    b .LBB12_4
 ; NONEON-NOSVE-NEXT:  .LBB12_2:
-; NONEON-NOSVE-NEXT:    mov v0.16b, v1.16b
+; NONEON-NOSVE-NEXT:    str q1, [sp, #-16]!
+; NONEON-NOSVE-NEXT:    ldr q0, [sp], #16
 ; NONEON-NOSVE-NEXT:    tbz w8, #1, .LBB12_4
 ; NONEON-NOSVE-NEXT:  .LBB12_3: // %cond.load1
 ; NONEON-NOSVE-NEXT:    ldr d2, [x0, #8]
diff --git a/llvm/test/CodeGen/AArch64/sve-streaming-mode-test-register-mov.ll b/llvm/test/CodeGen/AArch64/sve-streaming-mode-test-register-mov.ll
index 67cdde718e391..23adb1a4bc092 100644
--- a/llvm/test/CodeGen/AArch64/sve-streaming-mode-test-register-mov.ll
+++ b/llvm/test/CodeGen/AArch64/sve-streaming-mode-test-register-mov.ll
@@ -15,7 +15,8 @@ define fp128 @test_streaming_compatible_register_mov(fp128 %q0, fp128 %q1) {
 ;
 ; NONEON-NOSVE-LABEL: test_streaming_compatible_register_mov:
 ; NONEON-NOSVE:       // %bb.0:
-; NONEON-NOSVE-NEXT:    mov v0.16b, v1.16b
+; NONEON-NOSVE-NEXT:    str q1, [sp, #-16]!
+; NONEON-NOSVE-NEXT:    ldr q0, [sp], #16
 ; NONEON-NOSVE-NEXT:    ret
   ret fp128 %q1
 }

@sdesmalen-arm sdesmalen-arm merged commit b71434f into llvm:main Jun 3, 2024
9 checks passed
@efriedma-quic (Collaborator) left a comment

The code sequence here assumes there's no red zone; is that actually enforced somewhere? Can we allocate a stack slot instead of modifying the stack pointer?

As an alternative, it might be possible to use an integer register here, but I guess allocating that might be a bit complicated.
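
For context on the concern, a minimal sketch of the hazard, with a hypothetical leaf function that keeps a live value in the red zone (the value and offsets are invented for illustration):

    stur x19, [sp, #-8]   // leaf function parks a live value in the red zone
    str  q1,  [sp, #-16]! // Q-register copy: spill overwrites [sp-16, sp-1]
    ldr  q0,  [sp], #16   // fill restores sp, but the red-zone slot is gone
    ldur x19, [sp, #-8]   // reloads bytes of q1, not the original value

If nothing guarantees the red zone is unused (or disabled) in functions that may need this copy sequence, the spill silently clobbers it.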

@efriedma-quic (Collaborator)

Ping

@sdesmalen-arm (Collaborator, Author)

> The code sequence here assumes there's no red zone; is that actually enforced somewhere? Can we allocate a stack slot instead of modifying the stack pointer?
>
> As an alternative, it might be possible to use an integer register here, but I guess allocating that might be a bit complicated.

Good spot!

From what I can see, it seems better to just disable the Red Zone entirely for now.

The issue is that, the way things currently work, the COPY is only expanded after register allocation and after prologue/epilogue insertion (PEI). If we told PEI to allocate a stack slot, then copyPhysReg would need to emit the base + offset addressing manually, and if the slot ends up too far away from the FP/SP, we'd need the register scavenger to either give us an extra register or spill one in order to access it; however, the register scavenger is not passed to copyPhysReg. We could implement this mechanism manually using the emergency spill slot (or allocate a larger emergency slot for the spill/fill), or, as you suggest, use a GPR (for which we'd still need some sort of scavenging for free registers). Alternatively, we could create another post-RA expansion pass that runs before PEI, to let PEI do the heavy lifting.

Fixing the problem with a stop-gap (disabling red zone) could be a good first step. And if this turns out to be a genuine performance issue, we can come up with a better mechanism later. Does that make sense?
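
As a sketch of what the stop-gap means for codegen (assuming it simply prevents red-zone usage in functions that may need this copy; the exact predicate lives in the follow-up commits referenced below): with no red zone, all live locals sit at or above sp, so the copy's transient slot below sp cannot overlap anything live.

    sub sp, sp, #16       // frame allocated explicitly; nothing live below sp
    ...                   // locals at [sp, sp+15]
    str q1, [sp, #-16]!   // transient spill slot is below all live data
    ldr q0, [sp], #16     // sp restored; no state was clobbered
    add sp, sp, #16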

sdesmalen-arm added a commit to sdesmalen-arm/llvm-project that referenced this pull request Jun 10, 2024
@efriedma-quic (Collaborator)

That makes sense, sure.

sdesmalen-arm added a commit that referenced this pull request Jun 11, 2024
Lukacma pushed a commit to Lukacma/llvm-project that referenced this pull request Jun 12, 2024