[RISCV] Adjust LMUL if not used to avoid VL toggle #69259
Conversation
@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes

A common pattern with vmv.s.x is that we need to set VL afterwards because the SEW/LMUL ratio has changed, and thus VLMAX has changed:

vsetvli zero, a1, e64, m1, ta, ma
vmv.s.x v16, a0
vsetvli zero, a1, e32, m2, ta, ma

However since LMUL and the SEW/LMUL ratio are ignored by vmv.s.x, we can avoid a VL toggle in the second vsetvli instruction by adjusting LMUL so that the SEW/LMUL ratio remains the same between the two instructions:

vsetvli zero, a1, e64, m4, ta, ma
vmv.s.x v16, a0
vsetvli zero, zero, e32, m2, ta, ma

Avoiding a VL toggle may be more performant on some architectures, and in some cases allows a vsetvli to be deleted.

Patch is 46.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/69259.diff

8 Files Affected:
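
For reference, a minimal standalone sketch of the ratio arithmetic behind the example above. The helper name and the eighths-based LMUL encoding here are illustrative assumptions, not LLVM's API:

#include <cassert>

// LMUL is modelled in eighths to avoid fractions: mf8 = 1, m1 = 8, m8 = 64.
// Given a reference config's SEW and LMUL, return the LMUL (in eighths) that
// keeps the same SEW/LMUL ratio at a different SEW, or 0 if none is legal.
// Assumes the divisions are exact, as they are for legal vtypes.
unsigned lmulForSameRatio(unsigned RefSEW, unsigned RefLMULx8, unsigned SEW) {
  unsigned Ratio = RefSEW * 8 / RefLMULx8; // e.g. e32/m2 -> ratio 16
  unsigned LMULx8 = SEW * 8 / Ratio;
  return (LMULx8 >= 1 && LMULx8 <= 64) ? LMULx8 : 0;
}

int main() {
  // The trailing vsetvli uses e32/m2 (ratio 16), so the vmv.s.x's vsetvli at
  // e64 needs m4 (64/4 == 16), matching the transformed sequence above.
  assert(lmulForSameRatio(32, /*m2=*/16, 64) == /*m4=*/32);
  return 0;
}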
diff --git a/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp b/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
index 5584fa8d503dbe4..6724acab277b5a5 100644
--- a/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
@@ -1463,6 +1463,55 @@ static bool canMutatePriorConfig(const MachineInstr &PrevMI,
return areCompatibleVTYPEs(PriorVType, VType, Used);
}
+// If LMUL or the SEW/LMUL ratio aren't demanded and MI and NextMI have the same
+// AVL, then we can try and change MI's LMUL so that we can avoid setting VL in
+// NextMI, e.g:
+//
+// vsetivli zero, 4, e32, m1, ta, ma
+// vsetivli zero, 4, e16, mf4, ta, ma
+//
+// vsetivli zero, 4, e32, mf2, ta, ma
+// vsetvli zero, zero, e16, mf4, ta, ma
+//
+// If possible, returns the new VTYPE that should be used for MI.
+static std::optional<unsigned>
+canAdjustSEWLMULRatio(const MachineInstr &MI, const MachineInstr &NextMI,
+ const DemandedFields &Used) {
+ if (Used.LMUL || Used.SEWLMULRatio)
+ return std::nullopt;
+ if (!NextMI.getOperand(0).isDead())
+ return std::nullopt;
+ // If we end up increasing the SEW/LMUL ratio, then we will decrease VLMAX,
+ // which means we might end up changing VL in the case that AVL > VLMAX. So
+ // bail if the exact VL value is needed.
+ //
+ // TODO: We could potentially relax this when we know we're increasing VLMAX.
+ if (Used.VLAny)
+ return std::nullopt;
+
+ // If NextMI is already zero, zero then bail. If MI is zero, zero then we
+ // won't be able to tell if it has the same AVL as NextMI, so also bail.
+ if (isVLPreservingConfig(MI) || isVLPreservingConfig(NextMI))
+ return std::nullopt;
+
+ VSETVLIInfo NextMIInfo = getInfoForVSETVLI(NextMI);
+ VSETVLIInfo MIInfo = getInfoForVSETVLI(MI);
+ if (!MIInfo.hasSameAVL(NextMIInfo))
+ return std::nullopt;
+
+ unsigned SEW = MIInfo.getSEW() * 8;
+ // Fixed point value with 3 fractional bits.
+ unsigned NewRatio = SEW / NextMIInfo.getSEWLMULRatio();
+ bool Fractional = NewRatio < 8;
+ RISCVII::VLMUL NewVLMul = RISCVVType::encodeLMUL(
+ Fractional ? 8 / NewRatio : NewRatio / 8, Fractional);
+
+ unsigned VType = MIInfo.encodeVTYPE();
+ return RISCVVType::encodeVTYPE(NewVLMul, SEW / 8,
+ RISCVVType::isTailAgnostic(VType),
+ RISCVVType::isMaskAgnostic(VType));
+}
+
void RISCVInsertVSETVLI::doLocalPostpass(MachineBasicBlock &MBB) {
MachineInstr *NextMI = nullptr;
// We can have arbitrary code in successors, so VL and VTYPE
@@ -1484,6 +1533,15 @@ void RISCVInsertVSETVLI::doLocalPostpass(MachineBasicBlock &MBB) {
Used.demandVL();
if (NextMI) {
+ if (auto NewVType = canAdjustSEWLMULRatio(MI, *NextMI, Used)) {
+ MI.getOperand(2).setImm(*NewVType);
+ // Convert NextMI to vsetvli zero, zero
+ NextMI->setDesc(TII->get(RISCV::PseudoVSETVLIX0));
+ NextMI->getOperand(0).setReg(RISCV::X0);
+ NextMI->getOperand(0).setIsDead(true);
+ NextMI->getOperand(1).ChangeToRegister(RISCV::X0, false, false, true);
+ }
+
if (!Used.usedVL() && !Used.usedVTYPE()) {
ToDelete.push_back(&MI);
// Leave NextMI unchanged
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert.ll
index cbcca9d2696f4ba..4a39bbb519cc017 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert.ll
@@ -65,9 +65,8 @@ define <32 x i32> @insertelt_v32i32_31(<32 x i32> %a, i32 %y) {
; CHECK-LABEL: insertelt_v32i32_31:
; CHECK: # %bb.0:
; CHECK-NEXT: li a1, 32
-; CHECK-NEXT: vsetvli zero, a1, e32, m1, ta, ma
-; CHECK-NEXT: vmv.s.x v16, a0
; CHECK-NEXT: vsetvli zero, a1, e32, m8, ta, ma
+; CHECK-NEXT: vmv.s.x v16, a0
; CHECK-NEXT: vslideup.vi v8, v16, 31
; CHECK-NEXT: ret
%b = insertelement <32 x i32> %a, i32 %y, i32 31
@@ -103,9 +102,8 @@ define <64 x i32> @insertelt_v64i32_63(<64 x i32> %a, i32 %y) {
; CHECK-LABEL: insertelt_v64i32_63:
; CHECK: # %bb.0:
; CHECK-NEXT: li a1, 32
-; CHECK-NEXT: vsetvli zero, a1, e32, m1, ta, ma
-; CHECK-NEXT: vmv.s.x v24, a0
; CHECK-NEXT: vsetvli zero, a1, e32, m8, ta, ma
+; CHECK-NEXT: vmv.s.x v24, a0
; CHECK-NEXT: vslideup.vi v16, v24, 31
; CHECK-NEXT: ret
%b = insertelement <64 x i32> %a, i32 %y, i32 63
@@ -550,9 +548,9 @@ define void @insertelt_c6_v8i64_0_add(ptr %x, ptr %y) {
; CHECK-NEXT: vsetivli zero, 8, e64, m4, ta, ma
; CHECK-NEXT: vle64.v v8, (a0)
; CHECK-NEXT: li a2, 6
-; CHECK-NEXT: vsetivli zero, 8, e64, m1, tu, ma
+; CHECK-NEXT: vsetivli zero, 8, e64, m4, tu, ma
; CHECK-NEXT: vmv.s.x v8, a2
-; CHECK-NEXT: vsetivli zero, 8, e64, m4, ta, ma
+; CHECK-NEXT: vsetvli zero, zero, e64, m4, ta, ma
; CHECK-NEXT: vle64.v v12, (a1)
; CHECK-NEXT: vadd.vv v8, v8, v12
; CHECK-NEXT: vse64.v v8, (a0)
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-scatter.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-scatter.ll
index 60b61e889315cfe..06fb0a8eba43367 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-scatter.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-scatter.ll
@@ -1138,11 +1138,11 @@ define void @mscatter_baseidx_v8i8_v8i16(<8 x i16> %val, ptr %base, <8 x i8> %id
; RV64ZVE32F-NEXT: andi a2, a1, 16
; RV64ZVE32F-NEXT: beqz a2, .LBB18_7
; RV64ZVE32F-NEXT: .LBB18_14: # %cond.store7
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e8, mf4, ta, ma
+; RV64ZVE32F-NEXT: vsetivli zero, 1, e8, mf2, ta, ma
; RV64ZVE32F-NEXT: vmv.x.s a2, v10
; RV64ZVE32F-NEXT: slli a2, a2, 1
; RV64ZVE32F-NEXT: add a2, a0, a2
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e16, m1, ta, ma
+; RV64ZVE32F-NEXT: vsetvli zero, zero, e16, m1, ta, ma
; RV64ZVE32F-NEXT: vslidedown.vi v9, v8, 4
; RV64ZVE32F-NEXT: vse16.v v9, (a2)
; RV64ZVE32F-NEXT: andi a2, a1, 32
@@ -1271,11 +1271,11 @@ define void @mscatter_baseidx_sext_v8i8_v8i16(<8 x i16> %val, ptr %base, <8 x i8
; RV64ZVE32F-NEXT: andi a2, a1, 16
; RV64ZVE32F-NEXT: beqz a2, .LBB19_7
; RV64ZVE32F-NEXT: .LBB19_14: # %cond.store7
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e8, mf4, ta, ma
+; RV64ZVE32F-NEXT: vsetivli zero, 1, e8, mf2, ta, ma
; RV64ZVE32F-NEXT: vmv.x.s a2, v10
; RV64ZVE32F-NEXT: slli a2, a2, 1
; RV64ZVE32F-NEXT: add a2, a0, a2
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e16, m1, ta, ma
+; RV64ZVE32F-NEXT: vsetvli zero, zero, e16, m1, ta, ma
; RV64ZVE32F-NEXT: vslidedown.vi v9, v8, 4
; RV64ZVE32F-NEXT: vse16.v v9, (a2)
; RV64ZVE32F-NEXT: andi a2, a1, 32
@@ -1408,12 +1408,12 @@ define void @mscatter_baseidx_zext_v8i8_v8i16(<8 x i16> %val, ptr %base, <8 x i8
; RV64ZVE32F-NEXT: andi a2, a1, 16
; RV64ZVE32F-NEXT: beqz a2, .LBB20_7
; RV64ZVE32F-NEXT: .LBB20_14: # %cond.store7
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e8, mf4, ta, ma
+; RV64ZVE32F-NEXT: vsetivli zero, 1, e8, mf2, ta, ma
; RV64ZVE32F-NEXT: vmv.x.s a2, v10
; RV64ZVE32F-NEXT: andi a2, a2, 255
; RV64ZVE32F-NEXT: slli a2, a2, 1
; RV64ZVE32F-NEXT: add a2, a0, a2
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e16, m1, ta, ma
+; RV64ZVE32F-NEXT: vsetvli zero, zero, e16, m1, ta, ma
; RV64ZVE32F-NEXT: vslidedown.vi v9, v8, 4
; RV64ZVE32F-NEXT: vse16.v v9, (a2)
; RV64ZVE32F-NEXT: andi a2, a1, 32
@@ -2043,11 +2043,11 @@ define void @mscatter_baseidx_v8i8_v8i32(<8 x i32> %val, ptr %base, <8 x i8> %id
; RV64ZVE32F-NEXT: andi a2, a1, 16
; RV64ZVE32F-NEXT: beqz a2, .LBB29_7
; RV64ZVE32F-NEXT: .LBB29_14: # %cond.store7
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e8, mf4, ta, ma
+; RV64ZVE32F-NEXT: vsetivli zero, 1, e8, mf2, ta, ma
; RV64ZVE32F-NEXT: vmv.x.s a2, v11
; RV64ZVE32F-NEXT: slli a2, a2, 2
; RV64ZVE32F-NEXT: add a2, a0, a2
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e32, m2, ta, ma
+; RV64ZVE32F-NEXT: vsetvli zero, zero, e32, m2, ta, ma
; RV64ZVE32F-NEXT: vslidedown.vi v12, v8, 4
; RV64ZVE32F-NEXT: vse32.v v12, (a2)
; RV64ZVE32F-NEXT: andi a2, a1, 32
@@ -2175,11 +2175,11 @@ define void @mscatter_baseidx_sext_v8i8_v8i32(<8 x i32> %val, ptr %base, <8 x i8
; RV64ZVE32F-NEXT: andi a2, a1, 16
; RV64ZVE32F-NEXT: beqz a2, .LBB30_7
; RV64ZVE32F-NEXT: .LBB30_14: # %cond.store7
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e8, mf4, ta, ma
+; RV64ZVE32F-NEXT: vsetivli zero, 1, e8, mf2, ta, ma
; RV64ZVE32F-NEXT: vmv.x.s a2, v11
; RV64ZVE32F-NEXT: slli a2, a2, 2
; RV64ZVE32F-NEXT: add a2, a0, a2
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e32, m2, ta, ma
+; RV64ZVE32F-NEXT: vsetvli zero, zero, e32, m2, ta, ma
; RV64ZVE32F-NEXT: vslidedown.vi v12, v8, 4
; RV64ZVE32F-NEXT: vse32.v v12, (a2)
; RV64ZVE32F-NEXT: andi a2, a1, 32
@@ -2314,12 +2314,12 @@ define void @mscatter_baseidx_zext_v8i8_v8i32(<8 x i32> %val, ptr %base, <8 x i8
; RV64ZVE32F-NEXT: andi a2, a1, 16
; RV64ZVE32F-NEXT: beqz a2, .LBB31_7
; RV64ZVE32F-NEXT: .LBB31_14: # %cond.store7
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e8, mf4, ta, ma
+; RV64ZVE32F-NEXT: vsetivli zero, 1, e8, mf2, ta, ma
; RV64ZVE32F-NEXT: vmv.x.s a2, v11
; RV64ZVE32F-NEXT: andi a2, a2, 255
; RV64ZVE32F-NEXT: slli a2, a2, 2
; RV64ZVE32F-NEXT: add a2, a0, a2
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e32, m2, ta, ma
+; RV64ZVE32F-NEXT: vsetvli zero, zero, e32, m2, ta, ma
; RV64ZVE32F-NEXT: vslidedown.vi v12, v8, 4
; RV64ZVE32F-NEXT: vse32.v v12, (a2)
; RV64ZVE32F-NEXT: andi a2, a1, 32
@@ -2451,11 +2451,11 @@ define void @mscatter_baseidx_v8i16_v8i32(<8 x i32> %val, ptr %base, <8 x i16> %
; RV64ZVE32F-NEXT: andi a2, a1, 16
; RV64ZVE32F-NEXT: beqz a2, .LBB32_7
; RV64ZVE32F-NEXT: .LBB32_14: # %cond.store7
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e16, mf2, ta, ma
+; RV64ZVE32F-NEXT: vsetivli zero, 1, e16, m1, ta, ma
; RV64ZVE32F-NEXT: vmv.x.s a2, v11
; RV64ZVE32F-NEXT: slli a2, a2, 2
; RV64ZVE32F-NEXT: add a2, a0, a2
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e32, m2, ta, ma
+; RV64ZVE32F-NEXT: vsetvli zero, zero, e32, m2, ta, ma
; RV64ZVE32F-NEXT: vslidedown.vi v12, v8, 4
; RV64ZVE32F-NEXT: vse32.v v12, (a2)
; RV64ZVE32F-NEXT: andi a2, a1, 32
@@ -2584,11 +2584,11 @@ define void @mscatter_baseidx_sext_v8i16_v8i32(<8 x i32> %val, ptr %base, <8 x i
; RV64ZVE32F-NEXT: andi a2, a1, 16
; RV64ZVE32F-NEXT: beqz a2, .LBB33_7
; RV64ZVE32F-NEXT: .LBB33_14: # %cond.store7
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e16, mf2, ta, ma
+; RV64ZVE32F-NEXT: vsetivli zero, 1, e16, m1, ta, ma
; RV64ZVE32F-NEXT: vmv.x.s a2, v11
; RV64ZVE32F-NEXT: slli a2, a2, 2
; RV64ZVE32F-NEXT: add a2, a0, a2
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e32, m2, ta, ma
+; RV64ZVE32F-NEXT: vsetvli zero, zero, e32, m2, ta, ma
; RV64ZVE32F-NEXT: vslidedown.vi v12, v8, 4
; RV64ZVE32F-NEXT: vse32.v v12, (a2)
; RV64ZVE32F-NEXT: andi a2, a1, 32
@@ -2724,12 +2724,12 @@ define void @mscatter_baseidx_zext_v8i16_v8i32(<8 x i32> %val, ptr %base, <8 x i
; RV64ZVE32F-NEXT: andi a3, a2, 16
; RV64ZVE32F-NEXT: beqz a3, .LBB34_7
; RV64ZVE32F-NEXT: .LBB34_14: # %cond.store7
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e16, mf2, ta, ma
+; RV64ZVE32F-NEXT: vsetivli zero, 1, e16, m1, ta, ma
; RV64ZVE32F-NEXT: vmv.x.s a3, v11
; RV64ZVE32F-NEXT: and a3, a3, a1
; RV64ZVE32F-NEXT: slli a3, a3, 2
; RV64ZVE32F-NEXT: add a3, a0, a3
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e32, m2, ta, ma
+; RV64ZVE32F-NEXT: vsetvli zero, zero, e32, m2, ta, ma
; RV64ZVE32F-NEXT: vslidedown.vi v12, v8, 4
; RV64ZVE32F-NEXT: vse32.v v12, (a3)
; RV64ZVE32F-NEXT: andi a3, a2, 32
@@ -6393,11 +6393,11 @@ define void @mscatter_baseidx_v8i8_v8f16(<8 x half> %val, ptr %base, <8 x i8> %i
; RV64ZVE32F-NEXT: andi a2, a1, 16
; RV64ZVE32F-NEXT: beqz a2, .LBB58_7
; RV64ZVE32F-NEXT: .LBB58_14: # %cond.store7
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e8, mf4, ta, ma
+; RV64ZVE32F-NEXT: vsetivli zero, 1, e8, mf2, ta, ma
; RV64ZVE32F-NEXT: vmv.x.s a2, v10
; RV64ZVE32F-NEXT: slli a2, a2, 1
; RV64ZVE32F-NEXT: add a2, a0, a2
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e16, m1, ta, ma
+; RV64ZVE32F-NEXT: vsetvli zero, zero, e16, m1, ta, ma
; RV64ZVE32F-NEXT: vslidedown.vi v9, v8, 4
; RV64ZVE32F-NEXT: vse16.v v9, (a2)
; RV64ZVE32F-NEXT: andi a2, a1, 32
@@ -6526,11 +6526,11 @@ define void @mscatter_baseidx_sext_v8i8_v8f16(<8 x half> %val, ptr %base, <8 x i
; RV64ZVE32F-NEXT: andi a2, a1, 16
; RV64ZVE32F-NEXT: beqz a2, .LBB59_7
; RV64ZVE32F-NEXT: .LBB59_14: # %cond.store7
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e8, mf4, ta, ma
+; RV64ZVE32F-NEXT: vsetivli zero, 1, e8, mf2, ta, ma
; RV64ZVE32F-NEXT: vmv.x.s a2, v10
; RV64ZVE32F-NEXT: slli a2, a2, 1
; RV64ZVE32F-NEXT: add a2, a0, a2
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e16, m1, ta, ma
+; RV64ZVE32F-NEXT: vsetvli zero, zero, e16, m1, ta, ma
; RV64ZVE32F-NEXT: vslidedown.vi v9, v8, 4
; RV64ZVE32F-NEXT: vse16.v v9, (a2)
; RV64ZVE32F-NEXT: andi a2, a1, 32
@@ -6663,12 +6663,12 @@ define void @mscatter_baseidx_zext_v8i8_v8f16(<8 x half> %val, ptr %base, <8 x i
; RV64ZVE32F-NEXT: andi a2, a1, 16
; RV64ZVE32F-NEXT: beqz a2, .LBB60_7
; RV64ZVE32F-NEXT: .LBB60_14: # %cond.store7
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e8, mf4, ta, ma
+; RV64ZVE32F-NEXT: vsetivli zero, 1, e8, mf2, ta, ma
; RV64ZVE32F-NEXT: vmv.x.s a2, v10
; RV64ZVE32F-NEXT: andi a2, a2, 255
; RV64ZVE32F-NEXT: slli a2, a2, 1
; RV64ZVE32F-NEXT: add a2, a0, a2
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e16, m1, ta, ma
+; RV64ZVE32F-NEXT: vsetvli zero, zero, e16, m1, ta, ma
; RV64ZVE32F-NEXT: vslidedown.vi v9, v8, 4
; RV64ZVE32F-NEXT: vse16.v v9, (a2)
; RV64ZVE32F-NEXT: andi a2, a1, 32
@@ -7249,11 +7249,11 @@ define void @mscatter_baseidx_v8i8_v8f32(<8 x float> %val, ptr %base, <8 x i8> %
; RV64ZVE32F-NEXT: andi a2, a1, 16
; RV64ZVE32F-NEXT: beqz a2, .LBB68_7
; RV64ZVE32F-NEXT: .LBB68_14: # %cond.store7
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e8, mf4, ta, ma
+; RV64ZVE32F-NEXT: vsetivli zero, 1, e8, mf2, ta, ma
; RV64ZVE32F-NEXT: vmv.x.s a2, v11
; RV64ZVE32F-NEXT: slli a2, a2, 2
; RV64ZVE32F-NEXT: add a2, a0, a2
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e32, m2, ta, ma
+; RV64ZVE32F-NEXT: vsetvli zero, zero, e32, m2, ta, ma
; RV64ZVE32F-NEXT: vslidedown.vi v12, v8, 4
; RV64ZVE32F-NEXT: vsetivli zero, 1, e32, m1, ta, ma
; RV64ZVE32F-NEXT: vse32.v v12, (a2)
@@ -7385,11 +7385,11 @@ define void @mscatter_baseidx_sext_v8i8_v8f32(<8 x float> %val, ptr %base, <8 x
; RV64ZVE32F-NEXT: andi a2, a1, 16
; RV64ZVE32F-NEXT: beqz a2, .LBB69_7
; RV64ZVE32F-NEXT: .LBB69_14: # %cond.store7
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e8, mf4, ta, ma
+; RV64ZVE32F-NEXT: vsetivli zero, 1, e8, mf2, ta, ma
; RV64ZVE32F-NEXT: vmv.x.s a2, v11
; RV64ZVE32F-NEXT: slli a2, a2, 2
; RV64ZVE32F-NEXT: add a2, a0, a2
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e32, m2, ta, ma
+; RV64ZVE32F-NEXT: vsetvli zero, zero, e32, m2, ta, ma
; RV64ZVE32F-NEXT: vslidedown.vi v12, v8, 4
; RV64ZVE32F-NEXT: vsetivli zero, 1, e32, m1, ta, ma
; RV64ZVE32F-NEXT: vse32.v v12, (a2)
@@ -7528,12 +7528,12 @@ define void @mscatter_baseidx_zext_v8i8_v8f32(<8 x float> %val, ptr %base, <8 x
; RV64ZVE32F-NEXT: andi a2, a1, 16
; RV64ZVE32F-NEXT: beqz a2, .LBB70_7
; RV64ZVE32F-NEXT: .LBB70_14: # %cond.store7
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e8, mf4, ta, ma
+; RV64ZVE32F-NEXT: vsetivli zero, 1, e8, mf2, ta, ma
; RV64ZVE32F-NEXT: vmv.x.s a2, v11
; RV64ZVE32F-NEXT: andi a2, a2, 255
; RV64ZVE32F-NEXT: slli a2, a2, 2
; RV64ZVE32F-NEXT: add a2, a0, a2
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e32, m2, ta, ma
+; RV64ZVE32F-NEXT: vsetvli zero, zero, e32, m2, ta, ma
; RV64ZVE32F-NEXT: vslidedown.vi v12, v8, 4
; RV64ZVE32F-NEXT: vsetivli zero, 1, e32, m1, ta, ma
; RV64ZVE32F-NEXT: vse32.v v12, (a2)
@@ -7669,11 +7669,11 @@ define void @mscatter_baseidx_v8i16_v8f32(<8 x float> %val, ptr %base, <8 x i16>
; RV64ZVE32F-NEXT: andi a2, a1, 16
; RV64ZVE32F-NEXT: beqz a2, .LBB71_7
; RV64ZVE32F-NEXT: .LBB71_14: # %cond.store7
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e16, mf2, ta, ma
+; RV64ZVE32F-NEXT: vsetivli zero, 1, e16, m1, ta, ma
; RV64ZVE32F-NEXT: vmv.x.s a2, v11
; RV64ZVE32F-NEXT: slli a2, a2, 2
; RV64ZVE32F-NEXT: add a2, a0, a2
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e32, m2, ta, ma
+; RV64ZVE32F-NEXT: vsetvli zero, zero, e32, m2, ta, ma
; RV64ZVE32F-NEXT: vslidedown.vi v12, v8, 4
; RV64ZVE32F-NEXT: vsetivli zero, 1, e32, m1, ta, ma
; RV64ZVE32F-NEXT: vse32.v v12, (a2)
@@ -7806,11 +7806,11 @@ define void @mscatter_baseidx_sext_v8i16_v8f32(<8 x float> %val, ptr %base, <8 x
; RV64ZVE32F-NEXT: andi a2, a1, 16
; RV64ZVE32F-NEXT: beqz a2, .LBB72_7
; RV64ZVE32F-NEXT: .LBB72_14: # %cond.store7
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e16, mf2, ta, ma
+; RV64ZVE32F-NEXT: vsetivli zero, 1, e16, m1, ta, ma
; RV64ZVE32F-NEXT: vmv.x.s a2, v11
; RV64ZVE32F-NEXT: slli a2, a2, 2
; RV64ZVE32F-NEXT: add a2, a0, a2
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e32, m2, ta, ma
+; RV64ZVE32F-NEXT: vsetvli zero, zero, e32, m2, ta, ma
; RV64ZVE32F-NEXT: vslidedown.vi v12, v8, 4
; RV64ZVE32F-NEXT: vsetivli zero, 1, e32, m1, ta, ma
; RV64ZVE32F-NEXT: vse32.v v12, (a2)
@@ -7950,12 +7950,12 @@ define void @mscatter_baseidx_zext_v8i16_v8f32(<8 x float> %val, ptr %base, <8 x
; RV64ZVE32F-NEXT: andi a3, a2, 16
; RV64ZVE32F-NEXT: beqz a3, .LBB73_7
; RV64ZVE32F-NEXT: .LBB73_14: # %cond.store7
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e16, mf2, ta, ma
+; RV64ZVE32F-NEXT: vsetivli zero, 1, e16, m1, ta, ma
; RV64ZVE32F-NEXT: vmv.x.s a3, v11
; RV64ZVE32F-NEXT: and a3, a3, a1
; RV64ZVE32F-NEXT: slli a3, a3, 2
; RV64ZVE32F-NEXT: add a3, a0, a3
-; RV64ZVE32F-NEXT: vsetivli zero, 1, e32, m2, ta, ma
+; RV64ZVE32F-NEXT: vsetvli zero, zero, e32, m2, ta, ma
; RV64ZVE32F-NEXT: vslidedown.vi v12, v8, 4
; RV64ZVE32F-NEXT: vsetivli zero, 1, e32, m1, ta, ma
; RV64ZVE32F-NEXT: vse32.v v12, (a3)
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-fp.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-fp.ll
index 4766b3727a46252..514470227af7835 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-fp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-fp.ll
@@ -478,9 +478,9 @@ define float @vreduce_fwadd_v8f32(ptr %x, float %s) {
; CHECK: # %bb.0:
; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
; CHECK-NEXT: vle16.v v8, (a0)
-; CHECK-NEXT: vsetivli zero, 8, e32, m1, ta, ma
+; CHECK-NEXT: vsetivli zero, 8, e32, m2, ta, ma
; CHECK-NEXT: vfmv.s.f v9, fa0
-; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
+; CHECK-NEXT: vsetvli zero, zero, e16, m1, ta, ma
; CHECK-NEXT: vfwredusum.vs v8, v8, v9
; CHECK-NEXT: vsetivli zero, 1, e32, m1, ta, ma
; CHECK-NEXT: vfmv.f.s fa0, v8
@@ -496,9 +496,9 @@ define float @vreduce_ord_fwadd_v8f32(ptr %x, float %s) {
; CHECK: # %bb.0:
; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
; CHECK-NEXT: vle16.v v8, (a0)
-; CHECK-NEXT: vsetivli zero, 8, e32, m1, ta, ma
+; CHECK-NEXT: vsetivli zero, 8, e32, m2, ta, ma
; CHECK-NEXT: vfmv.s.f v9, fa0
-; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
+; CHECK-NEXT: vsetvli zero, zero, e16, m1, ta, ma
; CHECK-NEXT: vfwredosum.vs v8, v8, v9
; CHECK-NEXT: vsetivli zero, 1, e32, m1, ta, ma
; CHECK-NEXT: vfmv.f.s fa0, v8
@@ -544,9 +544,9 @@ define float @vreduce_fwadd_v16f32(ptr %x, float %s) {
; CHECK: # %bb.0:
; CHECK-NEXT: vsetivli zero, 16, e16, m2, ta, ma
; CHECK-NEXT: vle16.v v8, (a0)
-; CHECK-NEXT: vsetivli zero, 16, e32, m1, ta, ma
+; CHECK-NEXT: vsetivli zero, 16, e32, m4, ta, ma
; CHECK-NEXT: vfmv.s.f v10, fa0
-; CHECK-NEXT: vsetivli zero, 16, e16, m2, ta, ma
+; CHECK-NEXT: vsetvli zero, zero, e16, m2, ta, ma
; CHECK-NEXT: vfwredusum.vs v8, v8, v10
; CHECK-NEXT: vsetivli zero, 1, e32, m1, ta, ma
; CHECK-NEXT: vfmv.f.s fa0, v8
@@ -562,9 +562,9 @@ define float @vreduce_ord_fwadd_v16f32(ptr %x, float %s) {
; CHECK: # %bb.0:
; CHECK-NEXT: vsetivli zero, 16, e16, m2, ta, ma
; CHEC...
[truncated]
This remediates the VL toggles that would be introduced if we decided to remove the LMUL variants for VMV_S_X pseudos as mentioned here: #66659 (comment)
Force-pushed from d52a34c to 0ddba02.
Marking as a draft as there may be a more general way of doing this by adjusting the SEW/LMUL ratio and converting to vsetvli zero, zero separately.
Fractional ? 8 / NewRatio : NewRatio / 8, Fractional);

unsigned VType = MIInfo.encodeVTYPE();
return RISCVVType::encodeVTYPE(NewVLMul, SEW / 8,
SEW/8 appears to just be MIInfo.getSEW()
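That is, since the local SEW was computed as MIInfo.getSEW() * 8, the return could be written as a direct substitution into the snippet above (sketch only):

return RISCVVType::encodeVTYPE(NewVLMul, MIInfo.getSEW(),
                               RISCVVType::isTailAgnostic(VType),
                               RISCVVType::isMaskAgnostic(VType));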
//
// If possible, returns the new VTYPE that should be used for MI.
static std::optional<unsigned>
canAdjustSEWLMULRatio(const MachineInstr &MI, const MachineInstr &NextMI,
Function returns the VTYPE, not a boolean. So a better name might be getAdjustedVTYPE.
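i.e. just the rename, with the signature otherwise unchanged (sketch):

static std::optional<unsigned>
getAdjustedVTYPE(const MachineInstr &MI, const MachineInstr &NextMI,
                 const DemandedFields &Used);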
if (isVLPreservingConfig(MI) || isVLPreservingConfig(NextMI))
  return std::nullopt;

VSETVLIInfo NextMIInfo = getInfoForVSETVLI(NextMI);
I don't remember if it's safe to call getInfoForVSETVLI here on the post-rewrite MI. However, the existence of isNonZeroAVL seems to indicate I thought it wasn't previously?
We should either write this in terms of the raw MI, or insert assertions sufficient to prove this is safe during insertion, and rewrite the existing post pass logic in terms of the VSETVLIInfo object.
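A raw-MI version of the AVL check might look something like this. It is a sketch only, and assumes the AVL is operand 1 on both the vsetvli and vsetivli pseudos and that a register AVL is not redefined between the two instructions:

// Sketch: compare AVLs directly on the pseudos instead of building
// VSETVLIInfo. X0 ("zero, zero") AVLs were already rejected above by the
// isVLPreservingConfig checks.
static bool hasSameRawAVL(const MachineInstr &MI, const MachineInstr &NextMI) {
  const MachineOperand &A = MI.getOperand(1);
  const MachineOperand &B = NextMI.getOperand(1);
  if (A.isImm() && B.isImm())
    return A.getImm() == B.getImm(); // vsetivli: compare the immediates
  if (A.isReg() && B.isReg())
    return A.getReg() == B.getReg(); // vsetvli: same AVL register
  return false;
}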
Yeah, in my local branch I've rewritten this to just operate on the MI, it's not too bad and we don't need getInfoForVSETVLI here.
@@ -1484,6 +1535,15 @@ void RISCVInsertVSETVLI::doLocalPostpass(MachineBasicBlock &MBB) {
      Used.demandVL();

    if (NextMI) {
      if (auto NewVType = canAdjustSEWLMULRatio(MI, *NextMI, Used)) {
        MI.getOperand(2).setImm(*NewVType);
Have we properly guarded for the save/restore form which has a vreg as operand 2?
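One possible guard near the top of canAdjustSEWLMULRatio, sketched under the assumption that the save/restore form is the only case where the VTYPE operand is not an immediate:

// Bail if either VTYPE operand is a register (e.g. the VL/VTYPE
// save/restore form) rather than an immediate we can rewrite.
if (!MI.getOperand(2).isImm() || !NextMI.getOperand(2).isImm())
  return std::nullopt;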
@@ -65,9 +65,8 @@ define <32 x i32> @insertelt_v32i32_31(<32 x i32> %a, i32 %y) {
; CHECK-LABEL: insertelt_v32i32_31:
; CHECK: # %bb.0:
; CHECK-NEXT: li a1, 32
; CHECK-NEXT: vsetvli zero, a1, e32, m1, ta, ma
; CHECK-NEXT: vmv.s.x v16, a0
; CHECK-NEXT: vsetvli zero, a1, e32, m8, ta, ma
This is an interesting sub-case as we shouldn't need to be reasoning about ratios here. Why didn't the existing logic get this case given only LMUL is changing? I think this might just be the lack of LI handling in isNonZeroAVL?
Good point, I'm not quite sure why this part wasn't already caught since it should just be an entire VTYPE substitution.
Note to self though, this is one of the cases the forward-pass-ratio-mutation in #69788 won't catch because it's the first instruction in the BB
; RV64ZVE32F-NEXT: vmv.x.s a2, v10
; RV64ZVE32F-NEXT: slli a2, a2, 1
; RV64ZVE32F-NEXT: add a2, a0, a2
; RV64ZVE32F-NEXT: vsetivli zero, 1, e16, m1, ta, ma
Why isn't this already a VTYPE toggle from the forward pass? Having the same literal VL and knowing that VL is less than VLMAX for both should be sufficient?
; CHECK-NEXT: vfmv.s.f v9, fa0
; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma |
Actually, thinking about this, why can't the forward pass find this case too? Same reasoning as the previous comment.
The forward version of (the VTYPE toggle subset of) this patch turned out to be really straightforward. I posted it here: #69759. I suggest we review and land that one, then rebase this to see what is left. Depending on that, we may or may not decide to move forward with this. Let's see if the complexity is still justified.
I agree with trying to move this into the forward phase; the "generalisation" I was alluding to in the previous comment was able to move this into
For instructions like vmv.s.x and friends where we don't care about LMUL or the SEW/LMUL ratio, we can change the LMUL in its state so that it has the same SEW/LMUL ratio as the previous state. This allows us to avoid more VL toggles later down the line (i.e. use vsetvli zero, zero, which requires that the SEW/LMUL ratio must be the same). This is an alternative approach to the idea in #69259, but note that they don't catch exactly the same test cases.