-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RISCV] Fix vmerge.vvm/vmv.v.v getting folded into ops with mismatching EEW #101152
[RISCV] Fix vmerge.vvm/vmv.v.v getting folded into ops with mismatching EEW #101152
Conversation
…ng EEW As noted in https://github.com/llvm/llvm-project/pull/100367/files#r1695448771, we currently fold in vmerge.vvms and vmv.v.vs into their ops even if the EEW is different. This is incorrect if we end up changing the mask or AVL of the op. This gets the op's EEW via its simple value type for now since there doesn't seem to be any existing information about the EEW size of instructions. We'll probably need to encode this at some point if we want to be able to access it at the MachineInstr level in llvm#100367
c48a5c8
to
0190737
Compare
// the VL or mask. | ||
if (Log2_64(True.getScalarValueSizeInBits()) != | ||
N->getConstantOperandVal( | ||
RISCVII::getSEWOpNum(TII->get(N->getMachineOpcode())) - 1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The opcode is known? So I think we don't need to get SEW operand index from TSFlag, we can hardcode it here.
if (mayRaiseFPException(True.getNode()) && | ||
!True->getFlags().hasNoFPExcept()) | ||
return false; | ||
|
||
// If the EEW of True is different from vmerge's SEW, then we cannot change | ||
// the VL or mask. | ||
if (Log2_64(True.getScalarValueSizeInBits()) != |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe don't use log here, we do a shift left to log2sew
instead. Shift is cheaper than log I think.
@@ -3855,11 +3855,19 @@ bool RISCVDAGToDAGISel::performCombineVMergeAndVOps(SDNode *N) { | |||
// If we end up changing the VL or mask of True, then we need to make sure it | |||
// doesn't raise any observable fp exceptions, since changing the active | |||
// elements will affect how fflags is set. | |||
if (TrueVL != VL || !IsMasked) | |||
if (TrueVL != VL || !IsMasked) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is specific to the change case. Even if we have exactly equal VL values, those values describe a different number of bits. So folding away the vmerge.vv and vmv.v.v is still illegal.
I think you can also use a much easier check here - the VT of the TrueOp should equal the VT of the vmerge or vmv.v.v. (Really the respective operand, but we don't have widening or narrowing versions of either so that's equivalent.)
@llvm/pr-subscribers-backend-risc-v Author: Luke Lau (lukel97) ChangesAs noted in https://github.com/llvm/llvm-project/pull/100367/files#r1695448771, we currently fold in vmerge.vvms and vmv.v.vs into their ops even if the EEW is different. This is incorrect if we end up changing the mask or AVL of the op. This gets the op's EEW via its simple value type for now since there doesn't seem to be any existing information about the EEW size of instructions. We'll probably need to encode this at some point if we want to be able to access it at the MachineInstr level in #100367 Full diff: https://github.com/llvm/llvm-project/pull/101152.diff 3 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
index 4418905ce21ed..4de38db6e1fe9 100644
--- a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
@@ -3733,6 +3733,10 @@ bool RISCVDAGToDAGISel::performCombineVMergeAndVOps(SDNode *N) {
assert(!Mask || cast<RegisterSDNode>(Mask)->getReg() == RISCV::V0);
assert(!Glue || Glue.getValueType() == MVT::Glue);
+ // If the EEW of True is different from vmerge's SEW, then we can't fold.
+ if (True.getSimpleValueType() != N->getSimpleValueType(0))
+ return false;
+
// We require that either passthru and false are the same, or that passthru
// is undefined.
if (Passthru != False && !isImplicitDef(Passthru))
diff --git a/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll b/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll
index a08bcae074b9b..259515f160048 100644
--- a/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll
@@ -1196,3 +1196,24 @@ define <vscale x 2 x i32> @true_mask_vmerge_implicit_passthru(<vscale x 2 x i32>
)
ret <vscale x 2 x i32> %b
}
+
+
+define <vscale x 2 x i32> @unfoldable_mismatched_sew(<vscale x 2 x i32> %passthru, <vscale x 1 x i64> %x, <vscale x 1 x i64> %y, <vscale x 2 x i1> %mask, i64 %avl) {
+; CHECK-LABEL: unfoldable_mismatched_sew:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, a0, e64, m1, ta, ma
+; CHECK-NEXT: vadd.vv v9, v9, v10
+; CHECK-NEXT: vsetvli zero, a0, e32, m1, tu, ma
+; CHECK-NEXT: vmv.v.v v8, v9
+; CHECK-NEXT: ret
+ %a = call <vscale x 1 x i64> @llvm.riscv.vadd.nxv1i64.nxv1i64(<vscale x 1 x i64> poison, <vscale x 1 x i64> %x, <vscale x 1 x i64> %y, i64 %avl)
+ %a.bitcast = bitcast <vscale x 1 x i64> %a to <vscale x 2 x i32>
+ %b = call <vscale x 2 x i32> @llvm.riscv.vmerge.nxv2i32.nxv2i32(
+ <vscale x 2 x i32> %passthru,
+ <vscale x 2 x i32> %passthru,
+ <vscale x 2 x i32> %a.bitcast,
+ <vscale x 2 x i1> splat (i1 true),
+ i64 %avl
+ )
+ ret <vscale x 2 x i32> %b
+}
diff --git a/llvm/test/CodeGen/RISCV/rvv/vmv.v.v-peephole.ll b/llvm/test/CodeGen/RISCV/rvv/vmv.v.v-peephole.ll
index 8a589a31a1535..3952e48c5c28f 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vmv.v.v-peephole.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vmv.v.v-peephole.ll
@@ -180,3 +180,17 @@ define <vscale x 2 x i32> @unfoldable_vredsum(<vscale x 2 x i32> %passthru, <vsc
%b = call <vscale x 2 x i32> @llvm.riscv.vmv.v.v.nxv2i32(<vscale x 2 x i32> %passthru, <vscale x 2 x i32> %a, iXLen 1)
ret <vscale x 2 x i32> %b
}
+
+define <vscale x 2 x i32> @unfoldable_mismatched_sew(<vscale x 2 x i32> %passthru, <vscale x 1 x i64> %x, <vscale x 1 x i64> %y, iXLen %avl) {
+; CHECK-LABEL: unfoldable_mismatched_sew:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, a0, e64, m1, ta, ma
+; CHECK-NEXT: vadd.vv v9, v9, v10
+; CHECK-NEXT: vsetvli zero, a0, e32, m1, tu, ma
+; CHECK-NEXT: vmv.v.v v8, v9
+; CHECK-NEXT: ret
+ %a = call <vscale x 1 x i64> @llvm.riscv.vadd.nxv1i64.nxv1i64(<vscale x 1 x i64> poison, <vscale x 1 x i64> %x, <vscale x 1 x i64> %y, iXLen %avl)
+ %a.bitcast = bitcast <vscale x 1 x i64> %a to <vscale x 2 x i32>
+ %b = call <vscale x 2 x i32> @llvm.riscv.vmv.v.v.nxv2i32(<vscale x 2 x i32> %passthru, <vscale x 2 x i32> %a.bitcast, iXLen %avl)
+ ret <vscale x 2 x i32> %b
+}
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/cherry-pick d01c051 |
Error: Command failed due to missing milestone. |
/cherry-pick d01c051 |
Failed to cherry-pick: d01c051 https://github.com/llvm/llvm-project/actions/runs/10166340246 Please manually backport the fix and push it to your github fork. Once this is done, please create a pull request |
Since we don't have access to the MVTs, we can also do the same thing by checking the VLMAXs are the same (i.e. the sew/lmul ratio)
…ng EEW (llvm#101152) As noted in https://github.com/llvm/llvm-project/pull/100367/files#r1695448771, we currently fold in vmerge.vvms and vmv.v.vs into their ops even if the EEW is different which leads to an incorrect transform. This checks the op's EEW via its simple value type for now since there doesn't seem to be any existing information about the EEW size of instructions. We'll probably need to encode this at some point if we want to be able to access it at the MachineInstr level in llvm#100367
…ng EEW (llvm#101152) As noted in https://github.com/llvm/llvm-project/pull/100367/files#r1695448771, we currently fold in vmerge.vvms and vmv.v.vs into their ops even if the EEW is different which leads to an incorrect transform. This checks the op's EEW via its simple value type for now since there doesn't seem to be any existing information about the EEW size of instructions. We'll probably need to encode this at some point if we want to be able to access it at the MachineInstr level in llvm#100367
…ng EEW (llvm#101152) As noted in https://github.com/llvm/llvm-project/pull/100367/files#r1695448771, we currently fold in vmerge.vvms and vmv.v.vs into their ops even if the EEW is different which leads to an incorrect transform. This checks the op's EEW via its simple value type for now since there doesn't seem to be any existing information about the EEW size of instructions. We'll probably need to encode this at some point if we want to be able to access it at the MachineInstr level in llvm#100367
Since we don't have access to the MVTs, we can also do the same thing by checking the VLMAXs are the same (i.e. the sew/lmul ratio)
As noted in https://github.com/llvm/llvm-project/pull/100367/files#r1695448771, we currently fold in vmerge.vvms and vmv.v.vs into their ops even if the EEW is different which leads to an incorrect transform.
This checks the op's EEW via its simple value type for now since there doesn't seem to be any existing information about the EEW size of instructions. We'll probably need to encode this at some point if we want to be able to access it at the MachineInstr level in #100367