-
Notifications
You must be signed in to change notification settings - Fork 11.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AMDGPU] Fix mul combine for MUL24 #79110
Conversation
MUL24 can now return a i64 for i32 operands, but the combine was never updated to handle this case. Extend the operand when rewriting the ADD to handle it.
@llvm/pr-subscribers-backend-amdgpu Author: Pierre van Houtryve (Pierre-vh) ChangesMUL24 can now return a i64 for i32 operands, but the combine was never updated to handle this case. Extend the operand when rewriting the ADD to handle it. Fixes SWDEV-436654 Full diff: https://github.com/llvm/llvm-project/pull/79110.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
index 55d95154c75878b..109e86eb4117a2f 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
@@ -4246,12 +4246,12 @@ SDValue AMDGPUTargetLowering::performMulCombine(SDNode *N,
// operands, so we have to place the mul in the LHS
if (SDValue MulOper = IsFoldableAdd(N0)) {
SDValue MulVal = DAG.getNode(N->getOpcode(), DL, VT, N1, MulOper);
- return DAG.getNode(ISD::ADD, DL, VT, MulVal, N1);
+ return DAG.getNode(ISD::ADD, DL, VT, MulVal, DAG.getZExtOrTrunc(N1, DL, VT));
}
if (SDValue MulOper = IsFoldableAdd(N1)) {
SDValue MulVal = DAG.getNode(N->getOpcode(), DL, VT, N0, MulOper);
- return DAG.getNode(ISD::ADD, DL, VT, MulVal, N0);
+ return DAG.getNode(ISD::ADD, DL, VT, MulVal, DAG.getZExtOrTrunc(N0, DL, VT));
}
// Skip if already mul24.
diff --git a/llvm/test/CodeGen/AMDGPU/mul-combine-crash.ll b/llvm/test/CodeGen/AMDGPU/mul-combine-crash.ll
new file mode 100644
index 000000000000000..624c8aa859c0939
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/mul-combine-crash.ll
@@ -0,0 +1,47 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck %s
+
+; Checks that the DAG mul combine can handle a MUL24 with a i32 and i64
+; operand.
+
+define i64 @test(i64 %x, i32 %z) {
+; CHECK-LABEL: test:
+; CHECK: ; %bb.0: ; %entry
+; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT: v_and_b32_e32 v0, 0xff, v0
+; CHECK-NEXT: v_and_b32_e32 v2, 1, v2
+; CHECK-NEXT: v_mov_b32_e32 v1, 0
+; CHECK-NEXT: v_mad_u64_u32 v[0:1], s[4:5], v0, v2, v[0:1]
+; CHECK-NEXT: v_mov_b32_e32 v1, 0
+; CHECK-NEXT: s_setpc_b64 s[30:31]
+entry:
+ %a = add i64 %x, 0
+ %b = and i64 %a, 255
+ %c = and i32 %z, 1
+ %d = add nuw nsw i32 %c, 1
+ %e = zext nneg i32 %d to i64
+ %f = mul nuw nsw i64 %b, %e
+ %g = add nuw nsw i64 %f, 0
+ ret i64 %g
+}
+
+define i64 @test_swapped(i64 %x, i32 %z) {
+; CHECK-LABEL: test_swapped:
+; CHECK: ; %bb.0: ; %entry
+; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT: v_and_b32_e32 v0, 0xff, v0
+; CHECK-NEXT: v_and_b32_e32 v2, 1, v2
+; CHECK-NEXT: v_mov_b32_e32 v1, 0
+; CHECK-NEXT: v_mad_u64_u32 v[0:1], s[4:5], v0, v2, v[0:1]
+; CHECK-NEXT: v_mov_b32_e32 v1, 0
+; CHECK-NEXT: s_setpc_b64 s[30:31]
+entry:
+ %a = add i64 %x, 0
+ %b = and i64 %a, 255
+ %c = and i32 %z, 1
+ %d = add nuw nsw i32 %c, 1
+ %e = zext nneg i32 %d to i64
+ %f = mul nuw nsw i64 %e, %b
+ %g = add nuw nsw i64 %f, 0
+ ret i64 %g
+}
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"mul24 x, (add y, 1) -> add (mul24 x, y), x" does not seem safe at all. You know that y+1 is a 24-bit value, but that does not guarantee that y is a 24-bit value.
Ah indeed, good catch. |
I guess it's safe for a regular mul, so maybe move the "Skip if already mul24" up a bit? |
This probably needs work to extend for the mul24 cases, with some known bits checks. Probably should just skip for it |
@@ -4254,10 +4254,6 @@ SDValue AMDGPUTargetLowering::performMulCombine(SDNode *N, | |||
return DAG.getNode(ISD::ADD, DL, VT, MulVal, N0); | |||
} | |||
|
|||
// Skip if already mul24. | |||
if (N->getOpcode() != ISD::MUL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably should turn this into an assert? Is there definitely nothing else in performMulCombine that can operate on mul24?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is anything else it can do at first glance
%d = add nuw nsw i32 %c, 1 | ||
%e = zext nneg i32 %d to i64 | ||
%f = mul nuw nsw i64 %e, %b | ||
%g = add nuw nsw i64 %f, 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the adds of 0 really necessary? Can the test merge in with the existing test?
MUL24 can now return a i64 for i32 operands, but the combine was never updated to handle this case. Extend the operand when rewriting the ADD to handle it. Fixes SWDEV-436654 Change-Id: I0d0bf5b5f45b901f2ec60bdefb9597233d1ce482
MUL24 can now return a i64 for i32 operands, but the combine was never updated to handle this case. Extend the operand when rewriting the ADD to handle it.
Fixes SWDEV-436654