[AMDGPU] Fix mul combine for MUL24 #79110

Pierre-vh · 2024-01-23T08:42:57Z

MUL24 can now return a i64 for i32 operands, but the combine was never updated to handle this case. Extend the operand when rewriting the ADD to handle it.

Fixes SWDEV-436654

MUL24 can now return a i64 for i32 operands, but the combine was never updated to handle this case. Extend the operand when rewriting the ADD to handle it.

llvmbot · 2024-01-23T08:43:28Z

@llvm/pr-subscribers-backend-amdgpu

Author: Pierre van Houtryve (Pierre-vh)

Changes

MUL24 can now return a i64 for i32 operands, but the combine was never updated to handle this case. Extend the operand when rewriting the ADD to handle it.

Fixes SWDEV-436654

Full diff: https://github.com/llvm/llvm-project/pull/79110.diff

2 Files Affected:

(modified) llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp (+2-2)
(added) llvm/test/CodeGen/AMDGPU/mul-combine-crash.ll (+47)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
index 55d95154c75878b..109e86eb4117a2f 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
@@ -4246,12 +4246,12 @@ SDValue AMDGPUTargetLowering::performMulCombine(SDNode *N,
   // operands, so we have to place the mul in the LHS
   if (SDValue MulOper = IsFoldableAdd(N0)) {
     SDValue MulVal = DAG.getNode(N->getOpcode(), DL, VT, N1, MulOper);
-    return DAG.getNode(ISD::ADD, DL, VT, MulVal, N1);
+    return DAG.getNode(ISD::ADD, DL, VT, MulVal, DAG.getZExtOrTrunc(N1, DL, VT));
   }
 
   if (SDValue MulOper = IsFoldableAdd(N1)) {
     SDValue MulVal = DAG.getNode(N->getOpcode(), DL, VT, N0, MulOper);
-    return DAG.getNode(ISD::ADD, DL, VT, MulVal, N0);
+    return DAG.getNode(ISD::ADD, DL, VT, MulVal, DAG.getZExtOrTrunc(N0, DL, VT));
   }
 
   // Skip if already mul24.
diff --git a/llvm/test/CodeGen/AMDGPU/mul-combine-crash.ll b/llvm/test/CodeGen/AMDGPU/mul-combine-crash.ll
new file mode 100644
index 000000000000000..624c8aa859c0939
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/mul-combine-crash.ll
@@ -0,0 +1,47 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck %s
+
+; Checks that the DAG mul combine can handle a MUL24 with a i32 and i64
+; operand.
+
+define i64 @test(i64 %x, i32 %z) {
+; CHECK-LABEL: test:
+; CHECK:       ; %bb.0: ; %entry
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    v_and_b32_e32 v0, 0xff, v0
+; CHECK-NEXT:    v_and_b32_e32 v2, 1, v2
+; CHECK-NEXT:    v_mov_b32_e32 v1, 0
+; CHECK-NEXT:    v_mad_u64_u32 v[0:1], s[4:5], v0, v2, v[0:1]
+; CHECK-NEXT:    v_mov_b32_e32 v1, 0
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+entry:
+  %a = add i64 %x, 0
+  %b = and i64 %a, 255
+  %c = and i32 %z, 1
+  %d = add nuw nsw i32 %c, 1
+  %e = zext nneg i32 %d to i64
+  %f = mul nuw nsw i64 %b, %e
+  %g = add nuw nsw i64 %f, 0
+  ret i64 %g
+}
+
+define i64 @test_swapped(i64 %x, i32 %z) {
+; CHECK-LABEL: test_swapped:
+; CHECK:       ; %bb.0: ; %entry
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    v_and_b32_e32 v0, 0xff, v0
+; CHECK-NEXT:    v_and_b32_e32 v2, 1, v2
+; CHECK-NEXT:    v_mov_b32_e32 v1, 0
+; CHECK-NEXT:    v_mad_u64_u32 v[0:1], s[4:5], v0, v2, v[0:1]
+; CHECK-NEXT:    v_mov_b32_e32 v1, 0
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+entry:
+  %a = add i64 %x, 0
+  %b = and i64 %a, 255
+  %c = and i32 %z, 1
+  %d = add nuw nsw i32 %c, 1
+  %e = zext nneg i32 %d to i64
+  %f = mul nuw nsw i64 %e, %b
+  %g = add nuw nsw i64 %f, 0
+  ret i64 %g
+}

github-actions · 2024-01-23T08:45:21Z

✅ With the latest revision this PR passed the C/C++ code formatter.

jayfoad

"mul24 x, (add y, 1) -> add (mul24 x, y), x" does not seem safe at all. You know that y+1 is a 24-bit value, but that does not guarantee that y is a 24-bit value.

Pierre-vh · 2024-01-24T07:00:30Z

"mul24 x, (add y, 1) -> add (mul24 x, y), x" does not seem safe at all. You know that y+1 is a 24-bit value, but that does not guarantee that y is a 24-bit value.

Ah indeed, good catch.
Should the combine just be removed? (cc @arsenm - I think you wrote this one)

jayfoad · 2024-01-24T10:50:31Z

"mul24 x, (add y, 1) -> add (mul24 x, y), x" does not seem safe at all. You know that y+1 is a 24-bit value, but that does not guarantee that y is a 24-bit value.

Ah indeed, good catch. Should the combine just be removed?

I guess it's safe for a regular mul, so maybe move the "Skip if already mul24" up a bit?

arsenm · 2024-01-25T08:39:11Z

This probably needs work to extend for the mul24 cases, with some known bits checks. Probably should just skip for it

arsenm · 2024-01-29T09:04:24Z

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp

@@ -4254,10 +4254,6 @@ SDValue AMDGPUTargetLowering::performMulCombine(SDNode *N,
    return DAG.getNode(ISD::ADD, DL, VT, MulVal, N0);
  }

-  // Skip if already mul24.
-  if (N->getOpcode() != ISD::MUL)


Probably should turn this into an assert? Is there definitely nothing else in performMulCombine that can operate on mul24?

I don't think there is anything else it can do at first glance

arsenm · 2024-01-29T13:54:51Z

llvm/test/CodeGen/AMDGPU/mul-combine-crash.ll

+  %d = add nuw nsw i32 %c, 1
+  %e = zext nneg i32 %d to i64
+  %f = mul nuw nsw i64 %e, %b
+  %g = add nuw nsw i64 %f, 0


Are the adds of 0 really necessary? Can the test merge in with the existing test?

MUL24 can now return a i64 for i32 operands, but the combine was never updated to handle this case. Extend the operand when rewriting the ADD to handle it. Fixes SWDEV-436654 Change-Id: I0d0bf5b5f45b901f2ec60bdefb9597233d1ce482

[AMDGPU] Fix mul combine for MUL24

ef01c94

MUL24 can now return a i64 for i32 operands, but the combine was never updated to handle this case. Extend the operand when rewriting the ADD to handle it.

Pierre-vh requested review from jayfoad and arsenm January 23, 2024 08:42

llvmbot added the backend:AMDGPU label Jan 23, 2024

clang-format

d43874c

jayfoad reviewed Jan 23, 2024

View reviewed changes

disable combine for mul24

236b0fd

Pierre-vh requested a review from jayfoad January 29, 2024 06:28

arsenm reviewed Jan 29, 2024

View reviewed changes

add assert

82efb3b

Pierre-vh requested a review from arsenm January 29, 2024 12:28

arsenm approved these changes Jan 29, 2024

View reviewed changes

update tests

e35de2b

Pierre-vh merged commit ce72f78 into llvm:main Jan 29, 2024
3 of 4 checks passed

Pierre-vh deleted the fix-mul-comb branch January 29, 2024 15:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[AMDGPU] Fix mul combine for MUL24 #79110

[AMDGPU] Fix mul combine for MUL24 #79110

Pierre-vh commented Jan 23, 2024

llvmbot commented Jan 23, 2024

github-actions bot commented Jan 23, 2024 •

edited

Loading

jayfoad left a comment

Pierre-vh commented Jan 24, 2024

jayfoad commented Jan 24, 2024

arsenm commented Jan 25, 2024

arsenm Jan 29, 2024

Pierre-vh Jan 29, 2024

arsenm Jan 29, 2024

[AMDGPU] Fix mul combine for MUL24 #79110

[AMDGPU] Fix mul combine for MUL24 #79110

Conversation

Pierre-vh commented Jan 23, 2024

llvmbot commented Jan 23, 2024

github-actions bot commented Jan 23, 2024 • edited Loading

jayfoad left a comment

Choose a reason for hiding this comment

Pierre-vh commented Jan 24, 2024

jayfoad commented Jan 24, 2024

arsenm commented Jan 25, 2024

arsenm Jan 29, 2024

Choose a reason for hiding this comment

Pierre-vh Jan 29, 2024

Choose a reason for hiding this comment

arsenm Jan 29, 2024

Choose a reason for hiding this comment

github-actions bot commented Jan 23, 2024 •

edited

Loading