-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AMDGPU] Fix folding clamp into pseudo scalar instructions #100568
Conversation
@llvm/pr-subscribers-backend-amdgpu Author: Mirko Brkušanin (mbrkusanin) ChangesClamp is canonically a v_max* instruction with a VGPR dst. Folding clamp Full diff: https://github.com/llvm/llvm-project/pull/100568.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index 9b2cab2eb73a3..f11c4dc46e27c 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -1581,7 +1581,18 @@ bool SIFoldOperands::tryFoldClamp(MachineInstr &MI) {
// Clamp is applied after omod, so it is OK if omod is set.
DefClamp->setImm(1);
- MRI->replaceRegWith(MI.getOperand(0).getReg(), Def->getOperand(0).getReg());
+
+ Register DefReg = Def->getOperand(0).getReg();
+ Register MIDstReg = MI.getOperand(0).getReg();
+ if (TRI->isSGPRReg(*MRI, DefReg)) {
+ // Psuedo scalar instructions have a SGPR for dst and clamp is a v_max*
+ // instruction with a VGPR dst.
+ BuildMI(*MI.getParent(), MI, MI.getDebugLoc(), TII->get(AMDGPU::COPY),
+ MIDstReg)
+ .addReg(DefReg);
+ } else {
+ MRI->replaceRegWith(MIDstReg, DefReg);
+ }
MI.eraseFromParent();
// Use of output modifiers forces VOP3 encoding for a VOP2 mac/fmac
diff --git a/llvm/test/CodeGen/AMDGPU/si-fold-scalar-clamp.mir b/llvm/test/CodeGen/AMDGPU/si-fold-scalar-clamp.mir
new file mode 100644
index 0000000000000..928479534dcdd
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/si-fold-scalar-clamp.mir
@@ -0,0 +1,24 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1200 -run-pass=si-fold-operands -verify-machineinstrs -o - %s | FileCheck %s
+---
+name: test
+tracksRegLiveness: true
+body: |
+ bb.0:
+ liveins: $sgpr0
+
+ ; CHECK-LABEL: name: test
+ ; CHECK: liveins: $sgpr0
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:sgpr_32 = COPY $sgpr0
+ ; CHECK-NEXT: [[V_S_RSQ_F32_e64_:%[0-9]+]]:sgpr_32 = nofpexcept V_S_RSQ_F32_e64 0, [[COPY]], 1, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY [[V_S_RSQ_F32_e64_]]
+ ; CHECK-NEXT: EXP_DONE 0, killed [[COPY1]], [[COPY1]], [[COPY1]], [[COPY1]], -1, 0, 15, implicit $exec
+ ; CHECK-NEXT: S_ENDPGM 0
+ %0:sgpr_32 = COPY $sgpr0
+ %1:sgpr_32 = nofpexcept V_S_RSQ_F32_e64 0, %0, 0, 0, implicit $mode, implicit $exec
+ %2:vgpr_32 = nofpexcept V_MAX_F32_e64 0, %1, 0, %1, -1, 0, implicit $mode, implicit $exec
+ EXP_DONE 0, killed %2, %2, %2, %2, -1, 0, 15, implicit $exec
+ S_ENDPGM 0
+
+...
|
; CHECK-NEXT: EXP_DONE 0, killed [[COPY1]], [[COPY1]], [[COPY1]], [[COPY1]], -1, 0, 15, implicit $exec | ||
; CHECK-NEXT: S_ENDPGM 0 | ||
%0:sgpr_32 = COPY $sgpr0 | ||
%1:sgpr_32 = nofpexcept V_S_RSQ_F32_e64 0, %0, 0, 0, implicit $mode, implicit $exec |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this have a V_ prefix if it's just an S_ instruction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is one of new pseudo scalar instructions. It behaves like SOP1 but it is actually a VOP3 and runs on VALU.
c1178c9
to
42873a6
Compare
%0:sgpr_32 = COPY $sgpr0 | ||
%1:sgpr_32 = nofpexcept V_S_RSQ_F32_e64 0, %0, 0, 0, implicit $mode, implicit $exec | ||
%2:vgpr_32 = nofpexcept V_MAX_F32_e64 0, %1, 0, %1, -1, 0, implicit $mode, implicit $exec | ||
EXP_DONE 0, killed %2, %2, %2, %2, -1, 0, 15, implicit $exec |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use a simpler use, just copy to $vgpr0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes for representing the change in patch, but that use case was not previously crashing with -verify-machineinstrs. I wanted something that must use a vgpr. Should I change it anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any VALU instruction will do in a VGPR only operand
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with optional test nit
…tructions Clamp is canonicaly a v_max* instruction with a VGPR dst. Folding clamp into a pseudo scalar instruction can cause issue due to a change in regbank. We fix this with a copy.
42873a6
to
e206cce
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/123/builds/2595 Here is the relevant piece of the build log for the reference:
|
Summary: Clamp is canonically a v_max* instruction with a VGPR dst. Folding clamp into a pseudo scalar instruction can cause issues due to a change in regbank. We fix this with a copy. Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60250582
@mbrkusanin please get this backported to 19.x. |
Clamp is canonically a v_max* instruction with a VGPR dst. Folding clamp into a pseudo scalar instruction can cause issues due to a change in regbank. We fix this with a copy. (cherry picked from commit 817cd72)
Backport PR: #102446 |
Clamp is canonically a v_max* instruction with a VGPR dst. Folding clamp into a pseudo scalar instruction can cause issues due to a change in regbank. We fix this with a copy. (cherry picked from commit 817cd72)
Clamp is canonically a v_max* instruction with a VGPR dst. Folding clamp into a pseudo scalar instruction can cause issues due to a change in regbank. We fix this with a copy. (cherry picked from commit 817cd72)
Clamp is canonically a v_max* instruction with a VGPR dst. Folding clamp
into a pseudo scalar instruction can cause issues due to a change in
regbank. We fix this with a copy.