-
Notifications
You must be signed in to change notification settings - Fork 11.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AArch64][GlobalISel] Improve non-SVE popcount for 32bit and 64 bit using udot #96409
Conversation
@llvm/pr-subscribers-llvm-globalisel @llvm/pr-subscribers-backend-aarch64 Author: Tim Gymnich (tgymnich) ChangesFollow up for #95881 Use udot instead of a sequence of uaddlp instructions when summing up lanes for popcount. TODO:
Full diff: https://github.com/llvm/llvm-project/pull/96409.diff 1 Files Affected:
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
index fef0b722efe45..84cdc40bc5a3b 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -14,6 +14,7 @@
#include "AArch64LegalizerInfo.h"
#include "AArch64RegisterBankInfo.h"
#include "AArch64Subtarget.h"
+#include "MCTargetDesc/AArch64MCTargetDesc.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/CodeGen/GlobalISel/GenericMachineInstrs.h"
#include "llvm/CodeGen/GlobalISel/LegalizerHelper.h"
@@ -1904,6 +1905,31 @@ bool AArch64LegalizerInfo::legalizeCTPOP(MachineInstr &MI,
auto CTPOP = MIRBuilder.buildCTPOP(VTy, Val);
// Sum across lanes.
+
+ if (ST->hasDotProd() && Ty.getNumElements() >= 2 &&
+ Ty.getScalarSizeInBits() != 16) {
+ LLT Dt = Ty == LLT::fixed_vector(2, 64) ? LLT::fixed_vector(4, 32) : Ty;
+ auto Zeros = MIRBuilder.buildConstant(Ty, 0);
+ auto Ones = MIRBuilder.buildConstant(VTy, 1);
+ MachineInstrBuilder SUM;
+
+ if (Ty == LLT::fixed_vector(2, 64)) {
+ auto UDOT =
+ MIRBuilder.buildInstr(AArch64::G_UDOT, {Dt}, {Zeros, Ones, CTPOP});
+ SUM = MIRBuilder.buildInstr(AArch64::G_UADDLP, {Ty}, {UDOT});
+ } else if (Ty == LLT::fixed_vector(4, 32)) {
+ SUM = MIRBuilder.buildInstr(AArch64::G_UDOT, {Dt}, {Zeros, Ones, CTPOP});
+ } else if (Ty == LLT::fixed_vector(2, 32)) {
+ SUM = MIRBuilder.buildInstr(AArch64::G_UDOT, {Dt}, {Zeros, Ones, CTPOP});
+ } else {
+ llvm_unreachable("unexpected vector shape");
+ }
+
+ SUM->getOperand(0).setReg(Dst);
+ MI.eraseFromParent();
+ return true;
+ }
+
Register HSum = CTPOP.getReg(0);
unsigned Opc;
SmallVector<LLT> HAddTys;
|
3dd6016
to
72e8da9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, thanks. Can we add global-isel run lines to the existing tests?
72e8da9
to
76de62c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds like the tests will need to be rebased once the other patch is committed, but this looks OK to me.
@@ -14,6 +14,7 @@ | |||
#include "AArch64LegalizerInfo.h" | |||
#include "AArch64RegisterBankInfo.h" | |||
#include "AArch64Subtarget.h" | |||
#include "MCTargetDesc/AArch64MCTargetDesc.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Vscode inserted it when using AArch64::G_UDOT
. But AArch64Subtarget.h
is already including MCTargetDesc/AArch64MCTargetDesc.h
.
@davemgreen @aemerson are we ready to merge? |
If you can merge in #95881 and rebase this on top then I think it looks good to me. |
a8af97c
to
0f8b0cb
Compare
@davemgreen I rebased on main and updated the tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. LGTM
…sing udot (llvm#96409) Follow up for llvm#95881 Use udot instead of a sequence of uaddlp instructions when summing up lanes for popcount.
…sing udot (llvm#96409) Follow up for llvm#95881 Use udot instead of a sequence of uaddlp instructions when summing up lanes for popcount.
Follow up for #95881
Use udot instead of a sequence of uaddlp instructions when summing up lanes for popcount.