Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AArch64][GlobalISel] Improve non-SVE popcount for 32bit and 64 bit using udot #96409

Merged
merged 2 commits into from
Jul 2, 2024

Conversation

tgymnich
Copy link
Member

@tgymnich tgymnich commented Jun 22, 2024

Follow up for #95881

Use udot instead of a sequence of uaddlp instructions when summing up lanes for popcount.

@tgymnich tgymnich marked this pull request as draft June 22, 2024 22:41
@llvmbot
Copy link
Collaborator

llvmbot commented Jun 22, 2024

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-aarch64

Author: Tim Gymnich (tgymnich)

Changes

Follow up for #95881

Use udot instead of a sequence of uaddlp instructions when summing up lanes for popcount.

TODO:

  • add/update test

Full diff: https://github.com/llvm/llvm-project/pull/96409.diff

1 Files Affected:

  • (modified) llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp (+26)
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
index fef0b722efe45..84cdc40bc5a3b 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -14,6 +14,7 @@
 #include "AArch64LegalizerInfo.h"
 #include "AArch64RegisterBankInfo.h"
 #include "AArch64Subtarget.h"
+#include "MCTargetDesc/AArch64MCTargetDesc.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/CodeGen/GlobalISel/GenericMachineInstrs.h"
 #include "llvm/CodeGen/GlobalISel/LegalizerHelper.h"
@@ -1904,6 +1905,31 @@ bool AArch64LegalizerInfo::legalizeCTPOP(MachineInstr &MI,
   auto CTPOP = MIRBuilder.buildCTPOP(VTy, Val);
 
   // Sum across lanes.
+
+  if (ST->hasDotProd() && Ty.getNumElements() >= 2 &&
+      Ty.getScalarSizeInBits() != 16) {
+    LLT Dt = Ty == LLT::fixed_vector(2, 64) ? LLT::fixed_vector(4, 32) : Ty;
+    auto Zeros = MIRBuilder.buildConstant(Ty, 0);
+    auto Ones = MIRBuilder.buildConstant(VTy, 1);
+    MachineInstrBuilder SUM;
+
+    if (Ty == LLT::fixed_vector(2, 64)) {
+      auto UDOT =
+          MIRBuilder.buildInstr(AArch64::G_UDOT, {Dt}, {Zeros, Ones, CTPOP});
+      SUM = MIRBuilder.buildInstr(AArch64::G_UADDLP, {Ty}, {UDOT});
+    } else if (Ty == LLT::fixed_vector(4, 32)) {
+      SUM = MIRBuilder.buildInstr(AArch64::G_UDOT, {Dt}, {Zeros, Ones, CTPOP});
+    } else if (Ty == LLT::fixed_vector(2, 32)) {
+      SUM = MIRBuilder.buildInstr(AArch64::G_UDOT, {Dt}, {Zeros, Ones, CTPOP});
+    } else {
+      llvm_unreachable("unexpected vector shape");
+    }
+
+    SUM->getOperand(0).setReg(Dst);
+    MI.eraseFromParent();
+    return true;
+  }
+
   Register HSum = CTPOP.getReg(0);
   unsigned Opc;
   SmallVector<LLT> HAddTys;

Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, thanks. Can we add global-isel run lines to the existing tests?

Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like the tests will need to be rebased once the other patch is committed, but this looks OK to me.

llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp Outdated Show resolved Hide resolved
@@ -14,6 +14,7 @@
#include "AArch64LegalizerInfo.h"
#include "AArch64RegisterBankInfo.h"
#include "AArch64Subtarget.h"
#include "MCTargetDesc/AArch64MCTargetDesc.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed?

Copy link
Member Author

@tgymnich tgymnich Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vscode inserted it when using AArch64::G_UDOT. But AArch64Subtarget.h is already including MCTargetDesc/AArch64MCTargetDesc.h.

llvm/test/CodeGen/AArch64/popcount.ll Outdated Show resolved Hide resolved
@tgymnich tgymnich marked this pull request as ready for review June 27, 2024 12:00
@tgymnich
Copy link
Member Author

tgymnich commented Jul 2, 2024

@davemgreen @aemerson are we ready to merge?

@davemgreen
Copy link
Collaborator

If you can merge in #95881 and rebase this on top then I think it looks good to me.

@tgymnich
Copy link
Member Author

tgymnich commented Jul 2, 2024

@davemgreen I rebased on main and updated the tests.

Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. LGTM

@davemgreen davemgreen merged commit 2ee86a1 into llvm:main Jul 2, 2024
7 checks passed
lravenclaw pushed a commit to lravenclaw/llvm-project that referenced this pull request Jul 3, 2024
…sing udot (llvm#96409)

Follow up for llvm#95881

Use udot instead of a sequence of uaddlp instructions when summing up
lanes for popcount.
kbluck pushed a commit to kbluck/llvm-project that referenced this pull request Jul 6, 2024
…sing udot (llvm#96409)

Follow up for llvm#95881

Use udot instead of a sequence of uaddlp instructions when summing up
lanes for popcount.
@tgymnich tgymnich deleted the arm-ctpop-udot-globalisel branch October 9, 2024 19:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants