Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RISCV] Add additional fence for amocas when required by recent ABI change #101023

Merged
merged 4 commits into from
Sep 19, 2024

Conversation

asb
Copy link
Contributor

@asb asb commented Jul 29, 2024

A recent atomics ABI change / fix requires that for the "A6C" and A6S" atomics ABIs (i.e. both of those supported by LLVM currently), an additional fence is inserted for an atomic_compare_exchange with seq_cst failure ordering. riscv-non-isa/riscv-elf-psabi-doc#445

This isn't trivial to support through the hooks used by AtomicExpandPass because that pass assumes that when fences are inserted, the original atomics ordering information can be removed from the instruction. Rather than try to change and complicate that API, this patch implements the needed fence insertion in RISCVCodeGenPrepare.


Although it's not exactly elegant sticking this in RISCVCodeGenPrepare, it is I think the most straightforward way of implementing this new behaviour. CC @jyknight - if you have a preference for changing AtomicExpandPass to accommodate this and a clear idea of what interface changes would be appropriate, I wouldn't be opposed to implementing it.

@llvmbot
Copy link
Member

llvmbot commented Jul 29, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Alex Bradbury (asb)

Changes

A recent atomics ABI change / fix requires that for the "A6C" and A6S" atomics ABIs (i.e. both of those supported by LLVM currently), an additional fence is inserted for an atomic_compare_exchange with seq_cst failure ordering. <riscv-non-isa/riscv-elf-psabi-doc#445>

This isn't trivial to support through the hooks used by AtomicExpandPass because that pass assumes that when fences are inserted, the original atomics ordering information can be removed from the instruction. Rather than try to change and complicate that API, this patch implements the needed fence insertion in RISCVCodeGenPrepare.


Although it's not exactly elegant sticking this in RISCVCodeGenPrepare, it is I think the most straightforward way of implementing this new behaviour. CC @jyknight - if you have a preference for changing AtomicExpandPass to accommodate this and a clear idea of what interface changes would be appropriate, I wouldn't be opposed to implementing it.


Full diff: https://github.com/llvm/llvm-project/pull/101023.diff

6 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVCodeGenPrepare.cpp (+15)
  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.h (+4)
  • (modified) llvm/test/CodeGen/RISCV/atomic-cmpxchg-branch-on-result.ll (+5)
  • (modified) llvm/test/CodeGen/RISCV/atomic-cmpxchg.ll (+14)
  • (modified) llvm/test/CodeGen/RISCV/atomic-rmw.ll (+14)
  • (modified) llvm/test/CodeGen/RISCV/atomic-signext.ll (+2)
diff --git a/llvm/lib/Target/RISCV/RISCVCodeGenPrepare.cpp b/llvm/lib/Target/RISCV/RISCVCodeGenPrepare.cpp
index 0a66a38f6d5ab..8474c6b7c0889 100644
--- a/llvm/lib/Target/RISCV/RISCVCodeGenPrepare.cpp
+++ b/llvm/lib/Target/RISCV/RISCVCodeGenPrepare.cpp
@@ -10,6 +10,8 @@
 // It munges the code in the input function to better prepare it for
 // SelectionDAG-based code generation. This works around limitations in it's
 // basic-block-at-a-time approach.
+// It additionally implements a fence insertion for an atomic cmpxchg in a
+// case that isn't easy to do with the current AtomicExpandPass hooks API.
 //
 //===----------------------------------------------------------------------===//
 
@@ -59,6 +61,7 @@ class RISCVCodeGenPrepare : public FunctionPass,
   bool visitAnd(BinaryOperator &BO);
   bool visitIntrinsicInst(IntrinsicInst &I);
   bool expandVPStrideLoad(IntrinsicInst &I);
+  bool visitAtomicCmpXchgInst(AtomicCmpXchgInst &I);
 };
 
 } // end anonymous namespace
@@ -212,6 +215,18 @@ bool RISCVCodeGenPrepare::expandVPStrideLoad(IntrinsicInst &II) {
   return true;
 }
 
+// Insert a leading fence (needed for broadest atomics ABI compatibility)
+// only if the Zacas extension is enabled and the AtomicCmpXchgInst has a
+// SequentiallyConsistent failure ordering.
+bool RISCVCodeGenPrepare::visitAtomicCmpXchgInst(AtomicCmpXchgInst &I) {
+  IRBuilder<> Builder(&I);
+  if (!ST->hasStdExtZacas() || I.getFailureOrdering() != AtomicOrdering::SequentiallyConsistent)
+    return false;
+
+  Builder.CreateFence(AtomicOrdering::SequentiallyConsistent);
+  return true;
+}
+
 bool RISCVCodeGenPrepare::runOnFunction(Function &F) {
   if (skipFunction(F))
     return false;
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.h b/llvm/lib/Target/RISCV/RISCVISelLowering.h
index 498c77f1875ed..517eea32719b2 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.h
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.h
@@ -680,6 +680,10 @@ class RISCVTargetLowering : public TargetLowering {
 
   bool preferZeroCompareBranch() const override { return true; }
 
+  // Note that one specific case requires fence insertion for an
+  // AtomicCmpXchgInst but is handled via RISCVCodeGenPrepare rather than this
+  // hook due to limitations in the interface here (see RISCVCodeGenPrepare
+  // for more information).
   bool shouldInsertFencesForAtomic(const Instruction *I) const override {
     return isa<LoadInst>(I) || isa<StoreInst>(I);
   }
diff --git a/llvm/test/CodeGen/RISCV/atomic-cmpxchg-branch-on-result.ll b/llvm/test/CodeGen/RISCV/atomic-cmpxchg-branch-on-result.ll
index b98d2d57a0b52..e70ba93de75e0 100644
--- a/llvm/test/CodeGen/RISCV/atomic-cmpxchg-branch-on-result.ll
+++ b/llvm/test/CodeGen/RISCV/atomic-cmpxchg-branch-on-result.ll
@@ -36,6 +36,7 @@ define void @cmpxchg_and_branch1(ptr %ptr, i32 signext %cmp, i32 signext %val) n
 ; ZACAS:       # %bb.0: # %entry
 ; ZACAS-NEXT:  .LBB0_1: # %do_cmpxchg
 ; ZACAS-NEXT:    # =>This Inner Loop Header: Depth=1
+; ZACAS-NEXT:    fence rw, rw
 ; ZACAS-NEXT:    mv a3, a1
 ; ZACAS-NEXT:    amocas.w.aqrl a3, a2, (a0)
 ; ZACAS-NEXT:    bne a3, a1, .LBB0_1
@@ -76,6 +77,7 @@ define void @cmpxchg_and_branch2(ptr %ptr, i32 signext %cmp, i32 signext %val) n
 ; ZACAS:       # %bb.0: # %entry
 ; ZACAS-NEXT:  .LBB1_1: # %do_cmpxchg
 ; ZACAS-NEXT:    # =>This Inner Loop Header: Depth=1
+; ZACAS-NEXT:    fence rw, rw
 ; ZACAS-NEXT:    mv a3, a1
 ; ZACAS-NEXT:    amocas.w.aqrl a3, a2, (a0)
 ; ZACAS-NEXT:    beq a3, a1, .LBB1_1
@@ -216,6 +218,7 @@ define void @cmpxchg_masked_and_branch1(ptr %ptr, i8 signext %cmp, i8 signext %v
 ; RV64IA-ZABHA:       # %bb.0: # %entry
 ; RV64IA-ZABHA-NEXT:  .LBB2_1: # %do_cmpxchg
 ; RV64IA-ZABHA-NEXT:    # =>This Inner Loop Header: Depth=1
+; RV64IA-ZABHA-NEXT:    fence rw, rw
 ; RV64IA-ZABHA-NEXT:    mv a3, a1
 ; RV64IA-ZABHA-NEXT:    amocas.b.aqrl a3, a2, (a0)
 ; RV64IA-ZABHA-NEXT:    bne a3, a1, .LBB2_1
@@ -368,6 +371,7 @@ define void @cmpxchg_masked_and_branch2(ptr %ptr, i8 signext %cmp, i8 signext %v
 ; RV64IA-ZABHA:       # %bb.0: # %entry
 ; RV64IA-ZABHA-NEXT:  .LBB3_1: # %do_cmpxchg
 ; RV64IA-ZABHA-NEXT:    # =>This Inner Loop Header: Depth=1
+; RV64IA-ZABHA-NEXT:    fence rw, rw
 ; RV64IA-ZABHA-NEXT:    mv a3, a1
 ; RV64IA-ZABHA-NEXT:    amocas.b.aqrl a3, a2, (a0)
 ; RV64IA-ZABHA-NEXT:    beq a3, a1, .LBB3_1
@@ -408,6 +412,7 @@ define void @cmpxchg_and_irrelevant_branch(ptr %ptr, i32 signext %cmp, i32 signe
 ; ZACAS:       # %bb.0: # %entry
 ; ZACAS-NEXT:  .LBB4_1: # %do_cmpxchg
 ; ZACAS-NEXT:    # =>This Inner Loop Header: Depth=1
+; ZACAS-NEXT:    fence rw, rw
 ; ZACAS-NEXT:    mv a4, a1
 ; ZACAS-NEXT:    amocas.w.aqrl a4, a2, (a0)
 ; ZACAS-NEXT:    beqz a3, .LBB4_1
diff --git a/llvm/test/CodeGen/RISCV/atomic-cmpxchg.ll b/llvm/test/CodeGen/RISCV/atomic-cmpxchg.ll
index c47db319fc2c3..acd6e8f9afe2a 100644
--- a/llvm/test/CodeGen/RISCV/atomic-cmpxchg.ll
+++ b/llvm/test/CodeGen/RISCV/atomic-cmpxchg.ll
@@ -1857,6 +1857,7 @@ define void @cmpxchg_i8_seq_cst_seq_cst(ptr %ptr, i8 %cmp, i8 %val) nounwind {
 ;
 ; RV64IA-WMO-ZABHA-LABEL: cmpxchg_i8_seq_cst_seq_cst:
 ; RV64IA-WMO-ZABHA:       # %bb.0:
+; RV64IA-WMO-ZABHA-NEXT:    fence rw, rw
 ; RV64IA-WMO-ZABHA-NEXT:    amocas.b.aqrl a1, a2, (a0)
 ; RV64IA-WMO-ZABHA-NEXT:    ret
 ;
@@ -1885,6 +1886,7 @@ define void @cmpxchg_i8_seq_cst_seq_cst(ptr %ptr, i8 %cmp, i8 %val) nounwind {
 ;
 ; RV64IA-TSO-ZABHA-LABEL: cmpxchg_i8_seq_cst_seq_cst:
 ; RV64IA-TSO-ZABHA:       # %bb.0:
+; RV64IA-TSO-ZABHA-NEXT:    fence rw, rw
 ; RV64IA-TSO-ZABHA-NEXT:    amocas.b a1, a2, (a0)
 ; RV64IA-TSO-ZABHA-NEXT:    ret
   %res = cmpxchg ptr %ptr, i8 %cmp, i8 %val seq_cst seq_cst
@@ -3787,6 +3789,7 @@ define void @cmpxchg_i16_seq_cst_seq_cst(ptr %ptr, i16 %cmp, i16 %val) nounwind
 ;
 ; RV64IA-WMO-ZABHA-LABEL: cmpxchg_i16_seq_cst_seq_cst:
 ; RV64IA-WMO-ZABHA:       # %bb.0:
+; RV64IA-WMO-ZABHA-NEXT:    fence rw, rw
 ; RV64IA-WMO-ZABHA-NEXT:    amocas.h.aqrl a1, a2, (a0)
 ; RV64IA-WMO-ZABHA-NEXT:    ret
 ;
@@ -3816,6 +3819,7 @@ define void @cmpxchg_i16_seq_cst_seq_cst(ptr %ptr, i16 %cmp, i16 %val) nounwind
 ;
 ; RV64IA-TSO-ZABHA-LABEL: cmpxchg_i16_seq_cst_seq_cst:
 ; RV64IA-TSO-ZABHA:       # %bb.0:
+; RV64IA-TSO-ZABHA-NEXT:    fence rw, rw
 ; RV64IA-TSO-ZABHA-NEXT:    amocas.h a1, a2, (a0)
 ; RV64IA-TSO-ZABHA-NEXT:    ret
   %res = cmpxchg ptr %ptr, i16 %cmp, i16 %val seq_cst seq_cst
@@ -4788,6 +4792,7 @@ define void @cmpxchg_i32_seq_cst_seq_cst(ptr %ptr, i32 %cmp, i32 %val) nounwind
 ;
 ; RV32IA-WMO-ZACAS-LABEL: cmpxchg_i32_seq_cst_seq_cst:
 ; RV32IA-WMO-ZACAS:       # %bb.0:
+; RV32IA-WMO-ZACAS-NEXT:    fence rw, rw
 ; RV32IA-WMO-ZACAS-NEXT:    amocas.w.aqrl a1, a2, (a0)
 ; RV32IA-WMO-ZACAS-NEXT:    ret
 ;
@@ -4804,6 +4809,7 @@ define void @cmpxchg_i32_seq_cst_seq_cst(ptr %ptr, i32 %cmp, i32 %val) nounwind
 ;
 ; RV32IA-TSO-ZACAS-LABEL: cmpxchg_i32_seq_cst_seq_cst:
 ; RV32IA-TSO-ZACAS:       # %bb.0:
+; RV32IA-TSO-ZACAS-NEXT:    fence rw, rw
 ; RV32IA-TSO-ZACAS-NEXT:    amocas.w a1, a2, (a0)
 ; RV32IA-TSO-ZACAS-NEXT:    ret
 ;
@@ -4834,11 +4840,13 @@ define void @cmpxchg_i32_seq_cst_seq_cst(ptr %ptr, i32 %cmp, i32 %val) nounwind
 ;
 ; RV64IA-WMO-ZACAS-LABEL: cmpxchg_i32_seq_cst_seq_cst:
 ; RV64IA-WMO-ZACAS:       # %bb.0:
+; RV64IA-WMO-ZACAS-NEXT:    fence rw, rw
 ; RV64IA-WMO-ZACAS-NEXT:    amocas.w.aqrl a1, a2, (a0)
 ; RV64IA-WMO-ZACAS-NEXT:    ret
 ;
 ; RV64IA-WMO-ZABHA-LABEL: cmpxchg_i32_seq_cst_seq_cst:
 ; RV64IA-WMO-ZABHA:       # %bb.0:
+; RV64IA-WMO-ZABHA-NEXT:    fence rw, rw
 ; RV64IA-WMO-ZABHA-NEXT:    amocas.w.aqrl a1, a2, (a0)
 ; RV64IA-WMO-ZABHA-NEXT:    ret
 ;
@@ -4856,11 +4864,13 @@ define void @cmpxchg_i32_seq_cst_seq_cst(ptr %ptr, i32 %cmp, i32 %val) nounwind
 ;
 ; RV64IA-TSO-ZACAS-LABEL: cmpxchg_i32_seq_cst_seq_cst:
 ; RV64IA-TSO-ZACAS:       # %bb.0:
+; RV64IA-TSO-ZACAS-NEXT:    fence rw, rw
 ; RV64IA-TSO-ZACAS-NEXT:    amocas.w a1, a2, (a0)
 ; RV64IA-TSO-ZACAS-NEXT:    ret
 ;
 ; RV64IA-TSO-ZABHA-LABEL: cmpxchg_i32_seq_cst_seq_cst:
 ; RV64IA-TSO-ZABHA:       # %bb.0:
+; RV64IA-TSO-ZABHA-NEXT:    fence rw, rw
 ; RV64IA-TSO-ZABHA-NEXT:    amocas.w a1, a2, (a0)
 ; RV64IA-TSO-ZABHA-NEXT:    ret
   %res = cmpxchg ptr %ptr, i32 %cmp, i32 %val seq_cst seq_cst
@@ -5753,11 +5763,13 @@ define void @cmpxchg_i64_seq_cst_seq_cst(ptr %ptr, i64 %cmp, i64 %val) nounwind
 ;
 ; RV64IA-WMO-ZACAS-LABEL: cmpxchg_i64_seq_cst_seq_cst:
 ; RV64IA-WMO-ZACAS:       # %bb.0:
+; RV64IA-WMO-ZACAS-NEXT:    fence rw, rw
 ; RV64IA-WMO-ZACAS-NEXT:    amocas.d.aqrl a1, a2, (a0)
 ; RV64IA-WMO-ZACAS-NEXT:    ret
 ;
 ; RV64IA-WMO-ZABHA-LABEL: cmpxchg_i64_seq_cst_seq_cst:
 ; RV64IA-WMO-ZABHA:       # %bb.0:
+; RV64IA-WMO-ZABHA-NEXT:    fence rw, rw
 ; RV64IA-WMO-ZABHA-NEXT:    amocas.d.aqrl a1, a2, (a0)
 ; RV64IA-WMO-ZABHA-NEXT:    ret
 ;
@@ -5774,11 +5786,13 @@ define void @cmpxchg_i64_seq_cst_seq_cst(ptr %ptr, i64 %cmp, i64 %val) nounwind
 ;
 ; RV64IA-TSO-ZACAS-LABEL: cmpxchg_i64_seq_cst_seq_cst:
 ; RV64IA-TSO-ZACAS:       # %bb.0:
+; RV64IA-TSO-ZACAS-NEXT:    fence rw, rw
 ; RV64IA-TSO-ZACAS-NEXT:    amocas.d a1, a2, (a0)
 ; RV64IA-TSO-ZACAS-NEXT:    ret
 ;
 ; RV64IA-TSO-ZABHA-LABEL: cmpxchg_i64_seq_cst_seq_cst:
 ; RV64IA-TSO-ZABHA:       # %bb.0:
+; RV64IA-TSO-ZABHA-NEXT:    fence rw, rw
 ; RV64IA-TSO-ZABHA-NEXT:    amocas.d a1, a2, (a0)
 ; RV64IA-TSO-ZABHA-NEXT:    ret
   %res = cmpxchg ptr %ptr, i64 %cmp, i64 %val seq_cst seq_cst
diff --git a/llvm/test/CodeGen/RISCV/atomic-rmw.ll b/llvm/test/CodeGen/RISCV/atomic-rmw.ll
index 4223440b9cb88..03157e13bff78 100644
--- a/llvm/test/CodeGen/RISCV/atomic-rmw.ll
+++ b/llvm/test/CodeGen/RISCV/atomic-rmw.ll
@@ -4437,6 +4437,7 @@ define i8 @atomicrmw_nand_i8_seq_cst(ptr %a, i8 %b) nounwind {
 ; RV64IA-WMO-ZABHA-ZACAS-NEXT:    # =>This Inner Loop Header: Depth=1
 ; RV64IA-WMO-ZABHA-ZACAS-NEXT:    and a3, a0, a1
 ; RV64IA-WMO-ZABHA-ZACAS-NEXT:    not a3, a3
+; RV64IA-WMO-ZABHA-ZACAS-NEXT:    fence rw, rw
 ; RV64IA-WMO-ZABHA-ZACAS-NEXT:    slli a4, a0, 56
 ; RV64IA-WMO-ZABHA-ZACAS-NEXT:    amocas.b.aqrl a0, a3, (a2)
 ; RV64IA-WMO-ZABHA-ZACAS-NEXT:    srai a4, a4, 56
@@ -4452,6 +4453,7 @@ define i8 @atomicrmw_nand_i8_seq_cst(ptr %a, i8 %b) nounwind {
 ; RV64IA-TSO-ZABHA-ZACAS-NEXT:    # =>This Inner Loop Header: Depth=1
 ; RV64IA-TSO-ZABHA-ZACAS-NEXT:    and a3, a0, a1
 ; RV64IA-TSO-ZABHA-ZACAS-NEXT:    not a3, a3
+; RV64IA-TSO-ZABHA-ZACAS-NEXT:    fence rw, rw
 ; RV64IA-TSO-ZABHA-ZACAS-NEXT:    slli a4, a0, 56
 ; RV64IA-TSO-ZABHA-ZACAS-NEXT:    amocas.b a0, a3, (a2)
 ; RV64IA-TSO-ZABHA-ZACAS-NEXT:    srai a4, a4, 56
@@ -14410,6 +14412,7 @@ define i16 @atomicrmw_nand_i16_seq_cst(ptr %a, i16 %b) nounwind {
 ; RV64IA-WMO-ZABHA-ZACAS-NEXT:    # =>This Inner Loop Header: Depth=1
 ; RV64IA-WMO-ZABHA-ZACAS-NEXT:    and a3, a0, a1
 ; RV64IA-WMO-ZABHA-ZACAS-NEXT:    not a3, a3
+; RV64IA-WMO-ZABHA-ZACAS-NEXT:    fence rw, rw
 ; RV64IA-WMO-ZABHA-ZACAS-NEXT:    slli a4, a0, 48
 ; RV64IA-WMO-ZABHA-ZACAS-NEXT:    amocas.h.aqrl a0, a3, (a2)
 ; RV64IA-WMO-ZABHA-ZACAS-NEXT:    srai a4, a4, 48
@@ -14425,6 +14428,7 @@ define i16 @atomicrmw_nand_i16_seq_cst(ptr %a, i16 %b) nounwind {
 ; RV64IA-TSO-ZABHA-ZACAS-NEXT:    # =>This Inner Loop Header: Depth=1
 ; RV64IA-TSO-ZABHA-ZACAS-NEXT:    and a3, a0, a1
 ; RV64IA-TSO-ZABHA-ZACAS-NEXT:    not a3, a3
+; RV64IA-TSO-ZABHA-ZACAS-NEXT:    fence rw, rw
 ; RV64IA-TSO-ZABHA-ZACAS-NEXT:    slli a4, a0, 48
 ; RV64IA-TSO-ZABHA-ZACAS-NEXT:    amocas.h a0, a3, (a2)
 ; RV64IA-TSO-ZABHA-ZACAS-NEXT:    srai a4, a4, 48
@@ -21637,6 +21641,7 @@ define i32 @atomicrmw_nand_i32_seq_cst(ptr %a, i32 %b) nounwind {
 ; RV32IA-WMO-ZACAS-NEXT:    mv a3, a0
 ; RV32IA-WMO-ZACAS-NEXT:    and a4, a0, a1
 ; RV32IA-WMO-ZACAS-NEXT:    not a4, a4
+; RV32IA-WMO-ZACAS-NEXT:    fence rw, rw
 ; RV32IA-WMO-ZACAS-NEXT:    amocas.w.aqrl a0, a4, (a2)
 ; RV32IA-WMO-ZACAS-NEXT:    bne a0, a3, .LBB154_1
 ; RV32IA-WMO-ZACAS-NEXT:  # %bb.2: # %atomicrmw.end
@@ -21651,6 +21656,7 @@ define i32 @atomicrmw_nand_i32_seq_cst(ptr %a, i32 %b) nounwind {
 ; RV32IA-TSO-ZACAS-NEXT:    mv a3, a0
 ; RV32IA-TSO-ZACAS-NEXT:    and a4, a0, a1
 ; RV32IA-TSO-ZACAS-NEXT:    not a4, a4
+; RV32IA-TSO-ZACAS-NEXT:    fence rw, rw
 ; RV32IA-TSO-ZACAS-NEXT:    amocas.w a0, a4, (a2)
 ; RV32IA-TSO-ZACAS-NEXT:    bne a0, a3, .LBB154_1
 ; RV32IA-TSO-ZACAS-NEXT:  # %bb.2: # %atomicrmw.end
@@ -21665,6 +21671,7 @@ define i32 @atomicrmw_nand_i32_seq_cst(ptr %a, i32 %b) nounwind {
 ; RV64IA-WMO-ZACAS-NEXT:    mv a3, a0
 ; RV64IA-WMO-ZACAS-NEXT:    and a4, a0, a1
 ; RV64IA-WMO-ZACAS-NEXT:    not a4, a4
+; RV64IA-WMO-ZACAS-NEXT:    fence rw, rw
 ; RV64IA-WMO-ZACAS-NEXT:    amocas.w.aqrl a0, a4, (a2)
 ; RV64IA-WMO-ZACAS-NEXT:    bne a0, a3, .LBB154_1
 ; RV64IA-WMO-ZACAS-NEXT:  # %bb.2: # %atomicrmw.end
@@ -21679,6 +21686,7 @@ define i32 @atomicrmw_nand_i32_seq_cst(ptr %a, i32 %b) nounwind {
 ; RV64IA-TSO-ZACAS-NEXT:    mv a3, a0
 ; RV64IA-TSO-ZACAS-NEXT:    and a4, a0, a1
 ; RV64IA-TSO-ZACAS-NEXT:    not a4, a4
+; RV64IA-TSO-ZACAS-NEXT:    fence rw, rw
 ; RV64IA-TSO-ZACAS-NEXT:    amocas.w a0, a4, (a2)
 ; RV64IA-TSO-ZACAS-NEXT:    bne a0, a3, .LBB154_1
 ; RV64IA-TSO-ZACAS-NEXT:  # %bb.2: # %atomicrmw.end
@@ -21717,6 +21725,7 @@ define i32 @atomicrmw_nand_i32_seq_cst(ptr %a, i32 %b) nounwind {
 ; RV64IA-WMO-ZABHA-ZACAS-NEXT:    mv a3, a0
 ; RV64IA-WMO-ZABHA-ZACAS-NEXT:    and a4, a0, a1
 ; RV64IA-WMO-ZABHA-ZACAS-NEXT:    not a4, a4
+; RV64IA-WMO-ZABHA-ZACAS-NEXT:    fence rw, rw
 ; RV64IA-WMO-ZABHA-ZACAS-NEXT:    amocas.w.aqrl a0, a4, (a2)
 ; RV64IA-WMO-ZABHA-ZACAS-NEXT:    bne a0, a3, .LBB154_1
 ; RV64IA-WMO-ZABHA-ZACAS-NEXT:  # %bb.2: # %atomicrmw.end
@@ -21731,6 +21740,7 @@ define i32 @atomicrmw_nand_i32_seq_cst(ptr %a, i32 %b) nounwind {
 ; RV64IA-TSO-ZABHA-ZACAS-NEXT:    mv a3, a0
 ; RV64IA-TSO-ZABHA-ZACAS-NEXT:    and a4, a0, a1
 ; RV64IA-TSO-ZABHA-ZACAS-NEXT:    not a4, a4
+; RV64IA-TSO-ZABHA-ZACAS-NEXT:    fence rw, rw
 ; RV64IA-TSO-ZABHA-ZACAS-NEXT:    amocas.w a0, a4, (a2)
 ; RV64IA-TSO-ZABHA-ZACAS-NEXT:    bne a0, a3, .LBB154_1
 ; RV64IA-TSO-ZABHA-ZACAS-NEXT:  # %bb.2: # %atomicrmw.end
@@ -25546,6 +25556,7 @@ define i64 @atomicrmw_nand_i64_seq_cst(ptr %a, i64 %b) nounwind {
 ; RV64IA-WMO-ZACAS-NEXT:    mv a3, a0
 ; RV64IA-WMO-ZACAS-NEXT:    and a4, a0, a1
 ; RV64IA-WMO-ZACAS-NEXT:    not a4, a4
+; RV64IA-WMO-ZACAS-NEXT:    fence rw, rw
 ; RV64IA-WMO-ZACAS-NEXT:    amocas.d.aqrl a0, a4, (a2)
 ; RV64IA-WMO-ZACAS-NEXT:    bne a0, a3, .LBB209_1
 ; RV64IA-WMO-ZACAS-NEXT:  # %bb.2: # %atomicrmw.end
@@ -25560,6 +25571,7 @@ define i64 @atomicrmw_nand_i64_seq_cst(ptr %a, i64 %b) nounwind {
 ; RV64IA-TSO-ZACAS-NEXT:    mv a3, a0
 ; RV64IA-TSO-ZACAS-NEXT:    and a4, a0, a1
 ; RV64IA-TSO-ZACAS-NEXT:    not a4, a4
+; RV64IA-TSO-ZACAS-NEXT:    fence rw, rw
 ; RV64IA-TSO-ZACAS-NEXT:    amocas.d a0, a4, (a2)
 ; RV64IA-TSO-ZACAS-NEXT:    bne a0, a3, .LBB209_1
 ; RV64IA-TSO-ZACAS-NEXT:  # %bb.2: # %atomicrmw.end
@@ -25598,6 +25610,7 @@ define i64 @atomicrmw_nand_i64_seq_cst(ptr %a, i64 %b) nounwind {
 ; RV64IA-WMO-ZABHA-ZACAS-NEXT:    mv a3, a0
 ; RV64IA-WMO-ZABHA-ZACAS-NEXT:    and a4, a0, a1
 ; RV64IA-WMO-ZABHA-ZACAS-NEXT:    not a4, a4
+; RV64IA-WMO-ZABHA-ZACAS-NEXT:    fence rw, rw
 ; RV64IA-WMO-ZABHA-ZACAS-NEXT:    amocas.d.aqrl a0, a4, (a2)
 ; RV64IA-WMO-ZABHA-ZACAS-NEXT:    bne a0, a3, .LBB209_1
 ; RV64IA-WMO-ZABHA-ZACAS-NEXT:  # %bb.2: # %atomicrmw.end
@@ -25612,6 +25625,7 @@ define i64 @atomicrmw_nand_i64_seq_cst(ptr %a, i64 %b) nounwind {
 ; RV64IA-TSO-ZABHA-ZACAS-NEXT:    mv a3, a0
 ; RV64IA-TSO-ZABHA-ZACAS-NEXT:    and a4, a0, a1
 ; RV64IA-TSO-ZABHA-ZACAS-NEXT:    not a4, a4
+; RV64IA-TSO-ZABHA-ZACAS-NEXT:    fence rw, rw
 ; RV64IA-TSO-ZABHA-ZACAS-NEXT:    amocas.d a0, a4, (a2)
 ; RV64IA-TSO-ZABHA-ZACAS-NEXT:    bne a0, a3, .LBB209_1
 ; RV64IA-TSO-ZABHA-ZACAS-NEXT:  # %bb.2: # %atomicrmw.end
diff --git a/llvm/test/CodeGen/RISCV/atomic-signext.ll b/llvm/test/CodeGen/RISCV/atomic-signext.ll
index 775c17c3ceb3f..c143be478948e 100644
--- a/llvm/test/CodeGen/RISCV/atomic-signext.ll
+++ b/llvm/test/CodeGen/RISCV/atomic-signext.ll
@@ -5562,6 +5562,7 @@ define signext i32 @cmpxchg_i32_monotonic_crossbb(ptr %ptr, i32 signext %cmp, i3
 ; RV32IA-ZACAS:       # %bb.0:
 ; RV32IA-ZACAS-NEXT:    beqz a3, .LBB64_2
 ; RV32IA-ZACAS-NEXT:  # %bb.1: # %then
+; RV32IA-ZACAS-NEXT:    fence rw, rw
 ; RV32IA-ZACAS-NEXT:    amocas.w.aqrl a1, a2, (a0)
 ; RV32IA-ZACAS-NEXT:    mv a0, a1
 ; RV32IA-ZACAS-NEXT:    ret
@@ -5612,6 +5613,7 @@ define signext i32 @cmpxchg_i32_monotonic_crossbb(ptr %ptr, i32 signext %cmp, i3
 ; RV64IA-ZACAS:       # %bb.0:
 ; RV64IA-ZACAS-NEXT:    beqz a3, .LBB64_2
 ; RV64IA-ZACAS-NEXT:  # %bb.1: # %then
+; RV64IA-ZACAS-NEXT:    fence rw, rw
 ; RV64IA-ZACAS-NEXT:    amocas.w.aqrl a1, a2, (a0)
 ; RV64IA-ZACAS-NEXT:    mv a0, a1
 ; RV64IA-ZACAS-NEXT:    ret

…hange

A recent atomics ABI change / fix requiresthat for the "A6C" and A6S"
atomics ABIs (i.e. both of those supported by LLVM currently), an
additional fence is inserted for an atomic_compare_exchange with seq_cst
failure ordering.
<riscv-non-isa/riscv-elf-psabi-doc#445>

This isn't trivial to support through the hooks used by AtomicExpandPass
because that pass assumes that when fences are inserted, the original
atomics ordering information can be removed from the instruction. Rather
than try to change and complicate that API, this patch implements the
needed fence insertion in RISCVCodeGenPrepare.
@asb asb force-pushed the 2024q3-riscv-zacas-abi-change-take-2 branch from a04fe09 to 553b2a9 Compare July 29, 2024 14:49
@jyknight
Copy link
Member

Perhaps we could change the generic fence-insertion interface to enable this in the generic interface, but I'm OK with just doing this for now.

This does seem unfortunate for performance, though -- I hope someone's planning to implement the proper "A7" atomic abi mappings in the Clang 20 timeframe, so that these extraneous fences can go away?

@asb
Copy link
Contributor Author

asb commented Aug 14, 2024

This does seem unfortunate for performance, though -- I hope someone's planning to implement the proper "A7" atomic abi mappings in the Clang 20 timeframe, so that these extraneous fences can go away?

I agree it would be nice to have this, even if rolling out the A7 mappings in standard Linux distros is hard to envisage due to ABI compatibility. All the infrastructure in terms of ELF ABI targets etc is there.

Ping for anyone to give a LGTM so I can get this merged. Many thanks.

Copy link
Collaborator

@preames preames left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@patrick-rivos
Copy link
Contributor

@anparri
Copy link

anparri commented Aug 22, 2024

The resulting mappings looked good to me. Tested-by: Andrea Parri andrea@rivosinc.com :-)

// SequentiallyConsistent failure ordering.
bool RISCVCodeGenPrepare::visitAtomicCmpXchgInst(AtomicCmpXchgInst &I) {
IRBuilder<> Builder(&I);
if (!ST->hasStdExtZacas() ||
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we ever reach here with an AtomicCmpXchgInst that won't be lowered to an AMOCAS, e.g. if it's oversized?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that's cmpxchg_i64_seq_cst_seq_cst, and this works because AtomicExpandPass is in addIRPasses not addCodeGenPrepare so comes first and we thus only expect legal cmpxchg instructions here?

@topperc
Copy link
Collaborator

topperc commented Aug 22, 2024

RISCVCodeGenPrepare doesn't run at -O0. Is that a problem?

@asb
Copy link
Contributor Author

asb commented Sep 11, 2024

RISCVCodeGenPrepare doesn't run at -O0. Is that a problem?

Good catch, that is indeed a problem. I've pushed a patch that naively just moves this to a minimal separate pass that is guaranteed to run even at O0. What do you think?

llvm/lib/Target/RISCV/RISCVISelLowering.h Outdated Show resolved Hide resolved
// SequentiallyConsistent failure ordering.
bool RISCVZacasABIFix::visitAtomicCmpXchgInst(AtomicCmpXchgInst &I) {
IRBuilder<> Builder(&I);
if (!ST->hasStdExtZacas() ||
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Zacas check can be done in runOnFunction to avoid the loop when it isn't enabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved the check and added an assert.

Copy link
Collaborator

@topperc topperc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@asb asb merged commit 0ee10e9 into llvm:main Sep 19, 2024
8 checks passed
tmsri pushed a commit to tmsri/llvm-project that referenced this pull request Sep 19, 2024
…hange (llvm#101023)

A recent atomics ABI change / fix requires that for the "A6C" and A6S"
atomics ABIs (i.e. both of those supported by LLVM currently), an
additional fence is inserted for an atomic_compare_exchange with seq_cst
failure ordering.
<riscv-non-isa/riscv-elf-psabi-doc#445>

This isn't trivial to support through the hooks used by AtomicExpandPass
because that pass assumes that when fences are inserted, the original
atomics ordering information can be removed from the instruction. Rather
than try to change and complicate that API, this patch implements the
needed fence insertion through a small special purpose pass.
@llvm-ci
Copy link
Collaborator

llvm-ci commented Sep 20, 2024

LLVM Buildbot has detected a new failure on builder lld-x86_64-win running on as-worker-93 while building llvm at step 7 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/146/builds/1196

Here is the relevant piece of the build log for the reference
Step 7 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM-Unit :: Support/./SupportTests.exe/36/87' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe-LLVM-Unit-10684-36-87.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=87 GTEST_SHARD_INDEX=36 C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe
--

Script:
--
C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe --gtest_filter=ProgramEnvTest.CreateProcessLongPath
--
C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp(160): error: Expected equality of these values:
  0
  RC
    Which is: -2

C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp(163): error: fs::remove(Twine(LongPath)): did not return errc::success.
error number: 13
error message: permission denied



C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp:160
Expected equality of these values:
  0
  RC
    Which is: -2

C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp:163
fs::remove(Twine(LongPath)): did not return errc::success.
error number: 13
error message: permission denied




********************


asb added a commit that referenced this pull request Sep 25, 2024
The extension has been ratified for some time, but we kept it
experimental (see #99898) due to
<riscv-non-isa/riscv-elf-psabi-doc#444>. The
ABI issue has been resolved by #101023 so I believe there's no known
barrier to moving Zacas to non-experimental.
augusto2112 pushed a commit to augusto2112/llvm-project that referenced this pull request Sep 26, 2024
The extension has been ratified for some time, but we kept it
experimental (see llvm#99898) due to
<riscv-non-isa/riscv-elf-psabi-doc#444>. The
ABI issue has been resolved by llvm#101023 so I believe there's no known
barrier to moving Zacas to non-experimental.
xgupta pushed a commit to xgupta/llvm-project that referenced this pull request Oct 4, 2024
The extension has been ratified for some time, but we kept it
experimental (see llvm#99898) due to
<riscv-non-isa/riscv-elf-psabi-doc#444>. The
ABI issue has been resolved by llvm#101023 so I believe there's no known
barrier to moving Zacas to non-experimental.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants