DAG: Replace bitwidth with type in suffix in atomic tablegen ops #94845

Merged
merged 3 commits into from
Jun 13, 2024


arsenm
Contributor

@arsenm arsenm commented Jun 8, 2024

For FP atomics involving bfloat vs. half, we need to distinguish the type and not rely on the bitwidth alone. For my purposes, an alternative would be to relax the atomic predicate's MemoryVT pattern check to a memory-size-only check. Since there are no extending operations involved, the pattern value check should be unambiguous.

For some reason when using the _32 variants for atomicrmw fadd, I was able to select v2f16 but v2bf16 would fail.
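
To make the naming problem concrete, the sketch below prints, for each FP memory type listed in `binary_atomic_op_fp` in this patch, the fragment name a bit-width suffix would give next to the name a type suffix gives. The helper names `bits_of` and `frag_names` are made up for this illustration, and the script only models the naming, not the MemoryVT predicate itself.

```shell
#!/bin/sh
# Bit width of each FP memory type listed in binary_atomic_op_fp.
bits_of() {
  case "$1" in
    f16|bf16)     echo 16 ;;
    v2f16|v2bf16) echo 32 ;;
    f32)          echo 32 ;;
    f64)          echo 64 ;;
    *)            return 1 ;;
  esac
}

# Print "<op>_<width> <op>_<type>": the width-suffixed name followed by
# the type-suffixed name for the same memory type.
frag_names() {
  op=$1
  vt=$2
  printf '%s_%s %s_%s\n' "$op" "$(bits_of "$vt")" "$op" "$vt"
}

for vt in f16 bf16 v2f16 v2bf16 f32 f64; do
  frag_names atomic_load_fadd "$vt"
done
```

With a width suffix, `f16` and `bf16` both map to `atomic_load_fadd_16`, and `v2f16`, `v2bf16` and `f32` all map to `atomic_load_fadd_32`; the type suffix keeps each one distinct.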

Changes mostly done with sed, e.g.
sed -E -i -r 's/atomic_load_(add|swap|sub|and|clr|or|xor|nand|min|max|umin|umax|swap)_([0-9]+)/atomic_load_\1_i\2/' llvm/lib/Target/*/*.td
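
As a rough way to preview what this kind of rename does, the sketch below runs the expression as it appears in the commit message on stdin instead of editing files in place. The function name `rename_atomic_suffix` and the input line (including the `GPR` and `FOO` placeholders) are hypothetical and not taken from any target.

```shell
#!/bin/sh
# Apply the sample rename expression to text on stdin rather than with -i,
# so the effect can be inspected before touching the tree.
rename_atomic_suffix() {
  sed -E 's/atomic_load_(add|swap|sub|and|clr|or|xor|nand|min|max|umin|umax|swap)_([0-9]+)/atomic_load_\1_i\2/'
}

# Hypothetical input line in the style of a target .td pattern.
printf '%s\n' 'def : Pat<(atomic_load_add_32 GPR:$addr, GPR:$val), (FOO)>;' |
  rename_atomic_suffix
```

The `_32` suffix becomes `_i32` and the rest of the line is left alone. Since the expression has no `g` flag, only the first match on each line is rewritten, so a line naming two fragments would need a second pass.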

@llvmbot
Collaborator

llvmbot commented Jun 8, 2024

@llvm/pr-subscribers-backend-webassembly
@llvm/pr-subscribers-backend-powerpc
@llvm/pr-subscribers-backend-x86
@llvm/pr-subscribers-backend-systemz
@llvm/pr-subscribers-backend-sparc
@llvm/pr-subscribers-backend-loongarch
@llvm/pr-subscribers-backend-nvptx
@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-llvm-selectiondag

Author: Matt Arsenault (arsenm)


Patch is 127.50 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/94845.diff

27 Files Affected:

  • (modified) llvm/include/llvm/Target/TargetSelectionDAG.td (+26-48)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrFormats.td (+30-30)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrGISel.td (+4-4)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUInstructions.td (+23-8)
  • (modified) llvm/lib/Target/AMDGPU/BUFInstructions.td (+2-2)
  • (modified) llvm/lib/Target/AMDGPU/DSInstructions.td (+23-23)
  • (modified) llvm/lib/Target/AMDGPU/EvergreenInstructions.td (+21-21)
  • (modified) llvm/lib/Target/AMDGPU/FLATInstructions.td (+6-6)
  • (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.td (+17-8)
  • (modified) llvm/lib/Target/AVR/AVRInstrInfo.td (+10-10)
  • (modified) llvm/lib/Target/BPF/BPFInstrInfo.td (+15-15)
  • (modified) llvm/lib/Target/LoongArch/LoongArchInstrInfo.td (+32-32)
  • (modified) llvm/lib/Target/Mips/Mips64InstrInfo.td (+12-12)
  • (modified) llvm/lib/Target/Mips/MipsInstrInfo.td (+39-39)
  • (modified) llvm/lib/Target/NVPTX/NVPTXIntrinsics.td (+221-221)
  • (modified) llvm/lib/Target/PowerPC/PPCInstr64Bit.td (+17-17)
  • (modified) llvm/lib/Target/PowerPC/PPCInstrInfo.td (+37-37)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfoA.td (+41-41)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfoZa.td (+23-23)
  • (modified) llvm/lib/Target/Sparc/SparcInstr64Bit.td (+1-1)
  • (modified) llvm/lib/Target/Sparc/SparcInstrInfo.td (+4-4)
  • (modified) llvm/lib/Target/SystemZ/SystemZInstrInfo.td (+8-8)
  • (modified) llvm/lib/Target/VE/VEInstrInfo.td (+4-4)
  • (modified) llvm/lib/Target/WebAssembly/WebAssemblyInstrAtomics.td (+23-23)
  • (modified) llvm/lib/Target/X86/X86InstrCompiler.td (+11-11)
  • (modified) llvm/lib/Target/X86/X86InstrMisc.td (+5-5)
  • (modified) llvm/test/TableGen/HasNoUse.td (+3-3)
diff --git a/llvm/include/llvm/Target/TargetSelectionDAG.td b/llvm/include/llvm/Target/TargetSelectionDAG.td
index 15e02eb49271d..f15737e95f97c 100644
--- a/llvm/include/llvm/Target/TargetSelectionDAG.td
+++ b/llvm/include/llvm/Target/TargetSelectionDAG.td
@@ -1680,60 +1680,38 @@ multiclass ternary_atomic_op_ord {
   }
 }
 
-multiclass binary_atomic_op<SDNode atomic_op, bit IsInt = 1> {
-  def _8 : PatFrag<(ops node:$ptr, node:$val),
-                   (atomic_op  node:$ptr, node:$val)> {
-    let IsAtomic = true;
-    let MemoryVT = !if(IsInt, i8, ?);
-  }
-  def _16 : PatFrag<(ops node:$ptr, node:$val),
-                    (atomic_op node:$ptr, node:$val)> {
-    let IsAtomic = true;
-    let MemoryVT = !if(IsInt, i16, f16);
-  }
-  def _32 : PatFrag<(ops node:$ptr, node:$val),
-                    (atomic_op node:$ptr, node:$val)> {
-    let IsAtomic = true;
-    let MemoryVT = !if(IsInt, i32, f32);
-  }
-  def _64 : PatFrag<(ops node:$ptr, node:$val),
-                    (atomic_op node:$ptr, node:$val)> {
-    let IsAtomic = true;
-    let MemoryVT = !if(IsInt, i64, f64);
+multiclass binary_atomic_op<SDNode atomic_op> {
+  foreach vt = [ i8, i16, i32, i64 ] in {
+    def _#vt : PatFrag<(ops node:$ptr, node:$val),
+                     (atomic_op  node:$ptr, node:$val)> {
+      let IsAtomic = true;
+      let MemoryVT = vt;
+    }
+
+    defm NAME#_#vt  : binary_atomic_op_ord;
   }
+}
 
-  defm NAME#_8  : binary_atomic_op_ord;
-  defm NAME#_16 : binary_atomic_op_ord;
-  defm NAME#_32 : binary_atomic_op_ord;
-  defm NAME#_64 : binary_atomic_op_ord;
+multiclass binary_atomic_op_fp<SDNode atomic_op> {
+  foreach vt = [ f16, bf16, v2f16, v2bf16, f32, f64 ] in {
+    def _#vt : PatFrag<(ops node:$ptr, node:$val),
+                      (atomic_op node:$ptr, node:$val)> {
+      let IsAtomic = true;
+      let MemoryVT = vt;
+    }
+  }
 }
 
 multiclass ternary_atomic_op<SDNode atomic_op> {
-  def _8 : PatFrag<(ops node:$ptr, node:$cmp, node:$val),
-                   (atomic_op  node:$ptr, node:$cmp, node:$val)> {
-    let IsAtomic = true;
-    let MemoryVT = i8;
-  }
-  def _16 : PatFrag<(ops node:$ptr, node:$cmp, node:$val),
-                    (atomic_op node:$ptr, node:$cmp, node:$val)> {
-    let IsAtomic = true;
-    let MemoryVT = i16;
-  }
-  def _32 : PatFrag<(ops node:$ptr, node:$cmp, node:$val),
-                    (atomic_op node:$ptr, node:$cmp, node:$val)> {
-    let IsAtomic = true;
-    let MemoryVT = i32;
-  }
-  def _64 : PatFrag<(ops node:$ptr, node:$cmp, node:$val),
-                    (atomic_op node:$ptr, node:$cmp, node:$val)> {
-    let IsAtomic = true;
-    let MemoryVT = i64;
+  foreach vt = [ i8, i16, i32, i64 ] in {
+    def _#vt : PatFrag<(ops node:$ptr, node:$cmp, node:$val),
+                     (atomic_op node:$ptr, node:$cmp, node:$val)> {
+      let IsAtomic = true;
+      let MemoryVT = vt;
+    }
+
+    defm NAME#_#vt  : ternary_atomic_op_ord;
   }
-
-  defm NAME#_8  : ternary_atomic_op_ord;
-  defm NAME#_16 : ternary_atomic_op_ord;
-  defm NAME#_32 : ternary_atomic_op_ord;
-  defm NAME#_64 : ternary_atomic_op_ord;
 }
 
 defm atomic_load_add  : binary_atomic_op<atomic_load_add>;
diff --git a/llvm/lib/Target/AArch64/AArch64InstrFormats.td b/llvm/lib/Target/AArch64/AArch64InstrFormats.td
index 1f437d0ed6f8d..17d011086634c 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrFormats.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrFormats.td
@@ -11887,79 +11887,79 @@ multiclass LDOPregister<bits<3> opc, string op, bits<1> Acq, bits<1> Rel,
 // complex DAG for DstRHS.
 let Predicates = [HasLSE] in
 multiclass LDOPregister_patterns_ord_dag<string inst, string suffix, string op,
-                                         string size, dag SrcRHS, dag DstRHS> {
-  def : Pat<(!cast<PatFrag>(op#"_"#size#"_monotonic") GPR64sp:$Rn, SrcRHS),
+                                         ValueType vt, dag SrcRHS, dag DstRHS> {
+  def : Pat<(!cast<PatFrag>(op#"_"#vt#"_monotonic") GPR64sp:$Rn, SrcRHS),
             (!cast<Instruction>(inst # suffix) DstRHS, GPR64sp:$Rn)>;
-  def : Pat<(!cast<PatFrag>(op#"_"#size#"_acquire") GPR64sp:$Rn, SrcRHS),
+  def : Pat<(!cast<PatFrag>(op#"_"#vt#"_acquire") GPR64sp:$Rn, SrcRHS),
             (!cast<Instruction>(inst # "A" # suffix) DstRHS, GPR64sp:$Rn)>;
-  def : Pat<(!cast<PatFrag>(op#"_"#size#"_release") GPR64sp:$Rn, SrcRHS),
+  def : Pat<(!cast<PatFrag>(op#"_"#vt#"_release") GPR64sp:$Rn, SrcRHS),
             (!cast<Instruction>(inst # "L" # suffix) DstRHS, GPR64sp:$Rn)>;
-  def : Pat<(!cast<PatFrag>(op#"_"#size#"_acq_rel") GPR64sp:$Rn, SrcRHS),
+  def : Pat<(!cast<PatFrag>(op#"_"#vt#"_acq_rel") GPR64sp:$Rn, SrcRHS),
             (!cast<Instruction>(inst # "AL" # suffix) DstRHS, GPR64sp:$Rn)>;
-  def : Pat<(!cast<PatFrag>(op#"_"#size#"_seq_cst") GPR64sp:$Rn, SrcRHS),
+  def : Pat<(!cast<PatFrag>(op#"_"#vt#"_seq_cst") GPR64sp:$Rn, SrcRHS),
             (!cast<Instruction>(inst # "AL" # suffix) DstRHS, GPR64sp:$Rn)>;
 }
 
 multiclass LDOPregister_patterns_ord<string inst, string suffix, string op,
-                                     string size, dag RHS> {
-  defm : LDOPregister_patterns_ord_dag<inst, suffix, op, size, RHS, RHS>;
+                                     ValueType vt, dag RHS> {
+  defm : LDOPregister_patterns_ord_dag<inst, suffix, op, vt, RHS, RHS>;
 }
 
 multiclass LDOPregister_patterns_ord_mod<string inst, string suffix, string op,
-                                         string size, dag LHS, dag RHS> {
-  defm : LDOPregister_patterns_ord_dag<inst, suffix, op, size, LHS, RHS>;
+                                         ValueType vt, dag LHS, dag RHS> {
+  defm : LDOPregister_patterns_ord_dag<inst, suffix, op, vt, LHS, RHS>;
 }
 
 multiclass LDOPregister_patterns<string inst, string op> {
-  defm : LDOPregister_patterns_ord<inst, "X", op, "64", (i64 GPR64:$Rm)>;
-  defm : LDOPregister_patterns_ord<inst, "W", op, "32", (i32 GPR32:$Rm)>;
-  defm : LDOPregister_patterns_ord<inst, "H", op, "16", (i32 GPR32:$Rm)>;
-  defm : LDOPregister_patterns_ord<inst, "B", op, "8",  (i32 GPR32:$Rm)>;
+  defm : LDOPregister_patterns_ord<inst, "X", op, i64, (i64 GPR64:$Rm)>;
+  defm : LDOPregister_patterns_ord<inst, "W", op, i32, (i32 GPR32:$Rm)>;
+  defm : LDOPregister_patterns_ord<inst, "H", op, i16, (i32 GPR32:$Rm)>;
+  defm : LDOPregister_patterns_ord<inst, "B", op, i8,  (i32 GPR32:$Rm)>;
 }
 
 multiclass LDOPregister_patterns_mod<string inst, string op, string mod> {
-  defm : LDOPregister_patterns_ord_mod<inst, "X", op, "64",
+  defm : LDOPregister_patterns_ord_mod<inst, "X", op, i64,
                         (i64 GPR64:$Rm),
                         (i64 (!cast<Instruction>(mod#Xrr) XZR, GPR64:$Rm))>;
-  defm : LDOPregister_patterns_ord_mod<inst, "W", op, "32",
+  defm : LDOPregister_patterns_ord_mod<inst, "W", op, i32,
                         (i32 GPR32:$Rm),
                         (i32 (!cast<Instruction>(mod#Wrr) WZR, GPR32:$Rm))>;
-  defm : LDOPregister_patterns_ord_mod<inst, "H", op, "16",
+  defm : LDOPregister_patterns_ord_mod<inst, "H", op, i16,
                         (i32 GPR32:$Rm),
                         (i32 (!cast<Instruction>(mod#Wrr) WZR, GPR32:$Rm))>;
-  defm : LDOPregister_patterns_ord_mod<inst, "B", op, "8",
+  defm : LDOPregister_patterns_ord_mod<inst, "B", op, i8,
                         (i32 GPR32:$Rm),
                         (i32 (!cast<Instruction>(mod#Wrr) WZR, GPR32:$Rm))>;
 }
 
 let Predicates = [HasLSE] in
 multiclass CASregister_patterns_ord_dag<string inst, string suffix, string op,
-                                        string size, dag OLD, dag NEW> {
-  def : Pat<(!cast<PatFrag>(op#"_"#size#"_monotonic") GPR64sp:$Rn, OLD, NEW),
+                                        ValueType vt, dag OLD, dag NEW> {
+  def : Pat<(!cast<PatFrag>(op#"_"#vt#"_monotonic") GPR64sp:$Rn, OLD, NEW),
             (!cast<Instruction>(inst # suffix) OLD, NEW, GPR64sp:$Rn)>;
-  def : Pat<(!cast<PatFrag>(op#"_"#size#"_acquire") GPR64sp:$Rn, OLD, NEW),
+  def : Pat<(!cast<PatFrag>(op#"_"#vt#"_acquire") GPR64sp:$Rn, OLD, NEW),
             (!cast<Instruction>(inst # "A" # suffix) OLD, NEW, GPR64sp:$Rn)>;
-  def : Pat<(!cast<PatFrag>(op#"_"#size#"_release") GPR64sp:$Rn, OLD, NEW),
+  def : Pat<(!cast<PatFrag>(op#"_"#vt#"_release") GPR64sp:$Rn, OLD, NEW),
             (!cast<Instruction>(inst # "L" # suffix) OLD, NEW, GPR64sp:$Rn)>;
-  def : Pat<(!cast<PatFrag>(op#"_"#size#"_acq_rel") GPR64sp:$Rn, OLD, NEW),
+  def : Pat<(!cast<PatFrag>(op#"_"#vt#"_acq_rel") GPR64sp:$Rn, OLD, NEW),
             (!cast<Instruction>(inst # "AL" # suffix) OLD, NEW, GPR64sp:$Rn)>;
-  def : Pat<(!cast<PatFrag>(op#"_"#size#"_seq_cst") GPR64sp:$Rn, OLD, NEW),
+  def : Pat<(!cast<PatFrag>(op#"_"#vt#"_seq_cst") GPR64sp:$Rn, OLD, NEW),
             (!cast<Instruction>(inst # "AL" # suffix) OLD, NEW, GPR64sp:$Rn)>;
 }
 
 multiclass CASregister_patterns_ord<string inst, string suffix, string op,
-                                    string size, dag OLD, dag NEW> {
-  defm : CASregister_patterns_ord_dag<inst, suffix, op, size, OLD, NEW>;
+                                    ValueType vt, dag OLD, dag NEW> {
+  defm : CASregister_patterns_ord_dag<inst, suffix, op, vt, OLD, NEW>;
 }
 
 multiclass CASregister_patterns<string inst, string op> {
-  defm : CASregister_patterns_ord<inst, "X", op, "64",
+  defm : CASregister_patterns_ord<inst, "X", op, i64,
                         (i64 GPR64:$Rold), (i64 GPR64:$Rnew)>;
-  defm : CASregister_patterns_ord<inst, "W", op, "32",
+  defm : CASregister_patterns_ord<inst, "W", op, i32,
                         (i32 GPR32:$Rold), (i32 GPR32:$Rnew)>;
-  defm : CASregister_patterns_ord<inst, "H", op, "16",
+  defm : CASregister_patterns_ord<inst, "H", op, i16,
                         (i32 GPR32:$Rold), (i32 GPR32:$Rnew)>;
-  defm : CASregister_patterns_ord<inst, "B", op, "8",
+  defm : CASregister_patterns_ord<inst, "B", op, i8,
                         (i32 GPR32:$Rold), (i32 GPR32:$Rnew)>;
 }
 
diff --git a/llvm/lib/Target/AArch64/AArch64InstrGISel.td b/llvm/lib/Target/AArch64/AArch64InstrGISel.td
index 58ca52f37b63b..2d2b2bee99ec4 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrGISel.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrGISel.td
@@ -346,16 +346,16 @@ let Predicates = [HasNEON] in {
 }
 
 let Predicates = [HasNoLSE] in {
-def : Pat<(atomic_cmp_swap_8 GPR64:$addr, GPR32:$desired, GPR32:$new),
+def : Pat<(atomic_cmp_swap_i8 GPR64:$addr, GPR32:$desired, GPR32:$new),
           (CMP_SWAP_8 GPR64:$addr, GPR32:$desired, GPR32:$new)>;
 
-def : Pat<(atomic_cmp_swap_16 GPR64:$addr, GPR32:$desired, GPR32:$new),
+def : Pat<(atomic_cmp_swap_i16 GPR64:$addr, GPR32:$desired, GPR32:$new),
           (CMP_SWAP_16 GPR64:$addr, GPR32:$desired, GPR32:$new)>;
 
-def : Pat<(atomic_cmp_swap_32 GPR64:$addr, GPR32:$desired, GPR32:$new),
+def : Pat<(atomic_cmp_swap_i32 GPR64:$addr, GPR32:$desired, GPR32:$new),
           (CMP_SWAP_32 GPR64:$addr, GPR32:$desired, GPR32:$new)>;
 
-def : Pat<(atomic_cmp_swap_64 GPR64:$addr, GPR64:$desired, GPR64:$new),
+def : Pat<(atomic_cmp_swap_i64 GPR64:$addr, GPR64:$desired, GPR64:$new),
           (CMP_SWAP_64 GPR64:$addr, GPR64:$desired, GPR64:$new)>;
 }
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td b/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
index fa7492ac6cbe1..bd348f11007a0 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
@@ -637,9 +637,14 @@ defm int_amdgcn_atomic_cond_sub_u32 : local_addr_space_atomic_op;
 defm int_amdgcn_atomic_cond_sub_u32 : flat_addr_space_atomic_op;
 defm int_amdgcn_atomic_cond_sub_u32 : global_addr_space_atomic_op;
 
-multiclass noret_binary_atomic_op<SDNode atomic_op, bit IsInt = 1> {
+multiclass noret_binary_atomic_op<SDNode atomic_op> {
   let HasNoUse = true in
-  defm "_noret" : binary_atomic_op<atomic_op, IsInt>;
+  defm "_noret" : binary_atomic_op<atomic_op>;
+}
+
+multiclass noret_binary_atomic_op_fp<SDNode atomic_op> {
+  let HasNoUse = true in
+  defm "_noret" : binary_atomic_op_fp<atomic_op>;
 }
 
 multiclass noret_ternary_atomic_op<SDNode atomic_op> {
@@ -647,11 +652,21 @@ multiclass noret_ternary_atomic_op<SDNode atomic_op> {
   defm "_noret" : ternary_atomic_op<atomic_op>;
 }
 
-multiclass binary_atomic_op_all_as<SDNode atomic_op, bit IsInt = 1> {
-  foreach as = [ "global", "flat", "constant", "local", "private", "region" ] in {
+defvar atomic_addrspace_names = [ "global", "flat", "constant", "local", "private", "region" ];
+
+multiclass binary_atomic_op_all_as<SDNode atomic_op> {
+  foreach as = atomic_addrspace_names in {
+    let AddressSpaces = !cast<AddressSpaceList>("LoadAddress_"#as).AddrSpaces in {
+      defm "_"#as : binary_atomic_op<atomic_op>;
+      defm "_"#as : noret_binary_atomic_op<atomic_op>;
+    }
+  }
+}
+multiclass binary_atomic_op_fp_all_as<SDNode atomic_op> {
+  foreach as = atomic_addrspace_names in {
     let AddressSpaces = !cast<AddressSpaceList>("LoadAddress_"#as).AddrSpaces in {
-      defm "_"#as : binary_atomic_op<atomic_op, IsInt>;
-      defm "_"#as : noret_binary_atomic_op<atomic_op, IsInt>;
+      defm "_"#as : binary_atomic_op_fp<atomic_op>;
+      defm "_"#as : noret_binary_atomic_op_fp<atomic_op>;
     }
   }
 }
@@ -666,11 +681,11 @@ defm atomic_load_sub : binary_atomic_op_all_as<atomic_load_sub>;
 defm atomic_load_umax : binary_atomic_op_all_as<atomic_load_umax>;
 defm atomic_load_umin : binary_atomic_op_all_as<atomic_load_umin>;
 defm atomic_load_xor : binary_atomic_op_all_as<atomic_load_xor>;
-defm atomic_load_fadd : binary_atomic_op_all_as<atomic_load_fadd, 0>;
+defm atomic_load_fadd : binary_atomic_op_fp_all_as<atomic_load_fadd>;
 defm atomic_load_uinc_wrap : binary_atomic_op_all_as<atomic_load_uinc_wrap>;
 defm atomic_load_udec_wrap : binary_atomic_op_all_as<atomic_load_udec_wrap>;
 let MemoryVT = v2f16 in
-defm atomic_load_fadd_v2f16 : binary_atomic_op_all_as<atomic_load_fadd, 0>;
+defm atomic_load_fadd_v2f16 : binary_atomic_op_fp_all_as<atomic_load_fadd>;
 defm AMDGPUatomic_cmp_swap : binary_atomic_op_all_as<AMDGPUatomic_cmp_swap>;
 
 def load_align8_local : PatFrag<(ops node:$ptr), (load_local node:$ptr)>,
diff --git a/llvm/lib/Target/AMDGPU/BUFInstructions.td b/llvm/lib/Target/AMDGPU/BUFInstructions.td
index b05834e5803a2..01d649c784d70 100644
--- a/llvm/lib/Target/AMDGPU/BUFInstructions.td
+++ b/llvm/lib/Target/AMDGPU/BUFInstructions.td
@@ -1545,7 +1545,7 @@ multiclass BufferAtomicPat_Common<string OpPrefix, ValueType vt, string Inst, bi
 
   defvar Op = !cast<SDPatternOperator>(OpPrefix
                                        # !if(!eq(RtnMode, "ret"), "", "_noret")
-                                       # !if(isIntr, "", "_" # vt.Size));
+                                       # !if(isIntr, "", "_" # vt));
   defvar InstSuffix = !if(!eq(RtnMode, "ret"), "_RTN", "");
 
   let AddedComplexity = !if(!eq(RtnMode, "ret"), 0, 1) in {
@@ -1582,7 +1582,7 @@ multiclass BufferAtomicCmpSwapPat_Common<ValueType vt, ValueType data_vt, string
 
   defvar Op = !cast<SDPatternOperator>("AMDGPUatomic_cmp_swap_global"
                                        # !if(!eq(RtnMode, "ret"), "", "_noret")
-                                       # "_" # vt.Size);
+                                       # "_" # vt);
   defvar InstSuffix = !if(!eq(RtnMode, "ret"), "_RTN", "");
   defvar data_vt_RC = getVregSrcForVT<data_vt>.ret.RegClass;
 
diff --git a/llvm/lib/Target/AMDGPU/DSInstructions.td b/llvm/lib/Target/AMDGPU/DSInstructions.td
index 19bb4300531cf..b61f3d7427279 100644
--- a/llvm/lib/Target/AMDGPU/DSInstructions.td
+++ b/llvm/lib/Target/AMDGPU/DSInstructions.td
@@ -965,16 +965,16 @@ defm : DSWritePat_mc <DS_WRITE_B128, vt, "store_align_less_than_4_local">;
 
 multiclass DSAtomicRetPat_mc<DS_Pseudo inst, ValueType vt, string frag> {
   let OtherPredicates = [LDSRequiresM0Init] in {
-    def : DSAtomicRetPat<inst, vt, !cast<PatFrag>(frag#"_local_m0_"#vt.Size)>;
+    def : DSAtomicRetPat<inst, vt, !cast<PatFrag>(frag#"_local_m0_"#vt)>;
   }
 
   let OtherPredicates = [NotLDSRequiresM0Init] in {
     def : DSAtomicRetPat<!cast<DS_Pseudo>(!cast<string>(inst)#"_gfx9"), vt,
-                         !cast<PatFrag>(frag#"_local_"#vt.Size)>;
+                         !cast<PatFrag>(frag#"_local_"#vt)>;
   }
 
   let OtherPredicates = [HasGDS] in {
-    def : DSAtomicRetPat<inst, vt, !cast<PatFrag>(frag#"_region_m0_"#vt.Size),
+    def : DSAtomicRetPat<inst, vt, !cast<PatFrag>(frag#"_region_m0_"#vt),
                          /* complexity */ 0, /* gds */ 1>;
   }
 }
@@ -983,24 +983,24 @@ multiclass DSAtomicRetNoRetPat_mc<DS_Pseudo inst, DS_Pseudo noRetInst,
                                   ValueType vt, string frag> {
   let OtherPredicates = [LDSRequiresM0Init] in {
     def : DSAtomicRetPat<inst, vt,
-                         !cast<PatFrag>(frag#"_local_m0_"#vt.Size)>;
+                         !cast<PatFrag>(frag#"_local_m0_"#vt)>;
     def : DSAtomicRetPat<noRetInst, vt,
-                         !cast<PatFrag>(frag#"_local_m0_noret_"#vt.Size), /* complexity */ 1>;
+                         !cast<PatFrag>(frag#"_local_m0_noret_"#vt), /* complexity */ 1>;
   }
 
   let OtherPredicates = [NotLDSRequiresM0Init] in {
     def : DSAtomicRetPat<!cast<DS_Pseudo>(!cast<string>(inst)#"_gfx9"), vt,
-                         !cast<PatFrag>(frag#"_local_"#vt.Size)>;
+                         !cast<PatFrag>(frag#"_local_"#vt)>;
     def : DSAtomicRetPat<!cast<DS_Pseudo>(!cast<string>(noRetInst)#"_gfx9"), vt,
-                         !cast<PatFrag>(frag#"_local_noret_"#vt.Size), /* complexity */ 1>;
+                         !cast<PatFrag>(frag#"_local_noret_"#vt), /* complexity */ 1>;
   }
 
   let OtherPredicates = [HasGDS] in {
     def : DSAtomicRetPat<inst, vt,
-                         !cast<PatFrag>(frag#"_region_m0_"#vt.Size),
+                         !cast<PatFrag>(frag#"_region_m0_"#vt),
                          /* complexity */ 0, /* gds */ 1>;
     def : DSAtomicRetPat<noRetInst, vt,
-                         !cast<PatFrag>(frag#"_region_m0_noret_"#vt.Size),
+                         !cast<PatFrag>(frag#"_region_m0_noret_"#vt),
                          /* complexity */ 1, /* gds */ 1>;
   }
 }
@@ -1019,23 +1019,23 @@ class DSAtomicCmpXChgSwapped<DS_Pseudo inst, ValueType vt, PatFrag frag,
 multiclass DSAtomicCmpXChgSwapped_mc<DS_Pseudo inst, DS_Pseudo noRetInst, ValueType vt,
                                      string frag> {
   let OtherPredicates = [LDSRequiresM0Init] in {
-    def : DSAtomicCmpXChgSwapped<inst, vt, !cast<PatFrag>(frag#"_local_m0_"#vt.Size)>;
-    def : DSAtomicCmpXChgSwapped<noRetInst, vt, !cast<PatFrag>(frag#"_local_m0_noret_"#vt.Size),
+    def : DSAtomicCmpXChgSwapped<inst, vt, !cast<PatFrag>(frag#"_local_m0_"#vt)>;
+    def : DSAtomicCmpXChgSwapped<noRetInst, vt, !cast<PatFrag>(frag#"_local_m0_noret_"#vt),
                                  /* complexity */ 1>;
   }
 
   let OtherPredicates = [NotLDSRequiresM0Init] in {
     def : DSAtomicCmpXChgSwapped<!cast<DS_Pseudo>(!cast<string>(inst)#"_gfx9"), vt,
-                                 !cast<PatFrag>(frag#"_local_"#vt.Size)>;
+                                 !cast<PatFrag>(frag#"_local_"#vt)>;
     def : DSAtomicCmpXChgSwapped<!cast<DS_Pseudo>(!cast<string>(noRetInst)#"_gfx9"), vt,
-                                 !cast<PatFrag>(frag#"_local_noret_"#vt.Size),
+                                 !cast<PatFrag>(frag#"_local_noret_"#vt),
                                  /* complexity */ 1>;
   }
 
   let OtherPredicates = [HasGDS] in {
-    def : DSAtomicCmpXChgSwapped<inst, vt, !cast<PatFrag>(frag#"_region_m0_"#vt.Size),
+    def : DSAtomicCmpXChgSwapped<inst, vt, !cast<PatFrag>(frag#"_region_m0_"#vt),
                                  /* complexity */ 0, /* gds */ 1>;
-    def : DSAtomicCmpXChgSwapped<noRetInst, vt, !cast<PatFrag>(frag#"_region_m0_noret_"#vt.Size),
+    def : DSAtomicCmpXChgSwapped<noRetInst, vt, !cast<PatFrag>(frag#"_region_m0_noret_"#vt),
                                  /* complexity */ 1, /* gds */ 1>;
   }
 }
@@ -1053,14 +1053,14 @@ class DSAtomicCmpXChg<DS_Pseudo inst, ValueType vt, PatFrag frag,
 multiclass DSAtomicCmpXChg_mc<DS_Pseudo inst, DS_Pseudo noRetInst, ValueType vt, string frag> {
 
   def : DSAtomicCmpXChg<!cast<DS_Pseudo>(!cast<string>(inst)#"_gfx9"), vt,
-                        !cast<PatFrag>(frag#"_local_"#vt.Size)>;
+                        !cast<PatFrag>(frag#"_local_"#vt)>;
   def : DSAtomicCmpXChg<!cast<DS_Pseudo>(!cast<st...
[truncated]

Collaborator

@RKSimon RKSimon left a comment


x86 LGTM (I've trimmed out the trailing whitespace in 46d94bd)

nekoshirro pushed a commit to nekoshirro/Alchemist-LLVM that referenced this pull request Jun 9, 2024
Signed-off-by: Hafidz Muzakky <ais.muzakky@gmail.com>
Member

@ritter-x2a ritter-x2a left a comment


The two occurrences of swap in the sed command don't match with the definitions in the code (here). In AMDGPU, I didn't see a case where that was a problem, but it might be elsewhere.
Other than that and my inline comment, AMDGPU looks fine.
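
One reading of the remark about `swap` is that the sample expression only touches names starting with `atomic_load_`, so swap-style fragments are never renamed by it. The sketch below checks that on three names: `atomic_cmp_swap_8` appears in the AArch64 part of the diff, while `atomic_swap_32` is assumed here as a representative swap fragment name. The function names are hypothetical.

```shell
#!/bin/sh
# Same sample expression as in the PR description, reading from stdin.
sample_rename() {
  sed -E 's/atomic_load_(add|swap|sub|and|clr|or|xor|nand|min|max|umin|umax|swap)_([0-9]+)/atomic_load_\1_i\2/'
}

# Print the names that still end in a bare bit width after the rename.
leftover_width_suffixes() {
  sample_rename | grep -E '_(8|16|32|64)$'
}

printf '%s\n' atomic_load_add_32 atomic_swap_32 atomic_cmp_swap_8 |
  leftover_width_suffixes
```

Only `atomic_load_add_32` is rewritten; the two swap names come back unchanged, which fits the author's later note that the command was just a sample of what was run.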

Inline comment on llvm/lib/Target/AMDGPU/AMDGPUInstructions.td (outdated, resolved)
@arsenm
Contributor Author

arsenm commented Jun 10, 2024

The two occurrences of swap in the sed command don't match with the definitions in the code ([here]

This wasn't purely that sed command, just a sample

For FP atomics involving bfloat vs. half, we need to distinguish the
type and not rely on the bitwidth alone. For my purposes, an alternative
would be to relax the atomic predicate MemoryVT pattern check with a memory
size only check. Since there are no extending operations involved,
the pattern value check should be unambiguous.

For some reason when using the _32 variants for atomicrmw fadd, I was able to
select v2f16 but v2bf16 would fail.

Changes mostly done with sed, e.g.
sed -E -i -r 's/atomic_load_(add|swap|sub|and|clr|or|xor|nand|min|max|umin|umax|swap)_([0-9]+)/atomic_load_\1_i\2/' llvm/lib/Target/*/*.td
RKSimon added a commit that referenced this pull request Jun 13, 2024
EthanLuisMcDonough pushed a commit to EthanLuisMcDonough/llvm-project that referenced this pull request Aug 13, 2024
joviliast added a commit to joviliast/llvm-project that referenced this pull request Sep 27, 2024
Since the AMDGPU target supports vectorization for the atomic_rmw operation,
allow construction of LLVM_AtomicRMWOp with 16-bit floating-point values.

See also: llvm#94845, llvm#95393, llvm#95394

Signed-off-by: Ilya Veselov <iveselov.nn@gmail.com>
joviliast added a commit to joviliast/llvm-project that referenced this pull request Sep 30, 2024
Since the AMDGPU target supports vectorization for the atomic_rmw operation,
allow construction of LLVM_AtomicRMWOp with 16-bit floating-point values.

See also: llvm#94845, llvm#95393, llvm#95394

Signed-off-by: Ilya Veselov <iveselov.nn@gmail.com>
joviliast added a commit to joviliast/llvm-project that referenced this pull request Sep 30, 2024
Since the AMDGPU target supports vectorization for the atomic_rmw operation,
allow construction of LLVM_AtomicRMWOp with 16-bit floating-point values.
This patch enables building LLVM_AtomicRMWOp with fixed vectors of
16-bit FP values as operands.

See also: llvm#94845, llvm#95393, llvm#95394

Signed-off-by: Ilya Veselov <iveselov.nn@gmail.com>
joviliast added a commit to joviliast/llvm-project that referenced this pull request Oct 1, 2024
Since the AMDGPU target supports vectorization for the atomic_rmw operation,
allow construction of LLVM_AtomicRMWOp with 16-bit floating-point values.
This patch enables building LLVM_AtomicRMWOp with fixed vectors of
16-bit FP values as operands.

See also: llvm#94845, llvm#95393, llvm#95394

Signed-off-by: Ilya Veselov <iveselov.nn@gmail.com>
joviliast added a commit to joviliast/llvm-project that referenced this pull request Oct 2, 2024
Since the AMDGPU target supports vectorization for the atomic_rmw operation,
allow construction of LLVM_AtomicRMWOp with 16-bit floating-point values.
This patch enables building LLVM_AtomicRMWOp with fixed vectors of
16-bit FP values as operands.

See also: llvm#94845, llvm#95393, llvm#95394

Signed-off-by: Ilya Veselov <iveselov.nn@gmail.com>
joviliast added a commit to joviliast/llvm-project that referenced this pull request Oct 2, 2024
Since the AMDGPU target supports vectorization for the `atomic_rmw fadd` operation,
enable building `LLVM_AtomicRMWOp fadd` with fixed vectors of 16-bit FP values
as operands.

See also: llvm#94845, llvm#95393, llvm#95394

Signed-off-by: Ilya Veselov <iveselov.nn@gmail.com>
5 participants