Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[clang][SPIR-V] Add support for AMDGCN flavoured SPIRV #89796

Merged
merged 37 commits into from
Jun 7, 2024

Conversation

AlexVlx
Copy link
Contributor

@AlexVlx AlexVlx commented Apr 23, 2024

This change seeks to add support for vendor flavoured SPIRV - more specifically, AMDGCN flavoured SPIRV. The aim is to generate SPIRV that carries some extra bits of information that are only usable by AMDGCN targets, forfeiting absolute genericity to obtain greater expressiveness for target features:

  • AMDGCN inline ASM is allowed/supported, under the assumption that the SPV_INTEL_inline_assembly extension is enabled/used
  • AMDGCN target specific builtins are allowed/supported, under the assumption that e.g. the --spirv-allow-unknown-intrinsics option is enabled when using the downstream translator
  • the featureset matches the union of AMDGCN targets' features
  • the datalayout string is overspecified to affix both the program address space and the alloca address space, the latter under the assumption that the SPV_INTEL_function_pointers extension is enabled/used, case in which the extant SPIRV datalayout string would lead to pointers to function pointing to the private address space, which would be wrong.

Existing AMDGCN tests are extended to cover this new target. It is currently dormant / will require some additional changes, but I thought I'd rather put it up for review to get feedback as early as possible. I will note that an alternative option is to place this under AMDGPU, but that seems slightly less natural, since this is still SPIRV, albeit relaxed in terms of preconditions & constrained in terms of postconditions, and only guaranteed to be usable on AMDGCN targets (it is still possible to obtain pristine portable SPIRV through usage of the flavoured target, though).

@llvmbot llvmbot added clang Clang issues not falling into any other category backend:AMDGPU clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen labels Apr 23, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Apr 23, 2024

@llvm/pr-subscribers-backend-spir-v
@llvm/pr-subscribers-clang
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-backend-amdgpu

Author: Alex Voicu (AlexVlx)

Changes

This change seeks to add support for vendor flavoured SPIRV - more specifically, AMDGCN flavoured SPIRV. The aim is to generate SPIRV that carries some extra bits of information that are only usable by AMDGCN targets, forfeiting absolute genericity to obtain greater expressiveness for target features:

  • AMDGCN inline ASM is allowed/supported, under the assumption that the SPV_INTEL_inline_assembly extension is enabled/used
  • AMDGCN target specific builtins are allowed/supported, under the assumption that e.g. the --spirv-allow-unknown-intrinsics option is enabled when using the downstream translator
  • the featureset matches the union of AMDGCN targets' features
  • the datalayout string is overspecified to affix both the program address space and the alloca address space, the latter under the assumption that the SPV_INTEL_function_pointers extension is enabled/used, case in which the extant SPIRV datalayout string would lead to pointers to function pointing to the private address space, which would be wrong.

Existing AMDGCN tests are extended to cover this new target. It is currently dormant / will require some additional changes, but I thought I'd rather put it up for review to get feedback as early as possible. I will note that an alternative option is to place this under AMDGPU, but that seems slightly less natural, since this is still SPIRV, albeit relaxed in terms of preconditions & constrained in terms of postconditions, and only guaranteed to be usable on AMDGCN targets (it is still possible to obtain pristine portable SPIRV through usage of the flavoured target, though).


Patch is 73.55 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/89796.diff

22 Files Affected:

  • (modified) clang/lib/Basic/Targets.cpp (+5-1)
  • (modified) clang/lib/Basic/Targets/SPIR.cpp (+288)
  • (modified) clang/lib/Basic/Targets/SPIR.h (+51)
  • (modified) clang/lib/CodeGen/CGBuiltin.cpp (+7)
  • (modified) clang/test/CodeGen/target-data.c (+4)
  • (added) clang/test/CodeGenCUDA/builtins-spirv-amdgcn.cu (+294)
  • (added) clang/test/CodeGenCUDA/builtins-unsafe-atomics-spirv-amdgcn-gfx90a.cu (+31)
  • (modified) clang/test/CodeGenCUDA/long-double.cu (+4)
  • (added) clang/test/CodeGenCUDA/spirv-amdgcn-bf16.cu (+129)
  • (added) clang/test/CodeGenCXX/spirv-amdgcn-float16.cpp (+38)
  • (added) clang/test/CodeGenHIP/spirv-amdgcn-ballot.cpp (+27)
  • (added) clang/test/CodeGenHIP/spirv-amdgcn-dpp-const-fold.hip (+46)
  • (added) clang/test/CodeGenHIP/spirv-amdgcn-half.hip (+15)
  • (modified) clang/test/Preprocessor/predefined-macros-no-warnings.c (+1)
  • (modified) clang/test/Preprocessor/predefined-macros.c (+10)
  • (added) clang/test/Sema/builtin-spirv-amdgcn-atomic-inc-dec-failure.cpp (+25)
  • (added) clang/test/Sema/inline-asm-validate-spirv-amdgcn.cl (+111)
  • (modified) clang/test/SemaCUDA/allow-int128.cu (+3)
  • (modified) clang/test/SemaCUDA/amdgpu-f128.cu (+1)
  • (modified) clang/test/SemaCUDA/float16.cu (+1)
  • (modified) clang/test/SemaCUDA/fp16-arg-return.cu (+1)
  • (added) clang/test/SemaCUDA/spirv-amdgcn-atomic-ops.cu (+86)
diff --git a/clang/lib/Basic/Targets.cpp b/clang/lib/Basic/Targets.cpp
index e3283510c6aac7..04a13e3385d1f6 100644
--- a/clang/lib/Basic/Targets.cpp
+++ b/clang/lib/Basic/Targets.cpp
@@ -673,8 +673,12 @@ std::unique_ptr<TargetInfo> AllocateTarget(const llvm::Triple &Triple,
   }
   case llvm::Triple::spirv64: {
     if (os != llvm::Triple::UnknownOS ||
-        Triple.getEnvironment() != llvm::Triple::UnknownEnvironment)
+        Triple.getEnvironment() != llvm::Triple::UnknownEnvironment) {
+      if (os == llvm::Triple::OSType::AMDHSA)
+        return std::make_unique<SPIRV64AMDGCNTargetInfo>(Triple, Opts);
+
       return nullptr;
+    }
     return std::make_unique<SPIRV64TargetInfo>(Triple, Opts);
   }
   case llvm::Triple::wasm32:
diff --git a/clang/lib/Basic/Targets/SPIR.cpp b/clang/lib/Basic/Targets/SPIR.cpp
index dc920177d3a910..d7d232ac9484f8 100644
--- a/clang/lib/Basic/Targets/SPIR.cpp
+++ b/clang/lib/Basic/Targets/SPIR.cpp
@@ -12,6 +12,8 @@
 
 #include "SPIR.h"
 #include "Targets.h"
+#include "clang/Basic/Builtins.h"
+#include "clang/Basic/TargetBuiltins.h"
 
 using namespace clang;
 using namespace clang::targets;
@@ -54,3 +56,289 @@ void SPIRV64TargetInfo::getTargetDefines(const LangOptions &Opts,
   BaseSPIRVTargetInfo::getTargetDefines(Opts, Builder);
   DefineStd(Builder, "SPIRV64", Opts);
 }
+
+static constexpr Builtin::Info BuiltinInfo[] = {
+#define BUILTIN(ID, TYPE, ATTRS)                                               \
+  {#ID, TYPE, ATTRS, nullptr, HeaderDesc::NO_HEADER, ALL_LANGUAGES},
+#define TARGET_BUILTIN(ID, TYPE, ATTRS, FEATURE)                               \
+  {#ID, TYPE, ATTRS, FEATURE, HeaderDesc::NO_HEADER, ALL_LANGUAGES},
+#include "clang/Basic/BuiltinsAMDGPU.def"
+};
+
+namespace {
+const char *AMDGPUGCCRegNames[] = {
+  "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8",
+  "v9", "v10", "v11", "v12", "v13", "v14", "v15", "v16", "v17",
+  "v18", "v19", "v20", "v21", "v22", "v23", "v24", "v25", "v26",
+  "v27", "v28", "v29", "v30", "v31", "v32", "v33", "v34", "v35",
+  "v36", "v37", "v38", "v39", "v40", "v41", "v42", "v43", "v44",
+  "v45", "v46", "v47", "v48", "v49", "v50", "v51", "v52", "v53",
+  "v54", "v55", "v56", "v57", "v58", "v59", "v60", "v61", "v62",
+  "v63", "v64", "v65", "v66", "v67", "v68", "v69", "v70", "v71",
+  "v72", "v73", "v74", "v75", "v76", "v77", "v78", "v79", "v80",
+  "v81", "v82", "v83", "v84", "v85", "v86", "v87", "v88", "v89",
+  "v90", "v91", "v92", "v93", "v94", "v95", "v96", "v97", "v98",
+  "v99", "v100", "v101", "v102", "v103", "v104", "v105", "v106", "v107",
+  "v108", "v109", "v110", "v111", "v112", "v113", "v114", "v115", "v116",
+  "v117", "v118", "v119", "v120", "v121", "v122", "v123", "v124", "v125",
+  "v126", "v127", "v128", "v129", "v130", "v131", "v132", "v133", "v134",
+  "v135", "v136", "v137", "v138", "v139", "v140", "v141", "v142", "v143",
+  "v144", "v145", "v146", "v147", "v148", "v149", "v150", "v151", "v152",
+  "v153", "v154", "v155", "v156", "v157", "v158", "v159", "v160", "v161",
+  "v162", "v163", "v164", "v165", "v166", "v167", "v168", "v169", "v170",
+  "v171", "v172", "v173", "v174", "v175", "v176", "v177", "v178", "v179",
+  "v180", "v181", "v182", "v183", "v184", "v185", "v186", "v187", "v188",
+  "v189", "v190", "v191", "v192", "v193", "v194", "v195", "v196", "v197",
+  "v198", "v199", "v200", "v201", "v202", "v203", "v204", "v205", "v206",
+  "v207", "v208", "v209", "v210", "v211", "v212", "v213", "v214", "v215",
+  "v216", "v217", "v218", "v219", "v220", "v221", "v222", "v223", "v224",
+  "v225", "v226", "v227", "v228", "v229", "v230", "v231", "v232", "v233",
+  "v234", "v235", "v236", "v237", "v238", "v239", "v240", "v241", "v242",
+  "v243", "v244", "v245", "v246", "v247", "v248", "v249", "v250", "v251",
+  "v252", "v253", "v254", "v255", "s0", "s1", "s2", "s3", "s4",
+  "s5", "s6", "s7", "s8", "s9", "s10", "s11", "s12", "s13",
+  "s14", "s15", "s16", "s17", "s18", "s19", "s20", "s21", "s22",
+  "s23", "s24", "s25", "s26", "s27", "s28", "s29", "s30", "s31",
+  "s32", "s33", "s34", "s35", "s36", "s37", "s38", "s39", "s40",
+  "s41", "s42", "s43", "s44", "s45", "s46", "s47", "s48", "s49",
+  "s50", "s51", "s52", "s53", "s54", "s55", "s56", "s57", "s58",
+  "s59", "s60", "s61", "s62", "s63", "s64", "s65", "s66", "s67",
+  "s68", "s69", "s70", "s71", "s72", "s73", "s74", "s75", "s76",
+  "s77", "s78", "s79", "s80", "s81", "s82", "s83", "s84", "s85",
+  "s86", "s87", "s88", "s89", "s90", "s91", "s92", "s93", "s94",
+  "s95", "s96", "s97", "s98", "s99", "s100", "s101", "s102", "s103",
+  "s104", "s105", "s106", "s107", "s108", "s109", "s110", "s111", "s112",
+  "s113", "s114", "s115", "s116", "s117", "s118", "s119", "s120", "s121",
+  "s122", "s123", "s124", "s125", "s126", "s127", "exec", "vcc", "scc",
+  "m0", "flat_scratch", "exec_lo", "exec_hi", "vcc_lo", "vcc_hi",
+  "flat_scratch_lo", "flat_scratch_hi",
+  "a0", "a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8",
+  "a9", "a10", "a11", "a12", "a13", "a14", "a15", "a16", "a17",
+  "a18", "a19", "a20", "a21", "a22", "a23", "a24", "a25", "a26",
+  "a27", "a28", "a29", "a30", "a31", "a32", "a33", "a34", "a35",
+  "a36", "a37", "a38", "a39", "a40", "a41", "a42", "a43", "a44",
+  "a45", "a46", "a47", "a48", "a49", "a50", "a51", "a52", "a53",
+  "a54", "a55", "a56", "a57", "a58", "a59", "a60", "a61", "a62",
+  "a63", "a64", "a65", "a66", "a67", "a68", "a69", "a70", "a71",
+  "a72", "a73", "a74", "a75", "a76", "a77", "a78", "a79", "a80",
+  "a81", "a82", "a83", "a84", "a85", "a86", "a87", "a88", "a89",
+  "a90", "a91", "a92", "a93", "a94", "a95", "a96", "a97", "a98",
+  "a99", "a100", "a101", "a102", "a103", "a104", "a105", "a106", "a107",
+  "a108", "a109", "a110", "a111", "a112", "a113", "a114", "a115", "a116",
+  "a117", "a118", "a119", "a120", "a121", "a122", "a123", "a124", "a125",
+  "a126", "a127", "a128", "a129", "a130", "a131", "a132", "a133", "a134",
+  "a135", "a136", "a137", "a138", "a139", "a140", "a141", "a142", "a143",
+  "a144", "a145", "a146", "a147", "a148", "a149", "a150", "a151", "a152",
+  "a153", "a154", "a155", "a156", "a157", "a158", "a159", "a160", "a161",
+  "a162", "a163", "a164", "a165", "a166", "a167", "a168", "a169", "a170",
+  "a171", "a172", "a173", "a174", "a175", "a176", "a177", "a178", "a179",
+  "a180", "a181", "a182", "a183", "a184", "a185", "a186", "a187", "a188",
+  "a189", "a190", "a191", "a192", "a193", "a194", "a195", "a196", "a197",
+  "a198", "a199", "a200", "a201", "a202", "a203", "a204", "a205", "a206",
+  "a207", "a208", "a209", "a210", "a211", "a212", "a213", "a214", "a215",
+  "a216", "a217", "a218", "a219", "a220", "a221", "a222", "a223", "a224",
+  "a225", "a226", "a227", "a228", "a229", "a230", "a231", "a232", "a233",
+  "a234", "a235", "a236", "a237", "a238", "a239", "a240", "a241", "a242",
+  "a243", "a244", "a245", "a246", "a247", "a248", "a249", "a250", "a251",
+  "a252", "a253", "a254", "a255"
+};
+
+} // anonymous namespace
+
+ArrayRef<const char *> SPIRV64AMDGCNTargetInfo::getGCCRegNames() const {
+  return llvm::ArrayRef(AMDGPUGCCRegNames);
+}
+
+bool SPIRV64AMDGCNTargetInfo::initFeatureMap(
+    llvm::StringMap<bool> &Features, DiagnosticsEngine &Diags, StringRef,
+    const std::vector<std::string> &FeatureVec) const {
+  // This represents the union of all AMDGCN features.
+  Features["atomic-ds-pk-add-16-insts"] = true;
+  Features["atomic-flat-pk-add-16-insts"] = true;
+  Features["atomic-buffer-global-pk-add-f16-insts"] = true;
+  Features["atomic-global-pk-add-bf16-inst"] = true;
+  Features["atomic-fadd-rtn-insts"] = true;
+  Features["ci-insts"] = true;
+  Features["dot1-insts"] = true;
+  Features["dot2-insts"] = true;
+  Features["dot3-insts"] = true;
+  Features["dot4-insts"] = true;
+  Features["dot5-insts"] = true;
+  Features["dot7-insts"] = true;
+  Features["dot8-insts"] = true;
+  Features["dot9-insts"] = true;
+  Features["dot10-insts"] = true;
+  Features["dot11-insts"] = true;
+  Features["dl-insts"] = true;
+  Features["16-bit-insts"] = true;
+  Features["dpp"] = true;
+  Features["gfx8-insts"] = true;
+  Features["gfx9-insts"] = true;
+  Features["gfx90a-insts"] = true;
+  Features["gfx940-insts"] = true;
+  Features["gfx10-insts"] = true;
+  Features["gfx10-3-insts"] = true;
+  Features["gfx11-insts"] = true;
+  Features["gfx12-insts"] = true;
+  Features["image-insts"] = true;
+  Features["fp8-conversion-insts"] = true;
+  Features["s-memrealtime"] = true;
+  Features["s-memtime-inst"] = true;
+  Features["gws"] = true;
+  Features["fp8-insts"] = true;
+  Features["fp8-conversion-insts"] = true;
+  Features["atomic-ds-pk-add-16-insts"] = true;
+  Features["mai-insts"] = true;
+
+  return TargetInfo::initFeatureMap(Features, Diags, {}, FeatureVec);
+}
+
+bool SPIRV64AMDGCNTargetInfo::validateAsmConstraint(
+    const char *&Name, TargetInfo::ConstraintInfo &Info) const {
+  // This is a 1:1 copy of AMDGPUTargetInfo::validateAsmConstraint()
+  static const ::llvm::StringSet<> SpecialRegs({
+    "exec", "vcc", "flat_scratch", "m0", "scc", "tba", "tma",
+    "flat_scratch_lo", "flat_scratch_hi", "vcc_lo", "vcc_hi", "exec_lo",
+    "exec_hi", "tma_lo", "tma_hi", "tba_lo", "tba_hi",
+  });
+
+  switch (*Name) {
+  case 'I':
+    Info.setRequiresImmediate(-16, 64);
+    return true;
+  case 'J':
+    Info.setRequiresImmediate(-32768, 32767);
+    return true;
+  case 'A':
+  case 'B':
+  case 'C':
+    Info.setRequiresImmediate();
+    return true;
+  default:
+    break;
+  }
+
+  StringRef S(Name);
+
+  if (S == "DA" || S == "DB") {
+    Name++;
+    Info.setRequiresImmediate();
+    return true;
+  }
+
+  bool HasLeftParen = S.consume_front("{");
+  if (S.empty())
+    return false;
+  if (S.front() != 'v' && S.front() != 's' && S.front() != 'a') {
+    if (!HasLeftParen)
+      return false;
+    auto E = S.find('}');
+    if (!SpecialRegs.count(S.substr(0, E)))
+      return false;
+    S = S.drop_front(E + 1);
+    if (!S.empty())
+      return false;
+    // Found {S} where S is a special register.
+    Info.setAllowsRegister();
+    Name = S.data() - 1;
+    return true;
+  }
+  S = S.drop_front();
+  if (!HasLeftParen) {
+    if (!S.empty())
+      return false;
+    // Found s, v or a.
+    Info.setAllowsRegister();
+    Name = S.data() - 1;
+    return true;
+  }
+  bool HasLeftBracket = S.consume_front("[");
+  unsigned long long N;
+  if (S.empty() || consumeUnsignedInteger(S, 10, N))
+    return false;
+  if (S.consume_front(":")) {
+    if (!HasLeftBracket)
+      return false;
+    unsigned long long M;
+    if (consumeUnsignedInteger(S, 10, M) || N >= M)
+      return false;
+  }
+  if (HasLeftBracket) {
+    if (!S.consume_front("]"))
+      return false;
+  }
+  if (!S.consume_front("}"))
+    return false;
+  if (!S.empty())
+    return false;
+  // Found {vn}, {sn}, {an}, {v[n]}, {s[n]}, {a[n]}, {v[n:m]}, {s[n:m]}
+  // or {a[n:m]}.
+  Info.setAllowsRegister();
+  Name = S.data() - 1;
+  return true;
+}
+
+std::string
+SPIRV64AMDGCNTargetInfo::convertConstraint(const char *&Constraint) const {
+  // This is a 1:1 copy of AMDGPUTargetInfo::convertConstraint()
+  StringRef S(Constraint);
+  if (S == "DA" || S == "DB") {
+    return std::string("^") + std::string(Constraint++, 2);
+  }
+
+  const char *Begin = Constraint;
+  TargetInfo::ConstraintInfo Info("", "");
+  if (validateAsmConstraint(Constraint, Info))
+    return std::string(Begin).substr(0, Constraint - Begin + 1);
+
+  Constraint = Begin;
+  return std::string(1, *Constraint);
+}
+
+ArrayRef<Builtin::Info> SPIRV64AMDGCNTargetInfo::getTargetBuiltins() const {
+  return llvm::ArrayRef(BuiltinInfo,
+                        clang::AMDGPU::LastTSBuiltin - Builtin::FirstTSBuiltin);
+}
+
+void SPIRV64AMDGCNTargetInfo::getTargetDefines(const LangOptions &Opts,
+                                               MacroBuilder &Builder) const {
+  BaseSPIRVTargetInfo::getTargetDefines(Opts, Builder);
+  DefineStd(Builder, "SPIRV64", Opts);
+
+  Builder.defineMacro("__AMD__");
+  Builder.defineMacro("__AMDGPU__");
+  Builder.defineMacro("__AMDGCN__");
+}
+
+void SPIRV64AMDGCNTargetInfo::setAuxTarget(const TargetInfo *Aux) {
+  // This is a 1:1 copy of AMDGPUTargetInfo::setAuxTarget()
+  assert(HalfFormat == Aux->HalfFormat);
+  assert(FloatFormat == Aux->FloatFormat);
+  assert(DoubleFormat == Aux->DoubleFormat);
+
+  // On x86_64 long double is 80-bit extended precision format, which is
+  // not supported by AMDGPU. 128-bit floating point format is also not
+  // supported by AMDGPU. Therefore keep its own format for these two types.
+  auto SaveLongDoubleFormat = LongDoubleFormat;
+  auto SaveFloat128Format = Float128Format;
+  auto SaveLongDoubleWidth = LongDoubleWidth;
+  auto SaveLongDoubleAlign = LongDoubleAlign;
+  copyAuxTarget(Aux);
+  LongDoubleFormat = SaveLongDoubleFormat;
+  Float128Format = SaveFloat128Format;
+  LongDoubleWidth = SaveLongDoubleWidth;
+  LongDoubleAlign = SaveLongDoubleAlign;
+  // For certain builtin types support on the host target, claim they are
+  // supported to pass the compilation of the host code during the device-side
+  // compilation.
+  // FIXME: As the side effect, we also accept `__float128` uses in the device
+  // code. To reject these builtin types supported in the host target but not in
+  // the device target, one approach would support `device_builtin` attribute
+  // so that we could tell the device builtin types from the host ones. This
+  // also solves the different representations of the same builtin type, such
+  // as `size_t` in the MSVC environment.
+  if (Aux->hasFloat128Type()) {
+    HasFloat128 = true;
+    Float128Format = DoubleFormat;
+  }
+}
diff --git a/clang/lib/Basic/Targets/SPIR.h b/clang/lib/Basic/Targets/SPIR.h
index 44265445ff004b..6b605979c9ab1d 100644
--- a/clang/lib/Basic/Targets/SPIR.h
+++ b/clang/lib/Basic/Targets/SPIR.h
@@ -364,6 +364,57 @@ class LLVM_LIBRARY_VISIBILITY SPIRV64TargetInfo : public BaseSPIRVTargetInfo {
                         MacroBuilder &Builder) const override;
 };
 
+class LLVM_LIBRARY_VISIBILITY SPIRV64AMDGCNTargetInfo
+  : public BaseSPIRVTargetInfo {
+public:
+  SPIRV64AMDGCNTargetInfo(const llvm::Triple &Triple, const TargetOptions &Opts)
+      : BaseSPIRVTargetInfo(Triple, Opts) {
+    assert(Triple.getArch() == llvm::Triple::spirv64 &&
+           "Invalid architecture for 64-bit AMDGCN SPIR-V.");
+    assert(Triple.getVendor() == llvm::Triple::VendorType::AMD &&
+           "64-bit AMDGCN SPIR-V target must use AMD vendor");
+    assert(getTriple().getOS() == llvm::Triple::OSType::AMDHSA &&
+           "64-bit AMDGCN SPIR-V target must use AMDHSA OS");
+    assert(getTriple().getEnvironment() == llvm::Triple::UnknownEnvironment &&
+           "64-bit SPIR-V target must use unknown environment type");
+    PointerWidth = PointerAlign = 64;
+    SizeType = TargetInfo::UnsignedLong;
+    PtrDiffType = IntPtrType = TargetInfo::SignedLong;
+
+    resetDataLayout("e-i64:64-v16:16-v24:32-v32:32-v48:64-"
+                    "v96:128-v192:256-v256:256-v512:512-v1024:1024-G1-P4-A0");
+
+    BFloat16Width = BFloat16Align = 16;
+    BFloat16Format = &llvm::APFloat::BFloat();
+
+    HasLegalHalfType = true;
+    HasFloat16 = true;
+    HalfArgsAndReturns = true;
+  }
+
+  bool hasBFloat16Type() const override { return true; }
+
+  ArrayRef<const char *> getGCCRegNames() const override;
+
+  bool initFeatureMap(llvm::StringMap<bool> &Features, DiagnosticsEngine &Diags,
+                      StringRef,
+                      const std::vector<std::string> &) const override;
+
+  bool validateAsmConstraint(const char *&Name,
+                             TargetInfo::ConstraintInfo &Info) const override;
+
+  std::string convertConstraint(const char *&Constraint) const override;
+
+  ArrayRef<Builtin::Info> getTargetBuiltins() const override;
+
+  void getTargetDefines(const LangOptions &Opts,
+                        MacroBuilder &Builder) const override;
+
+  void setAuxTarget(const TargetInfo *Aux) override;
+
+  bool hasInt128Type() const override { return TargetInfo::hasInt128Type(); }
+};
+
 } // namespace targets
 } // namespace clang
 #endif // LLVM_CLANG_LIB_BASIC_TARGETS_SPIR_H
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 7e5f2edfc732cc..db64b0a436a095 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -6083,6 +6083,9 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
   StringRef Prefix =
       llvm::Triple::getArchTypePrefix(getTarget().getTriple().getArch());
   if (!Prefix.empty()) {
+    if (Prefix == "spv" &&
+        getTarget().getTriple().getOS() == llvm::Triple::OSType::AMDHSA)
+      Prefix = "amdgcn";
     IntrinsicID = Intrinsic::getIntrinsicForClangBuiltin(Prefix.data(), Name);
     // NOTE we don't need to perform a compatibility flag check here since the
     // intrinsics are declared in Builtins*.def via LANGBUILTIN which filter the
@@ -6254,6 +6257,10 @@ static Value *EmitTargetArchBuiltinExpr(CodeGenFunction *CGF,
   case llvm::Triple::riscv32:
   case llvm::Triple::riscv64:
     return CGF->EmitRISCVBuiltinExpr(BuiltinID, E, ReturnValue);
+  case llvm::Triple::spirv64:
+    if (CGF->getTarget().getTriple().getOS() != llvm::Triple::OSType::AMDHSA)
+      return nullptr;
+    return CGF->EmitAMDGPUBuiltinExpr(BuiltinID, E);
   default:
     return nullptr;
   }
diff --git a/clang/test/CodeGen/target-data.c b/clang/test/CodeGen/target-data.c
index c184f314f68f80..1d40b8fe46063d 100644
--- a/clang/test/CodeGen/target-data.c
+++ b/clang/test/CodeGen/target-data.c
@@ -268,3 +268,7 @@
 // RUN: %clang_cc1 -triple ve -o - -emit-llvm %s | \
 // RUN: FileCheck %s -check-prefix=VE
 // VE: target datalayout = "e-m:e-i64:64-n32:64-S128-v64:64:64-v128:64:64-v256:64:64-v512:64:64-v1024:64:64-v2048:64:64-v4096:64:64-v8192:64:64-v16384:64:64"
+
+// RUN: %clang_cc1 -triple spirv64-amd -o - -emit-llvm %s | \
+// RUN: FileCheck %s -check-prefix=SPIR64
+// AMDGPUSPIRV64: target datalayout = "e-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-G1-P4-A0"
diff --git a/clang/test/CodeGenCUDA/builtins-spirv-amdgcn.cu b/clang/test/CodeGenCUDA/builtins-spirv-amdgcn.cu
new file mode 100644
index 00000000000000..8dbb8c538ddc16
--- /dev/null
+++ b/clang/test/CodeGenCUDA/builtins-spirv-amdgcn.cu
@@ -0,0 +1,294 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// RUN: %clang_cc1 -triple spirv64-amd-amdhsa -x hip \
+// RUN:  -aux-triple x86_64-unknown-linux-gnu -fcuda-is-device -emit-llvm %s \
+// RUN:  -o - | FileCheck %s
+
+// RUN: %clang_cc1 -triple spirv64-amd-amdhsa -x hip \
+// RUN:  -aux-triple x86_64-pc-windows-msvc -fcuda-is-device -emit-llvm %s \
+// RUN:  -o - | FileCheck %s
+
+#include "Inputs/cuda.h"
+
+// CHECK-LABEL: @_Z16use_dispatch_ptrPi(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[OUT:%.*]] = alloca ptr addrspace(4), align 8
+// CHECK-NEXT:    [[OUT_ADDR:%.*]] = alloca ptr addrspace(4), align 8
+// CHECK-NEXT:    [[DISPATCH_PTR:%.*]] = alloca ptr addrspace(4), align 8
+// CHECK-NEXT:    [[OUT_ASCAST:%.*]] = addrspacecast ptr [[OUT]] to ptr addrspace(4)
+// CHECK-NEXT:    [[OUT_ADDR_ASCAST:%.*]] = addrspacecast ptr [[OUT_ADDR]] to ptr addrspace(4)
+// CHECK-NEXT:    [[DISPATCH_PTR_ASCAST:%.*]] = addrspacecast ptr [[DISPATCH_PTR]] to ptr addrspace(4)
+// CHECK-NEXT:    [[TMP0:%.*]] = addrspacecast ptr addrspace(1) [[OUT_COERCE:%.*]] to ptr addrspace(4)
+// CHECK-NEXT:    store ptr addrspace(4) [[TMP0]], ptr addrspace(4) [[OUT_ASCAST]], align 8
+// CHECK-NEXT:    [[OUT1:%.*]] = load ptr addrspace(4), ptr addrspace(4) [[OUT_ASCAST]], align 8
+// CHECK-NEXT:    store ptr addrspace(4) [[OUT1]], ptr addrspace(4) [[OUT_ADDR_ASCAST]], align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = call align 4 dereferenceable(64) addrspace(4) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
+// CHECK-NEXT:    store ptr addrspace(4) [[TMP1]], ptr addrspace(4) [[DISPATCH_PTR_ASCAST]], align 8
+// CHECK-NEXT:    [[TMP2:%.*]] = load ptr addrspace(4), ptr addrspace(4) [[DISPATCH_PTR_ASCAST]], align 8
+// CHECK-NEXT:    [[TMP3:%.*]] = load i32, ptr addrspace(4) [[TMP2]], align 4
+// CHECK-NEXT:    [[TMP4:%.*]] = load ptr addrspace(4), ptr addrspace(4) [[OUT_ADDR_ASCAST]], align 8
+// CHECK-NEXT:    store i32 [[TMP3]], ptr addrspace(4) [[TMP4]], align 4
+// CHECK-NEXT:    ret void
+//
+__global__ void use_dispatch_ptr(int* out) {
+  const int* dispatch_ptr = (co...
[truncated]

@AlexVlx AlexVlx added backend:SPIR-V SPIR-V SPIR-V language support and removed backend:AMDGPU labels Apr 23, 2024
Copy link

github-actions bot commented Apr 23, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Comment on lines 69 to 194
"v225", "v226", "v227", "v228", "v229", "v230", "v231", "v232", "v233",
"v234", "v235", "v236", "v237", "v238", "v239", "v240", "v241", "v242",
"v243", "v244", "v245", "v246", "v247", "v248", "v249", "v250", "v251",
"v252", "v253", "v254", "v255", "s0", "s1", "s2", "s3", "s4",
"s5", "s6", "s7", "s8", "s9", "s10", "s11", "s12", "s13",
"s14", "s15", "s16", "s17", "s18", "s19", "s20", "s21", "s22",
"s23", "s24", "s25", "s26", "s27", "s28", "s29", "s30", "s31",
"s32", "s33", "s34", "s35", "s36", "s37", "s38", "s39", "s40",
"s41", "s42", "s43", "s44", "s45", "s46", "s47", "s48", "s49",
"s50", "s51", "s52", "s53", "s54", "s55", "s56", "s57", "s58",
"s59", "s60", "s61", "s62", "s63", "s64", "s65", "s66", "s67",
"s68", "s69", "s70", "s71", "s72", "s73", "s74", "s75", "s76",
"s77", "s78", "s79", "s80", "s81", "s82", "s83", "s84", "s85",
"s86", "s87", "s88", "s89", "s90", "s91", "s92", "s93", "s94",
"s95", "s96", "s97", "s98", "s99", "s100", "s101", "s102", "s103",
"s104", "s105", "s106", "s107", "s108", "s109", "s110", "s111", "s112",
"s113", "s114", "s115", "s116", "s117", "s118", "s119", "s120", "s121",
"s122", "s123", "s124", "s125", "s126", "s127", "exec", "vcc", "scc",
"m0", "flat_scratch", "exec_lo", "exec_hi", "vcc_lo", "vcc_hi",
"flat_scratch_lo", "flat_scratch_hi",
"a0", "a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8",
"a9", "a10", "a11", "a12", "a13", "a14", "a15", "a16", "a17",
"a18", "a19", "a20", "a21", "a22", "a23", "a24", "a25", "a26",
"a27", "a28", "a29", "a30", "a31", "a32", "a33", "a34", "a35",
"a36", "a37", "a38", "a39", "a40", "a41", "a42", "a43", "a44",
"a45", "a46", "a47", "a48", "a49", "a50", "a51", "a52", "a53",
"a54", "a55", "a56", "a57", "a58", "a59", "a60", "a61", "a62",
"a63", "a64", "a65", "a66", "a67", "a68", "a69", "a70", "a71",
"a72", "a73", "a74", "a75", "a76", "a77", "a78", "a79", "a80",
"a81", "a82", "a83", "a84", "a85", "a86", "a87", "a88", "a89",
"a90", "a91", "a92", "a93", "a94", "a95", "a96", "a97", "a98",
"a99", "a100", "a101", "a102", "a103", "a104", "a105", "a106", "a107",
"a108", "a109", "a110", "a111", "a112", "a113", "a114", "a115", "a116",
"a117", "a118", "a119", "a120", "a121", "a122", "a123", "a124", "a125",
"a126", "a127", "a128", "a129", "a130", "a131", "a132", "a133", "a134",
"a135", "a136", "a137", "a138", "a139", "a140", "a141", "a142", "a143",
"a144", "a145", "a146", "a147", "a148", "a149", "a150", "a151", "a152",
"a153", "a154", "a155", "a156", "a157", "a158", "a159", "a160", "a161",
"a162", "a163", "a164", "a165", "a166", "a167", "a168", "a169", "a170",
"a171", "a172", "a173", "a174", "a175", "a176", "a177", "a178", "a179",
"a180", "a181", "a182", "a183", "a184", "a185", "a186", "a187", "a188",
"a189", "a190", "a191", "a192", "a193", "a194", "a195", "a196", "a197",
"a198", "a199", "a200", "a201", "a202", "a203", "a204", "a205", "a206",
"a207", "a208", "a209", "a210", "a211", "a212", "a213", "a214", "a215",
"a216", "a217", "a218", "a219", "a220", "a221", "a222", "a223", "a224",
"a225", "a226", "a227", "a228", "a229", "a230", "a231", "a232", "a233",
"a234", "a235", "a236", "a237", "a238", "a239", "a240", "a241", "a242",
"a243", "a244", "a245", "a246", "a247", "a248", "a249", "a250", "a251",
"a252", "a253", "a254", "a255"
};

} // anonymous namespace

ArrayRef<const char *> SPIRV64AMDGCNTargetInfo::getGCCRegNames() const {
return llvm::ArrayRef(AMDGPUGCCRegNames);
}

bool SPIRV64AMDGCNTargetInfo::initFeatureMap(
llvm::StringMap<bool> &Features, DiagnosticsEngine &Diags, StringRef,
const std::vector<std::string> &FeatureVec) const {
// This represents the union of all AMDGCN features.
Features["atomic-ds-pk-add-16-insts"] = true;
Features["atomic-flat-pk-add-16-insts"] = true;
Features["atomic-buffer-global-pk-add-f16-insts"] = true;
Features["atomic-global-pk-add-bf16-inst"] = true;
Features["atomic-fadd-rtn-insts"] = true;
Features["ci-insts"] = true;
Features["dot1-insts"] = true;
Features["dot2-insts"] = true;
Features["dot3-insts"] = true;
Features["dot4-insts"] = true;
Features["dot5-insts"] = true;
Features["dot7-insts"] = true;
Features["dot8-insts"] = true;
Features["dot9-insts"] = true;
Features["dot10-insts"] = true;
Features["dot11-insts"] = true;
Features["dl-insts"] = true;
Features["16-bit-insts"] = true;
Features["dpp"] = true;
Features["gfx8-insts"] = true;
Features["gfx9-insts"] = true;
Features["gfx90a-insts"] = true;
Features["gfx940-insts"] = true;
Features["gfx10-insts"] = true;
Features["gfx10-3-insts"] = true;
Features["gfx11-insts"] = true;
Features["gfx12-insts"] = true;
Features["image-insts"] = true;
Features["fp8-conversion-insts"] = true;
Features["s-memrealtime"] = true;
Features["s-memtime-inst"] = true;
Features["gws"] = true;
Features["fp8-insts"] = true;
Features["fp8-conversion-insts"] = true;
Features["atomic-ds-pk-add-16-insts"] = true;
Features["mai-insts"] = true;

return TargetInfo::initFeatureMap(Features, Diags, {}, FeatureVec);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not want to maintain 2 copies of either of these things. Can we just move this to a common place? I thought we already moved the Features gunk into lib/TargetSupport

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a common place for the register names, the current design assumes they're a private static detail of a target - suggestions welcome though, I agree the duplication is pretty terrible. As for the features, if you're thinking about extending and then re-using AMDGPU::fillAMDGPUFeatureMap, I can do that, and it's probably a better solution I had not considered. Is that what you had in mind?

Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much of this is actually different from the existing target info for AMDGCN? Seems like we're doing a lot of redundant stuff like defining macros or features.

@AlexVlx
Copy link
Contributor Author

AlexVlx commented Apr 23, 2024

How much of this is actually different from the existing target info for AMDGCN? Seems like we're doing a lot of redundant stuff like defining macros or features.

That's part of the point, it's not actually supposed to differ in those particular regards, up to the point where things fork into specific GFXIPs. At the same time, there's no feasible way to re-use any of that, at least not one that I can see with how targets currently work. If you're suggesting that this should actually be based on AMDGPUTargetInfo, that's probably not the right way to go since that sets additional things that do not work with SPIRV at all.

@jhuber6
Copy link
Contributor

jhuber6 commented Apr 23, 2024

How much of this is actually different from the existing target info for AMDGCN? Seems like we're doing a lot of redundant stuff like defining macros or features.

That's part of the point, it's not actually supposed to differ in those particular regards, up to the point where things fork into specific GFXIPs. At the same time, there's no feasible way to re-use any of that, at least not one that I can see with how targets currently work. If you're suggesting that this should actually be based on AMDGPUTargetInfo, that's probably not the right way to go since that sets additional things that do not work with SPIRV at all.

Yeah, I was unsure how much of this is a subset. We could pull the common stuff into some new base class that both targets then inherit from, but it depends how much code we actually save with that method. I think I agree at the very least we should try to avoid duplicating the register list.

@AlexVlx
Copy link
Contributor Author

AlexVlx commented Apr 23, 2024

How much of this is actually different from the existing target info for AMDGCN? Seems like we're doing a lot of redundant stuff like defining macros or features.

That's part of the point, it's not actually supposed to differ in those particular regards, up to the point where things fork into specific GFXIPs. At the same time, there's no feasible way to re-use any of that, at least not one that I can see with how targets currently work. If you're suggesting that this should actually be based on AMDGPUTargetInfo, that's probably not the right way to go since that sets additional things that do not work with SPIRV at all.

Yeah, I was unsure how much of this is a subset. We could pull the common stuff into some new base class that both targets then inherit from, but it depends how much code we actually save with that method. I think I agree at the very least we should try to avoid duplicating the register list.

That's not a bad idea but I suspect we'll run into a physical design issue since there doesn't seem to be a natural place to put the shared base - unless you were thinking about a place in particular? We'd probably have to relocate this to the AMDGCN side, and then duplicate the SPIRV bits/details, and at that point we could just re-use/share the reg defs.

@jhuber6
Copy link
Contributor

jhuber6 commented Apr 23, 2024

That's not a bad idea but I suspect we'll run into a physical design issue since there doesn't seem to be a natural place to put the shared base - unless you were thinking about a place in particular? We'd probably have to relocate this to the AMDGCN side, and then duplicate the SPIRV bits/details, and at that point we could just re-use/share the reg defs.

Depending on how AMDGCN flavored this is, we could concievably just put it in the same AMDGPU.cpp file, since realistically both are the same "machine".

};

namespace {
const char *AMDGPUGCCRegNames[] = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could create a common base class for amdgpu target info and spirv64 amdgcn target info and not duplicate stuff. we can define a static member for reg names.

@AlexVlx
Copy link
Contributor Author

AlexVlx commented May 27, 2024

@bader @asudarsa @michalpaszkowski any opinions from the SPIRV side? I would like to merge this so as to be able to progress some related work, but I'd rather not squat on SPIRV real-estate without an ACK from the landlords:)

Copy link
Contributor

@bader bader left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with the patch in general, but I'd like either @michalpaszkowski or @VyacheslavLevytskyy to take a look.
Just one question about doubling the test scope for LLVM tests with spir64 target triple.

@@ -1,4 +1,5 @@
; RUN: llc -O0 -mtriple=spirv64-unknown-unknown %s -o - | FileCheck %s
; RUN: llc -O0 -mtriple=spirv64-amd-amdhsa %s -o - | FileCheck %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if there is a value in testing spirv64-amd-amdhsa triple in addition to spirv64-unknown-unknown for most of llvm/test/CodeGen/SPIRV tests. It makes sense to extend testing for vendor extensions mentioned in the description, but I suppose most of the tests cover core functionality which should work any spir64-* triple. Am I right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, you are right; my thinking was to prevent anyone getting ideas and diverging the "flavoured" variant in ways that breaks its core functionality as a spirv64 target, but that might be needlessly defensive. If we don't feel there's much value here, I'll revert the test spam around core functionality.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've rolled back most of these, except for a couple of places where there is a substantive difference (spirv64-amdgcn-amdhsa encodes the program address space in its DL, which yields slightly different IR in places).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am okay with either approach. I think we can assume that the core functionality will stay the same for all "flavors". In case of any changes in the future to this assumption, we could create larger tests covering multiple features, which will not provide any isolation value but just ensure the features are still there.

@asudarsa
Copy link
Contributor

I'm okay with the patch in general, but I'd like either @michalpaszkowski or @VyacheslavLevytskyy to take a look. Just one question about doubling the test scope for LLVM tests with spir64 target triple.

+1 on this. I will take a look before end of week on this. Thanks

; CHECK: OpFunctionEnd
; CHECK: %[[DefFunFp]] = OpFunction %[[TyVoid]] None %[[TyFunFp]]

target triple = "spir64-unknown-unknown"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably this line is not needed here?

; CHECK: OpReturn
; CHECK: OpFunctionEnd

target triple = "spir64-unknown-unknown"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same as above, I guess we don't need this line.

%0 = call spir_func addrspace(4) float %fp(ptr addrspace(4) %bar)
%1 = call spir_func addrspace(4) i64 %bar(ptr addrspace(4) %fp, ptr addrspace(4) %data)
ret void
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a general suggestion, if I may, about those two function pointers tests. Neither the extension itself is perfect, nor its current implementation in SPIR-V Backend. About a half of Khronos Translator test cases for the function pointers extension crash when running with our SPIR-V Backend implementation, and this is a near to-do item in our development plans.
Giving all these, probably it's better to freeze in tests as little of function pointers implementation logics as it's possible -- until the implementation is completed in full and all Khronos Translator test cases are successful. In terms of this PR this means that probably it's better to strip those 2 tests of details whenever this is possible, unless you consider each existing CHECK in those tests important.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this makes sense; initially we are planning to rely on the translator, until the BE makes it out of the experimental phase, so I think I can remove these and re-add them (if necessary) when that happens.

Copy link
Contributor

@VyacheslavLevytskyy VyacheslavLevytskyy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you

Copy link
Member

@michalpaszkowski michalpaszkowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AlexVlx LGTM! Thank you!

@AlexVlx AlexVlx merged commit 88e2bb4 into llvm:main Jun 7, 2024
9 checks passed
@AlexVlx
Copy link
Contributor Author

AlexVlx commented Jun 7, 2024

Thank you everyone for the reviews!

@HerrCai0907 HerrCai0907 mentioned this pull request Jun 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AMDGPU backend:SPIR-V clang:codegen clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category SPIR-V SPIR-V language support
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants