[LLVM][LTO] Factor out RTLib calls and allow them to be dropped #98512

jhuber6 · 2024-07-11T18:02:02Z

Summary:
The LTO pass and LLD linker have logic in them that forces extraction
and prevents internalization of needed runtime calls. However, these
currently take all RTLibcalls into account, even if the target does not
support them. The target opts out of a libcall if it sets its name to
nullptr. This patch pulls this logic out into a class in the header so
that LTO / lld can use it to determine if a symbol actually needs to be
kept.

This is important for targets like AMDGPU that want to be able to use
lld to perform the final link step, but do not want the overhead of
uncalled functions. (Extracting them trivially adds about a second to
the link time.)
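
The opt-out mechanism described above can be sketched in isolation. This is a minimal model, not the code from the patch: `LibcallTable`, its `IsAMDGPU` flag, and the three-entry `Libcall` enum are hypothetical stand-ins for the real class, `llvm::Triple`, and `RTLIB::Libcall`. Only the three symbol names are taken from the usual runtime library.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, heavily reduced stand-in for RTLIB::Libcall.
enum Libcall : std::size_t { MUL_I128, SHL_I128, ATOMIC_LOAD, UNKNOWN_LIBCALL };

// Hypothetical stand-in for the per-target libcall table.
class LibcallTable {
public:
  // IsAMDGPU is an illustrative flag; the patch inspects an llvm::Triple.
  explicit LibcallTable(bool IsAMDGPU) {
    Names[MUL_I128] = "__multi3";
    Names[SHL_I128] = "__ashlti3";
    Names[ATOMIC_LOAD] = "__atomic_load";
    if (IsAMDGPU) {
      // The target opts out of a libcall by clearing its name.
      setLibcallName(MUL_I128, nullptr);
      setLibcallName(SHL_I128, nullptr);
    }
  }

  void setLibcallName(Libcall Call, const char *Name) {
    assert(Call < UNKNOWN_LIBCALL && "not a real libcall");
    Names[Call] = Name;
  }

  // Collect only the symbols the target can actually call.
  std::vector<std::string> supportedSymbols() const {
    std::vector<std::string> Out;
    for (const char *Name : Names)
      if (Name)
        Out.emplace_back(Name);
    return Out;
  }

private:
  std::array<const char *, UNKNOWN_LIBCALL> Names{};
};
```

With this model, a linker that asks `LibcallTable(true)` for its symbols only sees the atomic entry, so it has no reason to extract or preserve the other two.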

llvmbot · 2024-07-11T18:02:32Z

@llvm/pr-subscribers-backend-arm
@llvm/pr-subscribers-backend-webassembly
@llvm/pr-subscribers-llvm-selectiondag
@llvm/pr-subscribers-debuginfo
@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-lto
@llvm/pr-subscribers-lld
@llvm/pr-subscribers-lld-coff
@llvm/pr-subscribers-lld-elf
@llvm/pr-subscribers-platform-windows

@llvm/pr-subscribers-lld-wasm

Author: Joseph Huber (jhuber6)

Changes

Summary:
The LTO pass and LLD linker have logic in them that forces extraction
and prevents internalization of needed runtime calls. However, these
currently take all RTLibcalls into account, even if the target does not
support them. The target opts out of a libcall if it sets its name to
nullptr. This patch pulls this logic out into a class in the header so
that LTO / lld can use it to determine if a symbol actually needs to be
kept.

This is important for targets like AMDGPU that want to be able to use
lld to perform the final link step, but cannot maintain the overhead
of several unused function calls.

Patch is 42.54 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/98512.diff

9 Files Affected:

(modified) lld/COFF/Driver.cpp (+5-2)
(modified) lld/ELF/Driver.cpp (+4-2)
(modified) lld/wasm/Driver.cpp (+4-2)
(modified) llvm/include/llvm/CodeGen/TargetLowering.h (+440-24)
(modified) llvm/include/llvm/LTO/LTO.h (+1-1)
(modified) llvm/lib/CodeGen/TargetLoweringBase.cpp (+2-371)
(modified) llvm/lib/LTO/LTO.cpp (+8-7)
(modified) llvm/lib/Object/IRSymtab.cpp (+13-7)
(modified) llvm/tools/lto/lto.cpp (+1-1)

diff --git a/lld/COFF/Driver.cpp b/lld/COFF/Driver.cpp
index cef6271e4c8f8..9e28b1c50be50 100644
--- a/lld/COFF/Driver.cpp
+++ b/lld/COFF/Driver.cpp
@@ -2428,9 +2428,12 @@ void LinkerDriver::linkerMain(ArrayRef<const char *> argsArr) {
       // file's symbol table. If any of those library functions are defined in a
       // bitcode file in an archive member, we need to arrange to use LTO to
       // compile those archive members by adding them to the link beforehand.
-      if (!ctx.bitcodeFileInstances.empty())
-        for (auto *s : lto::LTO::getRuntimeLibcallSymbols())
+      if (!ctx.bitcodeFileInstances.empty()) {
+        llvm::Triple TT(
+            ctx.bitcodeFileInstances.front()->obj->getTargetTriple());
+        for (auto *s : lto::LTO::getRuntimeLibcallSymbols(TT))
           ctx.symtab.addLibcall(s);
+      }
 
       // Windows specific -- if __load_config_used can be resolved, resolve it.
       if (ctx.symtab.findUnderscore("_load_config_used"))
diff --git a/lld/ELF/Driver.cpp b/lld/ELF/Driver.cpp
index abfa313bfef0e..5c7ff8dcda945 100644
--- a/lld/ELF/Driver.cpp
+++ b/lld/ELF/Driver.cpp
@@ -2883,9 +2883,11 @@ template <class ELFT> void LinkerDriver::link(opt::InputArgList &args) {
   // to, i.e. if the symbol's definition is in bitcode. Any other required
   // libcall symbols will be added to the link after LTO when we add the LTO
   // object file to the link.
-  if (!ctx.bitcodeFiles.empty())
-    for (auto *s : lto::LTO::getRuntimeLibcallSymbols())
+  if (!ctx.bitcodeFiles.empty()) {
+    llvm::Triple TT(ctx.bitcodeFiles.front()->obj->getTargetTriple());
+    for (auto *s : lto::LTO::getRuntimeLibcallSymbols(TT))
       handleLibcall(s);
+  }
 
   // Archive members defining __wrap symbols may be extracted.
   std::vector<WrappedSymbol> wrapped = addWrappedSymbols(args);
diff --git a/lld/wasm/Driver.cpp b/lld/wasm/Driver.cpp
index d099689911fc6..1d545420e182c 100644
--- a/lld/wasm/Driver.cpp
+++ b/lld/wasm/Driver.cpp
@@ -1319,9 +1319,11 @@ void LinkerDriver::linkerMain(ArrayRef<const char *> argsArr) {
   // We only need to add libcall symbols to the link before LTO if the symbol's
   // definition is in bitcode. Any other required libcall symbols will be added
   // to the link after LTO when we add the LTO object file to the link.
-  if (!ctx.bitcodeFiles.empty())
-    for (auto *s : lto::LTO::getRuntimeLibcallSymbols())
+  if (!ctx.bitcodeFiles.empty()) {
+    llvm::Triple TT(ctx.bitcodeFiles.front()->obj->getTargetTriple());
+    for (auto *s : lto::LTO::getRuntimeLibcallSymbols(TT))
       handleLibcall(s);
+  }
   if (errorCount())
     return;
 
diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index 55b60b01e5827..bda55ab1402be 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -189,6 +189,436 @@ struct MemOp {
   }
 };
 
+struct LibcallsInfo {
+  explicit LibcallsInfo(const Triple &TT) {
+    initLibcalls(TT);
+    initCmpLibcallCCs();
+  }
+
+  /// Rename the default libcall routine name for the specified libcall.
+  void setLibcallName(RTLIB::Libcall Call, const char *Name) {
+    LibcallRoutineNames[Call] = Name;
+  }
+
+  void setLibcallName(ArrayRef<RTLIB::Libcall> Calls, const char *Name) {
+    for (auto Call : Calls)
+      setLibcallName(Call, Name);
+  }
+
+  /// Get the libcall routine name for the specified libcall.
+  const char *getLibcallName(RTLIB::Libcall Call) const {
+    return LibcallRoutineNames[Call];
+  }
+
+  /// Override the default CondCode to be used to test the result of the
+  /// comparison libcall against zero.
+  void setCmpLibcallCC(RTLIB::Libcall Call, ISD::CondCode CC) {
+    CmpLibcallCCs[Call] = CC;
+  }
+
+  /// Get the CondCode that's to be used to test the result of the comparison
+  /// libcall against zero.
+  ISD::CondCode getCmpLibcallCC(RTLIB::Libcall Call) const {
+    return CmpLibcallCCs[Call];
+  }
+
+  /// Set the CallingConv that should be used for the specified libcall.
+  void setLibcallCallingConv(RTLIB::Libcall Call, CallingConv::ID CC) {
+    LibcallCallingConvs[Call] = CC;
+  }
+
+  /// Get the CallingConv that should be used for the specified libcall.
+  CallingConv::ID getLibcallCallingConv(RTLIB::Libcall Call) const {
+    return LibcallCallingConvs[Call];
+  }
+
+  iterator_range<const char **> getLibcallNames() {
+    return llvm::make_range(LibcallRoutineNames,
+                            LibcallRoutineNames + RTLIB::UNKNOWN_LIBCALL);
+  }
+
+private:
+  /// Stores the name each libcall.
+  const char *LibcallRoutineNames[RTLIB::UNKNOWN_LIBCALL + 1];
+
+  /// The ISD::CondCode that should be used to test the result of each of the
+  /// comparison libcall against zero.
+  ISD::CondCode CmpLibcallCCs[RTLIB::UNKNOWN_LIBCALL];
+
+  /// Stores the CallingConv that should be used for each libcall.
+  CallingConv::ID LibcallCallingConvs[RTLIB::UNKNOWN_LIBCALL];
+
+  static bool darwinHasSinCos(const Triple &TT) {
+    assert(TT.isOSDarwin() && "should be called with darwin triple");
+    // Don't bother with 32 bit x86.
+    if (TT.getArch() == Triple::x86)
+      return false;
+    // Macos < 10.9 has no sincos_stret.
+    if (TT.isMacOSX())
+      return !TT.isMacOSXVersionLT(10, 9) && TT.isArch64Bit();
+    // iOS < 7.0 has no sincos_stret.
+    if (TT.isiOS())
+      return !TT.isOSVersionLT(7, 0);
+    // Any other darwin such as WatchOS/TvOS is new enough.
+    return true;
+  }
+
+  /// Sets default libcall calling conventions.
+  void initCmpLibcallCCs() {
+    std::fill(CmpLibcallCCs, CmpLibcallCCs + RTLIB::UNKNOWN_LIBCALL,
+              ISD::SETCC_INVALID);
+    CmpLibcallCCs[RTLIB::OEQ_F32] = ISD::SETEQ;
+    CmpLibcallCCs[RTLIB::OEQ_F64] = ISD::SETEQ;
+    CmpLibcallCCs[RTLIB::OEQ_F128] = ISD::SETEQ;
+    CmpLibcallCCs[RTLIB::OEQ_PPCF128] = ISD::SETEQ;
+    CmpLibcallCCs[RTLIB::UNE_F32] = ISD::SETNE;
+    CmpLibcallCCs[RTLIB::UNE_F64] = ISD::SETNE;
+    CmpLibcallCCs[RTLIB::UNE_F128] = ISD::SETNE;
+    CmpLibcallCCs[RTLIB::UNE_PPCF128] = ISD::SETNE;
+    CmpLibcallCCs[RTLIB::OGE_F32] = ISD::SETGE;
+    CmpLibcallCCs[RTLIB::OGE_F64] = ISD::SETGE;
+    CmpLibcallCCs[RTLIB::OGE_F128] = ISD::SETGE;
+    CmpLibcallCCs[RTLIB::OGE_PPCF128] = ISD::SETGE;
+    CmpLibcallCCs[RTLIB::OLT_F32] = ISD::SETLT;
+    CmpLibcallCCs[RTLIB::OLT_F64] = ISD::SETLT;
+    CmpLibcallCCs[RTLIB::OLT_F128] = ISD::SETLT;
+    CmpLibcallCCs[RTLIB::OLT_PPCF128] = ISD::SETLT;
+    CmpLibcallCCs[RTLIB::OLE_F32] = ISD::SETLE;
+    CmpLibcallCCs[RTLIB::OLE_F64] = ISD::SETLE;
+    CmpLibcallCCs[RTLIB::OLE_F128] = ISD::SETLE;
+    CmpLibcallCCs[RTLIB::OLE_PPCF128] = ISD::SETLE;
+    CmpLibcallCCs[RTLIB::OGT_F32] = ISD::SETGT;
+    CmpLibcallCCs[RTLIB::OGT_F64] = ISD::SETGT;
+    CmpLibcallCCs[RTLIB::OGT_F128] = ISD::SETGT;
+    CmpLibcallCCs[RTLIB::OGT_PPCF128] = ISD::SETGT;
+    CmpLibcallCCs[RTLIB::UO_F32] = ISD::SETNE;
+    CmpLibcallCCs[RTLIB::UO_F64] = ISD::SETNE;
+    CmpLibcallCCs[RTLIB::UO_F128] = ISD::SETNE;
+    CmpLibcallCCs[RTLIB::UO_PPCF128] = ISD::SETNE;
+  }
+
+  /// Set default libcall names.
+  void initLibcalls(const Triple &TT) {
+    std::fill(std::begin(LibcallRoutineNames), std::end(LibcallRoutineNames),
+              nullptr);
+
+#define HANDLE_LIBCALL(code, name) setLibcallName(RTLIB::code, name);
+#include "llvm/IR/RuntimeLibcalls.def"
+#undef HANDLE_LIBCALL
+
+    // Initialize calling conventions to their default.
+    for (int LC = 0; LC < RTLIB::UNKNOWN_LIBCALL; ++LC)
+      setLibcallCallingConv((RTLIB::Libcall)LC, CallingConv::C);
+
+    // Use the f128 variants of math functions on x86_64
+    if (TT.getArch() == Triple::ArchType::x86_64 && TT.isGNUEnvironment()) {
+      setLibcallName(RTLIB::REM_F128, "fmodf128");
+      setLibcallName(RTLIB::FMA_F128, "fmaf128");
+      setLibcallName(RTLIB::SQRT_F128, "sqrtf128");
+      setLibcallName(RTLIB::CBRT_F128, "cbrtf128");
+      setLibcallName(RTLIB::LOG_F128, "logf128");
+      setLibcallName(RTLIB::LOG_FINITE_F128, "__logf128_finite");
+      setLibcallName(RTLIB::LOG2_F128, "log2f128");
+      setLibcallName(RTLIB::LOG2_FINITE_F128, "__log2f128_finite");
+      setLibcallName(RTLIB::LOG10_F128, "log10f128");
+      setLibcallName(RTLIB::LOG10_FINITE_F128, "__log10f128_finite");
+      setLibcallName(RTLIB::EXP_F128, "expf128");
+      setLibcallName(RTLIB::EXP_FINITE_F128, "__expf128_finite");
+      setLibcallName(RTLIB::EXP2_F128, "exp2f128");
+      setLibcallName(RTLIB::EXP2_FINITE_F128, "__exp2f128_finite");
+      setLibcallName(RTLIB::EXP10_F128, "exp10f128");
+      setLibcallName(RTLIB::SIN_F128, "sinf128");
+      setLibcallName(RTLIB::COS_F128, "cosf128");
+      setLibcallName(RTLIB::TAN_F128, "tanf128");
+      setLibcallName(RTLIB::SINCOS_F128, "sincosf128");
+      setLibcallName(RTLIB::POW_F128, "powf128");
+      setLibcallName(RTLIB::POW_FINITE_F128, "__powf128_finite");
+      setLibcallName(RTLIB::CEIL_F128, "ceilf128");
+      setLibcallName(RTLIB::TRUNC_F128, "truncf128");
+      setLibcallName(RTLIB::RINT_F128, "rintf128");
+      setLibcallName(RTLIB::NEARBYINT_F128, "nearbyintf128");
+      setLibcallName(RTLIB::ROUND_F128, "roundf128");
+      setLibcallName(RTLIB::ROUNDEVEN_F128, "roundevenf128");
+      setLibcallName(RTLIB::FLOOR_F128, "floorf128");
+      setLibcallName(RTLIB::COPYSIGN_F128, "copysignf128");
+      setLibcallName(RTLIB::FMIN_F128, "fminf128");
+      setLibcallName(RTLIB::FMAX_F128, "fmaxf128");
+      setLibcallName(RTLIB::LROUND_F128, "lroundf128");
+      setLibcallName(RTLIB::LLROUND_F128, "llroundf128");
+      setLibcallName(RTLIB::LRINT_F128, "lrintf128");
+      setLibcallName(RTLIB::LLRINT_F128, "llrintf128");
+      setLibcallName(RTLIB::LDEXP_F128, "ldexpf128");
+      setLibcallName(RTLIB::FREXP_F128, "frexpf128");
+    }
+
+    // For IEEE quad-precision libcall names, PPC uses "kf" instead of "tf".
+    if (TT.isPPC()) {
+      setLibcallName(RTLIB::ADD_F128, "__addkf3");
+      setLibcallName(RTLIB::SUB_F128, "__subkf3");
+      setLibcallName(RTLIB::MUL_F128, "__mulkf3");
+      setLibcallName(RTLIB::DIV_F128, "__divkf3");
+      setLibcallName(RTLIB::POWI_F128, "__powikf2");
+      setLibcallName(RTLIB::FPEXT_F32_F128, "__extendsfkf2");
+      setLibcallName(RTLIB::FPEXT_F64_F128, "__extenddfkf2");
+      setLibcallName(RTLIB::FPROUND_F128_F32, "__trunckfsf2");
+      setLibcallName(RTLIB::FPROUND_F128_F64, "__trunckfdf2");
+      setLibcallName(RTLIB::FPTOSINT_F128_I32, "__fixkfsi");
+      setLibcallName(RTLIB::FPTOSINT_F128_I64, "__fixkfdi");
+      setLibcallName(RTLIB::FPTOSINT_F128_I128, "__fixkfti");
+      setLibcallName(RTLIB::FPTOUINT_F128_I32, "__fixunskfsi");
+      setLibcallName(RTLIB::FPTOUINT_F128_I64, "__fixunskfdi");
+      setLibcallName(RTLIB::FPTOUINT_F128_I128, "__fixunskfti");
+      setLibcallName(RTLIB::SINTTOFP_I32_F128, "__floatsikf");
+      setLibcallName(RTLIB::SINTTOFP_I64_F128, "__floatdikf");
+      setLibcallName(RTLIB::SINTTOFP_I128_F128, "__floattikf");
+      setLibcallName(RTLIB::UINTTOFP_I32_F128, "__floatunsikf");
+      setLibcallName(RTLIB::UINTTOFP_I64_F128, "__floatundikf");
+      setLibcallName(RTLIB::UINTTOFP_I128_F128, "__floatuntikf");
+      setLibcallName(RTLIB::OEQ_F128, "__eqkf2");
+      setLibcallName(RTLIB::UNE_F128, "__nekf2");
+      setLibcallName(RTLIB::OGE_F128, "__gekf2");
+      setLibcallName(RTLIB::OLT_F128, "__ltkf2");
+      setLibcallName(RTLIB::OLE_F128, "__lekf2");
+      setLibcallName(RTLIB::OGT_F128, "__gtkf2");
+      setLibcallName(RTLIB::UO_F128, "__unordkf2");
+    }
+
+    // A few names are different on particular architectures or environments.
+    if (TT.isOSDarwin()) {
+      // For f16/f32 conversions, Darwin uses the standard naming scheme,
+      // instead of the gnueabi-style __gnu_*_ieee.
+      // FIXME: What about other targets?
+      setLibcallName(RTLIB::FPEXT_F16_F32, "__extendhfsf2");
+      setLibcallName(RTLIB::FPROUND_F32_F16, "__truncsfhf2");
+
+      // Some darwins have an optimized __bzero/bzero function.
+      switch (TT.getArch()) {
+      case Triple::x86:
+      case Triple::x86_64:
+        if (TT.isMacOSX() && !TT.isMacOSXVersionLT(10, 6))
+          setLibcallName(RTLIB::BZERO, "__bzero");
+        break;
+      case Triple::aarch64:
+      case Triple::aarch64_32:
+        setLibcallName(RTLIB::BZERO, "bzero");
+        break;
+      default:
+        break;
+      }
+
+      if (darwinHasSinCos(TT)) {
+        setLibcallName(RTLIB::SINCOS_STRET_F32, "__sincosf_stret");
+        setLibcallName(RTLIB::SINCOS_STRET_F64, "__sincos_stret");
+        if (TT.isWatchABI()) {
+          setLibcallCallingConv(RTLIB::SINCOS_STRET_F32,
+                                CallingConv::ARM_AAPCS_VFP);
+          setLibcallCallingConv(RTLIB::SINCOS_STRET_F64,
+                                CallingConv::ARM_AAPCS_VFP);
+        }
+      }
+
+      switch (TT.getOS()) {
+      case Triple::MacOSX:
+        if (TT.isMacOSXVersionLT(10, 9)) {
+          setLibcallName(RTLIB::EXP10_F32, nullptr);
+          setLibcallName(RTLIB::EXP10_F64, nullptr);
+        } else {
+          setLibcallName(RTLIB::EXP10_F32, "__exp10f");
+          setLibcallName(RTLIB::EXP10_F64, "__exp10");
+        }
+        break;
+      case Triple::IOS:
+      case Triple::TvOS:
+      case Triple::WatchOS:
+      case Triple::XROS:
+        if (!TT.isWatchOS() && (TT.isOSVersionLT(7, 0) ||
+                                (TT.isOSVersionLT(9, 0) && TT.isX86()))) {
+          setLibcallName(RTLIB::EXP10_F32, nullptr);
+          setLibcallName(RTLIB::EXP10_F64, nullptr);
+        } else {
+          setLibcallName(RTLIB::EXP10_F32, "__exp10f");
+          setLibcallName(RTLIB::EXP10_F64, "__exp10");
+        }
+
+        break;
+      default:
+        break;
+      }
+    } else {
+      setLibcallName(RTLIB::FPEXT_F16_F32, "__gnu_h2f_ieee");
+      setLibcallName(RTLIB::FPROUND_F32_F16, "__gnu_f2h_ieee");
+    }
+
+    if (TT.isGNUEnvironment() || TT.isOSFuchsia() ||
+        (TT.isAndroid() && !TT.isAndroidVersionLT(9))) {
+      setLibcallName(RTLIB::SINCOS_F32, "sincosf");
+      setLibcallName(RTLIB::SINCOS_F64, "sincos");
+      setLibcallName(RTLIB::SINCOS_F80, "sincosl");
+      setLibcallName(RTLIB::SINCOS_F128, "sincosl");
+      setLibcallName(RTLIB::SINCOS_PPCF128, "sincosl");
+    }
+
+    if (TT.isPS()) {
+      setLibcallName(RTLIB::SINCOS_F32, "sincosf");
+      setLibcallName(RTLIB::SINCOS_F64, "sincos");
+    }
+
+    if (TT.isOSOpenBSD()) {
+      setLibcallName(RTLIB::STACKPROTECTOR_CHECK_FAIL, nullptr);
+    }
+
+    if (TT.isOSWindows() && !TT.isOSCygMing()) {
+      setLibcallName(RTLIB::LDEXP_F32, nullptr);
+      setLibcallName(RTLIB::LDEXP_F80, nullptr);
+      setLibcallName(RTLIB::LDEXP_F128, nullptr);
+      setLibcallName(RTLIB::LDEXP_PPCF128, nullptr);
+
+      setLibcallName(RTLIB::FREXP_F32, nullptr);
+      setLibcallName(RTLIB::FREXP_F80, nullptr);
+      setLibcallName(RTLIB::FREXP_F128, nullptr);
+      setLibcallName(RTLIB::FREXP_PPCF128, nullptr);
+    }
+
+    if (TT.isAArch64()) {
+      if (TT.isOSMSVCRT()) {
+        // MSVCRT doesn't have powi; fall back to pow
+        setLibcallName(RTLIB::POWI_F32, nullptr);
+        setLibcallName(RTLIB::POWI_F64, nullptr);
+      }
+    }
+
+    // Disable most libcalls on AMDGPU.
+    if (TT.isAMDGPU()) {
+      for (int I = 0; I < RTLIB::UNKNOWN_LIBCALL; ++I) {
+        if (I < RTLIB::ATOMIC_LOAD || I > RTLIB::ATOMIC_FETCH_NAND_16)
+          setLibcallName(static_cast<RTLIB::Libcall>(I), nullptr);
+      }
+    }
+
+    if (TT.isARM() || TT.isThumb()) {
+      // These libcalls are not available in 32-bit.
+      setLibcallName(RTLIB::SHL_I128, nullptr);
+      setLibcallName(RTLIB::SRL_I128, nullptr);
+      setLibcallName(RTLIB::SRA_I128, nullptr);
+      setLibcallName(RTLIB::MUL_I128, nullptr);
+      setLibcallName(RTLIB::MULO_I64, nullptr);
+      setLibcallName(RTLIB::MULO_I128, nullptr);
+
+      if (TT.isOSMSVCRT()) {
+        // MSVCRT doesn't have powi; fall back to pow
+        setLibcallName(RTLIB::POWI_F32, nullptr);
+        setLibcallName(RTLIB::POWI_F64, nullptr);
+      }
+    }
+
+    if (TT.getArch() == Triple::ArchType::avr) {
+      // Division rtlib functions (not supported), use divmod functions instead
+      setLibcallName(RTLIB::SDIV_I8, nullptr);
+      setLibcallName(RTLIB::SDIV_I16, nullptr);
+      setLibcallName(RTLIB::SDIV_I32, nullptr);
+      setLibcallName(RTLIB::UDIV_I8, nullptr);
+      setLibcallName(RTLIB::UDIV_I16, nullptr);
+      setLibcallName(RTLIB::UDIV_I32, nullptr);
+
+      // Modulus rtlib functions (not supported), use divmod functions instead
+      setLibcallName(RTLIB::SREM_I8, nullptr);
+      setLibcallName(RTLIB::SREM_I16, nullptr);
+      setLibcallName(RTLIB::SREM_I32, nullptr);
+      setLibcallName(RTLIB::UREM_I8, nullptr);
+      setLibcallName(RTLIB::UREM_I16, nullptr);
+      setLibcallName(RTLIB::UREM_I32, nullptr);
+    }
+
+    if (TT.getArch() == Triple::ArchType::hexagon) {
+      // These cause problems when the shift amount is non-constant.
+      setLibcallName(RTLIB::SHL_I128, nullptr);
+      setLibcallName(RTLIB::SRL_I128, nullptr);
+      setLibcallName(RTLIB::SRA_I128, nullptr);
+    }
+
+    if (TT.isLoongArch()) {
+      if (!TT.isLoongArch64()) {
+        // Set libcalls.
+        setLibcallName(RTLIB::MUL_I128, nullptr);
+        // The MULO libcall is not part of libgcc, only compiler-rt.
+        setLibcallName(RTLIB::MULO_I64, nullptr);
+      }
+      // The MULO libcall is not part of libgcc, only compiler-rt.
+      setLibcallName(RTLIB::MULO_I128, nullptr);
+    }
+
+    if (TT.isMIPS32()) {
+      // These libcalls are not available in 32-bit.
+      setLibcallName(RTLIB::SHL_I128, nullptr);
+      setLibcallName(RTLIB::SRL_I128, nullptr);
+      setLibcallName(RTLIB::SRA_I128, nullptr);
+      setLibcallName(RTLIB::MUL_I128, nullptr);
+      setLibcallName(RTLIB::MULO_I64, nullptr);
+      setLibcallName(RTLIB::MULO_I128, nullptr);
+    }
+
+    if (TT.isPPC()) {
+      if (!TT.isPPC64()) {
+        // These libcalls are not available in 32-bit.
+        setLibcallName(RTLIB::SHL_I128, nullptr);
+        setLibcallName(RTLIB::SRL_I128, nullptr);
+        setLibcallName(RTLIB::SRA_I128, nullptr);
+        setLibcallName(RTLIB::MUL_I128, nullptr);
+        setLibcallName(RTLIB::MULO_I64, nullptr);
+      }
+      setLibcallName(RTLIB::MULO_I128, nullptr);
+    }
+
+    if (TT.isRISCV32()) {
+      // These libcalls are not available in 32-bit.
+      setLibcallName(RTLIB::SHL_I128, nullptr);
+      setLibcallName(RTLIB::SRL_I128, nullptr);
+      setLibcallName(RTLIB::SRA_I128, nullptr);
+      setLibcallName(RTLIB::MUL_I128, nullptr);
+      setLibcallName(RTLIB::MULO_I64, nullptr);
+    }
+
+    if (TT.isSPARC()) {
+      if (!TT.isSPARC64()) {
+        // These libcalls are not available in 32-bit.
+        setLibcallName(RTLIB::MULO_I64, nullptr);
+        setLibcallName(RTLIB::MUL_I128, nullptr);
+        setLibcallName(RTLIB::SHL_I128, nullptr);
+        setLibcallName(RTLIB::SRL_I128, nullptr);
+        setLibcallName(RTLIB::SRA_I128, nullptr);
+      }
+      setLibcallName(RTLIB::MULO_I128, nullptr);
+    }
+
+    if (TT.isSystemZ()) {
+      setLibcallName(RTLIB::SRL_I128, nullptr);
+      setLibcallName(RTLIB::SHL_I128, nullptr);
+      setLibcallName(RTLIB::SRA_I128, nullptr);
+    }
+
+    if (TT.isX86()) {
+      if (TT.getArch() == Triple::ArchType::x86) {
+        // These libcalls are not available in 32-bit.
+        setLibcallName(RTLIB::SHL_I128, nullptr);
+        setLibcallName(RTLIB::SRL_I128, nullptr);
+        setLibcallName(RTLIB::SRA_I128, nullptr);
+        setLibcallName(RTLIB::MUL_I128, nullptr);
+        // The MULO libcall is not part of libgcc, only compiler-rt.
+        setLibcallName(RTLIB::MULO_I64, nullptr);
+      }
+
+      // The MULO libcall is not part of libgcc, only compiler-rt.
+      setLibcallName(RTLIB::MULO_I128, nullptr);
+
+      if (TT.isOSMSVCRT()) {
+        // MSVCRT doesn't have powi; fall back to pow
+        setLibcallName(RTLIB::POWI_F32, nullptr);
+        setLibcallName(RTLIB::POWI_F64, nullptr);
+      }
+    }
+  }
+};
+
 /// This base class for TargetLowering contains the SelectionDAG-independent
 /// parts that can be used from the rest of CodeGen.
 class TargetLoweringBase {
@@ -3410,44 +3840,40 @@ class TargetLoweringBase {
     return nullptr;
   }
 
-  //===--...
[truncated]

github-actions · 2024-07-11T18:06:05Z

✅ With the latest revision this PR passed the C/C++ code formatter.

arsenm · 2024-07-11T18:08:33Z

llvm/lib/LTO/LTO.cpp

+  SmallVector<const char *> LibcallSymbols;
+  copy_if(Libcalls.getLibcallNames(), std::back_inserter(LibcallSymbols),
+          [](const char *Name) { return Name; });
+  return LibcallSymbols;


Can we avoid making a copy of this giant vector every time it's used?

I thought about that, but in practice this is only ever called once and never re-used. The alternative would need some dense map of triples to vectors and I don't think it would add anything.
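
For reference, the caching alternative mentioned in the reply could look roughly like the sketch below. Everything here is hypothetical: `LibcallSymbolCache`, `buildSymbols`, and the three-name table are illustrative, and an ordinary `std::map` keyed by the triple string stands in for the "dense map of triples to vectors".

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <string>
#include <vector>

using SymbolList = std::vector<const char *>;

// Hypothetical builder standing in for the copy_if over the libcall names.
SymbolList buildSymbols(const std::string &Triple) {
  assert(!Triple.empty() && "expected a target triple");
  static const char *const AllNames[] = {"__multi3", "__ashlti3",
                                         "__atomic_load"};
  const bool IsAMDGPU = Triple.rfind("amdgcn", 0) == 0;
  SymbolList Out;
  std::copy_if(std::begin(AllNames), std::end(AllNames),
               std::back_inserter(Out), [&](const char *Name) {
                 // In this model AMDGPU keeps only the atomic entries.
                 return !IsAMDGPU ||
                        std::string(Name).rfind("__atomic", 0) == 0;
               });
  return Out;
}

// Builds the filtered list once per triple and hands out references after.
class LibcallSymbolCache {
public:
  const SymbolList &get(const std::string &Triple) {
    auto It = Cache.find(Triple);
    if (It == Cache.end()) {
      ++Builds;
      It = Cache.emplace(Triple, buildSymbols(Triple)).first;
    }
    return It->second;
  }

  unsigned builds() const { return Builds; }

private:
  std::map<std::string, SymbolList> Cache;
  unsigned Builds = 0;
};
```

Since each linker driver calls the function once per link, the cache would only ever hold one entry, which is the point made in the reply.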

sbc100

Can you explain what you mean by "but cannot maintain the overhead
of several unused function calls"?

My understanding is that these symbols are still only included in the final binary if they are actually required/referenced. The handleLibcall mechanism just forces them to be included/available during LTO compilation, it doesn't actually force them to be included in the output binary, does it?

jhuber6 · 2024-07-11T18:26:43Z

Can you explain what you mean by "but cannot maintain the overhead of several unused function calls"?

Poorly worded, we don't want random functions that aren't used to sit around in the binary. If I do -lm it adds about 1.2 seconds of link time since it forces every single function to go through codegen. It also bloats the binary.

My understanding is that these symbols are still only included in the final binary if they are actually required/referenced. The handleLibcall mechanism just forces them to be included/available during LTO compilation, it doesn't actually force them to be included in the output binary, does it?

No, they are extracted unconditionally if they are found and not allowed to be internalized. That's what this patch is trying to solve.

sbc100 · 2024-07-11T19:24:56Z

No, they are extracted unconditionally if they are found and not allowed to be internalized. That's what this patch is trying to solve.

Sorry, I'm still not quite sure what this means. Are you saying that they then take up space in the final output binary? Because that is not what we see in emscripten. There, the functions go through codegen but are then discarded by the linker since they are not referenced. I believe they are discarded during --gc-sections (which is on by default for wasm, which is perhaps why we don't see the effect of these symbols in emscripten?)

jhuber6 · 2024-07-11T19:30:10Z

No, they are extracted unconditionally if they are found and not allowed to be internalized. That's what this patch is trying to solve.

Sorry, I'm still not quite sure what this means. Are you saying that they then take up space in the final output binary? Because that is not what we see in emscripten. There, the functions go through codegen but are then discarded by the linker since they are not referenced. I believe they are discarded during --gc-sections (which is on by default for wasm, which is perhaps why we don't see the effect of these symbols in emscripten?)

The logic I modify in this patch controls the symbols. In IRSymtab.cpp we have logic that states:

if (PreservedSymbols.contains(Sym))
  Sym.used();

Then in LTO.cpp we have logic that prevents used symbols from being internalized.

The logic in LLD pretty much states:

if (sym.isLibcall)
  sym->extract();

So it will always be extracted from a static archive even if it's unused because it thinks the backend needs it.

Likely a combination of -ffunction-sections and -Wl,--gc-sections will remove the dead functions, but that doesn't stop lld from extracting them and forcing them through codegen, which adds over a second of link time as it stands if you include math definitions.
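
To make the cost concrete, here is a small self-contained model of the extraction step. It is only a sketch: `LazyMember` and `extractMembers` are hypothetical names, and the real linkers track far more state than a symbol name per archive member.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a lazy archive member that holds bitcode.
struct LazyMember {
  std::string Symbol;
  bool Extracted = false;
};

// Returns how many members are extracted and therefore sent through LTO
// codegen, whether or not anything ends up calling them.
std::size_t extractMembers(std::vector<LazyMember> &Archive,
                           const std::set<std::string> &Referenced,
                           const std::set<std::string> &Libcalls) {
  std::size_t Count = 0;
  for (LazyMember &M : Archive) {
    assert(!M.Symbol.empty() && "archive members must define a symbol");
    // A member is pulled in if the program references it, or if the linker
    // believes the backend may emit a call to it later.
    if (Referenced.count(M.Symbol) || Libcalls.count(M.Symbol)) {
      M.Extracted = true;
      ++Count;
    }
  }
  return Count;
}
```

Passing the full libcall set extracts every matching member of a math library, while passing a set filtered for the target extracts only what the program references. Section garbage collection can still drop the dead code afterwards, but only after the codegen time has been spent.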

sbc100 · 2024-07-11T19:58:18Z

Likely a combination of -ffunction-sections and -Wl,--gc-sections will remove the dead functions, but that doesn't stop lld from extracting them and forcing them through codegen, which adds over a second of link time as it stands if you include math definitions.

Ok that would explain it. -ffunction-sections and -Wl,--gc-sections are both enabled by default for the Wasm target. But if there are link time savings due to less codegen that would still benefit us. Thanks.

jhuber6 · 2024-07-11T20:04:22Z

Likely a combination of -ffunction-sections and -Wl,--gc-sections will remove the dead functions, but that doesn't stop lld from extracting them and forcing them through codegen, which adds over a second of link time as it stands if you include math definitions.

Ok that would explain it. -ffunction-sections and -Wl,--gc-sections are both enabled by default for the Wasm target. But if there are link time savings due to less codegen that would still benefit us. Thanks.

Yeah, it'll make a difference if you're trying to do LTO with a static library that contains libcalls, which is a little niche but important for my use-case. I also wonder if we should do this by default for AMDGPU / NVPTX as well, @arsenm WDYT?

jhuber6 · 2024-07-11T20:06:37Z

lld/COFF/Driver.cpp

-      if (!ctx.bitcodeFileInstances.empty())
-        for (auto *s : lto::LTO::getRuntimeLibcallSymbols())
+      if (!ctx.bitcodeFileInstances.empty()) {
+        llvm::Triple TT(


@MaskRay It's safe to assume that obj is always non-null at this point and that all IR files have the same triple, right?
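
If that assumption ever needed to be checked rather than assumed, a guard along these lines would do it. This is a sketch with hypothetical names (`BitcodeInput`, `commonTriple`, `requireCommonTriple`); it does not use the real lld input-file classes.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a bitcode input that records its target triple.
struct BitcodeInput {
  std::string Triple;
};

// Returns the shared triple, or nullopt if the list is empty, contains a
// null entry, or mixes triples.
std::optional<std::string>
commonTriple(const std::vector<const BitcodeInput *> &Files) {
  if (Files.empty() || !Files.front())
    return std::nullopt;
  const std::string &First = Files.front()->Triple;
  for (const BitcodeInput *F : Files)
    if (!F || F->Triple != First)
      return std::nullopt;
  return First;
}

// Variant for callers that treat a mismatch as a programming error.
std::string
requireCommonTriple(const std::vector<const BitcodeInput *> &Files) {
  std::optional<std::string> TT = commonTriple(Files);
  assert(TT && "expected non-null bitcode inputs sharing one triple");
  return *TT;
}
```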

efriedma-quic · 2024-07-11T20:13:22Z

So it will always be extracted from a static archive even if it's unused because it thinks the backend needs it.

If the function in the archive is compiled code, this should be cheap, so it doesn't really matter. If it's bitcode, LTO isn't really architected to handle it. I don't think we'd want to modify the way symbol lookup works. Maybe we can pull the bitcode files into the link, but delay actually compiling them until we know whether they're actually necessary. Or we can examine the bitcode inputs in more detail to determine which functions they can actually call.

Some examples of heuristics we could use:

query target features for whether some specific operation needs to be emulated
scan the module for declarations of math intrinsics

This patch doesn't address any of that, though; it just ensures that the linker agrees with the LLVM backend about which functions should actually be treated as "runtime functions" for a given triple (i.e. functions which can be called by the LLVM backend even if they aren't referenced in the IR), as opposed to using the same set for every triple.
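
As a rough illustration of the second heuristic (scanning a module for math intrinsic declarations), and not something this patch implements, the idea could be modeled as below. `possibleLibcalls` and its three-entry mapping are hypothetical; a real version would walk an `llvm::Module` and consult the target's libcall table.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Given the names declared in a module, return the libm symbols the backend
// might call when lowering them.
std::set<std::string>
possibleLibcalls(const std::vector<std::string> &Declarations) {
  // Hypothetical and incomplete mapping from intrinsic to libcall symbol.
  static const std::map<std::string, std::string> IntrinsicToLibcall = {
      {"llvm.sin.f32", "sinf"},
      {"llvm.cos.f32", "cosf"},
      {"llvm.pow.f64", "pow"}};

  std::set<std::string> Needed;
  for (const std::string &Decl : Declarations) {
    assert(!Decl.empty() && "declarations must be named");
    auto It = IntrinsicToLibcall.find(Decl);
    if (It != IntrinsicToLibcall.end())
      Needed.insert(It->second);
  }
  return Needed;
}
```

A linker using such a set would only extract `sinf` from a bitcode archive when some input actually declares `llvm.sin.f32`, instead of extracting every libcall the target supports.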

Summary: The target information needs to configure that the platform has a maximum integer size of 64 in order for it to enable i128 support. The motivation behind this patch is that the i128 libcalls seem to be the only ones used by the NVPTX backend and it would be ideal to disable those completely. That would allow LTO to optimize libcalls properly after llvm#98512.

vitalybuka · 2024-07-16T19:35:16Z

Is anyone looking at https://lab.llvm.org/buildbot/#/builders/66/builds/1669 ?

jhuber6 · 2024-07-16T19:41:17Z

Is anyone looking at https://lab.llvm.org/buildbot/#/builders/66/builds/1669 ?

Unsure if this is related... I guess you could try bisecting it.

vitalybuka · 2024-07-16T19:42:57Z

This happens compiling RuntimeLibcalls.cpp introduced in this patch.

jhuber6 · 2024-07-16T19:45:04Z

This happens compiling RuntimeLibcalls.cpp introduced in this patch.

You're right. Maybe there's some configuration that causes ISDOpcodes to not be available?

vitalybuka · 2024-07-16T19:45:49Z

Note: this is not an incremental bot. It removes the build dir every time, so this is not an incremental CMake issue.

I believe

DEPENDS
  vt_gen
  intrinsics_gen

I'll try and patch

Fixes 'llvm/CodeGen/GenVT.inc' file not found. Follow up to #98512

fhahn · 2024-07-16T20:11:55Z

@vitalybuka at least for me locally bb604ae fixed the issue, thanks!

…#98512) Summary: The LTO pass and LLD linker have logic in them that forces extraction and prevent internalization of needed runtime calls. However, these currently take all RTLibcalls into account, even if the target does not support them. The target opts-out of a libcall if it sets its name to nullptr. This patch pulls this logic out into a class in the header so that LTO / lld can use it to determine if a symbol actually needs to be kept. This is important for targets like AMDGPU that want to be able to use `lld` to perform the final link step, but does not want the overhead of uncalled functions. (This adds like a second to the link time trivially) Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D59822415

jhuber6 · 2024-07-16T20:13:58Z

Note: this is not an incremental bot. It removes the build dir every time, so this is not an incremental CMake issue.

I believe
DEPENDS
  vt_gen
  intrinsics_gen
I'll try and patch

Good catch, appreciate the fix!

chapuni

Likely layering violation.

chapuni · 2024-07-17T12:16:45Z

llvm/include/llvm/IR/RuntimeLibcalls.h

+#define LLVM_IR_RUNTIME_LIBCALLS_H
+
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/CodeGen/ISDOpcodes.h"


llvm/IR should not depend on llvm/CodeGen

arsenm · 2024-07-17

This should be unused here.

arsenm · 2024-07-17

Oh, I see it's used for the cond code actions. Those should probably be in terms of the IR compare types.

jhuber6 · 2024-07-17T12:18:48Z

> Likely layering violation.

I thought about that, but because the parts we needed were header-only, I figured it was legal. It's needed for the default calling conventions, which we could move somewhere else if needed.

chapuni · 2024-07-19T06:53:59Z

@jhuber6 I've filed #99610.

> but because the parts we needed were header only I figured it was legal.

It would be problematic for modularizing header units.

…ed (#98512)" This reverts commit c05126b. (llvmorg-19-init-17714-gc05126bdfc3b) See #99610

…ped (#98512)" This reverts commit 740161a. I moved the `ISD` dependencies into the CodeGen portion of the handling; it's a little awkward, but it's the easiest solution I can think of for now.
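A minimal sketch of the kind of split described in this reland, assuming a simple two-layer design. The `lower` and `upper` namespaces, the struct names, and the enum values are hypothetical stand-ins for illustration, not LLVM's actual declarations. The lower layer (standing in for `llvm/IR`) only knows libcall names, while the comparison-code table that needed the `ISD` types lives in the upper layer (standing in for `llvm/CodeGen`):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Lower layer: a stand-in for an IR-level header. It must not mention
// any codegen-only type, so it stores libcall names and nothing else.
namespace lower {
enum Libcall : std::size_t { OEQ_F32, UNE_F32, NumLibcalls };

struct LibcallInfo {
  std::array<const char *, NumLibcalls> Names{};

  LibcallInfo() {
    // Illustrative names only; a real target may use different ones
    // or leave an entry null to opt out of that libcall.
    Names[OEQ_F32] = "__eqsf2";
    Names[UNE_F32] = "__nesf2";
  }

  const char *getName(Libcall LC) const { return Names[LC]; }
};
} // namespace lower

// Upper layer: a stand-in for the codegen library. It may depend on the
// lower layer, and it owns the per-libcall comparison codes.
namespace upper {
enum class CondCode { SETEQ, SETNE }; // hypothetical codegen-only enum

struct LoweringInfo {
  lower::LibcallInfo Libcalls;
  std::array<CondCode, lower::NumLibcalls> CmpCC{};

  LoweringInfo() {
    CmpCC[lower::OEQ_F32] = CondCode::SETEQ;
    CmpCC[lower::UNE_F32] = CondCode::SETNE;
  }

  CondCode getCmpLibcallCC(lower::Libcall LC) const { return CmpCC[LC]; }
};
} // namespace upper
```

With this arrangement the dependency arrow points only from `upper` to `lower`, so a client that needs just the names (for example, a linker deciding which symbols to keep) never has to see the codegen enum.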

Fix a bug where `lto_runtime_lib_symbols_list` returns the address of a local variable that is freed once it goes out of scope. This is a regression from #98512, which rewrote the runtime libcall function lists into a SmallVector. rdar://135559037
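The bug class described here can be sketched as follows. This is an illustration under stated assumptions, not the code from #107896: `buildRuntimeLibSymbols` and `runtimeLibSymbolsList` are hypothetical names, `std::vector` stands in for `SmallVector`, and static storage is just one possible way to keep the returned pointer valid.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper standing in for whatever builds the libcall list.
static std::vector<const char *> buildRuntimeLibSymbols() {
  return {"memcpy", "memset", "__divdi3"};
}

// BUGGY pattern, shown only as a comment so nobody calls it: the vector is
// a local, so its buffer is released when the function returns and the
// caller receives a dangling pointer.
//
//   const char *const *buggyList(std::size_t &Size) {
//     std::vector<const char *> Syms = buildRuntimeLibSymbols();
//     Size = Syms.size();
//     return Syms.data(); // dangling after return
//   }

// One possible fix: give the list static storage duration so the buffer
// outlives every call. The function-local static is initialized once.
const char *const *runtimeLibSymbolsList(std::size_t &Size) {
  static const std::vector<const char *> Syms = buildRuntimeLibSymbols();
  Size = Syms.size();
  return Syms.data();
}
```

A C API that hands out a raw pointer plus a size has to guarantee the storage outlives the call, which is why returning `data()` of a temporary container is a use-after-free even though it often appears to work.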

jhuber6 requested review from arsenm, Artem-B, efriedma-quic, jayfoad, jyknight, MaskRay and topperc July 11, 2024 18:02

llvmbot added the lld, lld:ELF, lld:COFF, lld:wasm, platform:windows, LTO, and llvm:binary-utilities labels Jul 11, 2024

jhuber6 force-pushed the FactorOutRTLib branch from 2c74173 to 8f67c6a Compare July 11, 2024 18:07

arsenm reviewed Jul 11, 2024

sbc100 reviewed Jul 11, 2024

jhuber6 force-pushed the FactorOutRTLib branch from 8f67c6a to 9e0c494 Compare July 11, 2024 18:34

jhuber6 force-pushed the FactorOutRTLib branch from 9e0c494 to bd48223 Compare July 11, 2024 19:58

llvmbot added the llvm:ir label Jul 11, 2024

jhuber6 force-pushed the FactorOutRTLib branch from bd48223 to e74cf8b Compare July 11, 2024 20:02

vitalybuka added a commit that referenced this pull request Jul 16, 2024

[LLVM][LTO] Add missing dependency

bb604ae

Fixes 'llvm/CodeGen/GenVT.inc' file not found. Follow up to #98512

chapuni reviewed Jul 17, 2024

chapuni mentioned this pull request Jul 19, 2024

llvm/IR/RuntimeLibcalls.h has introduced layering violation #99610

Closed

chapuni added a commit that referenced this pull request Jul 20, 2024

Revert "[LLVM][LTO] Factor out RTLib calls and allow them to be dropp…

740161a

…ed (#98512)" This reverts commit c05126b. (llvmorg-19-init-17714-gc05126bdfc3b) See #99610

sgundapa pushed a commit to sgundapa/upstream_effort that referenced this pull request Jul 23, 2024

[LLVM][LTO] Add missing dependency

2a807f1

Fixes 'llvm/CodeGen/GenVT.inc' file not found. Follow up to llvm#98512

sgundapa pushed a commit to sgundapa/upstream_effort that referenced this pull request Jul 23, 2024

Revert "[LLVM][LTO] Factor out RTLib calls and allow them to be dropp…

980d30c

…ed (llvm#98512)" This reverts commit c05126b. (llvmorg-19-init-17714-gc05126bdfc3b) See llvm#99610

yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024

Revert "[LLVM][LTO] Factor out RTLib calls and allow them to be dropp…

ae26072

…ed (#98512)" This reverts commit c05126b. (llvmorg-19-init-17714-gc05126bdfc3b) See #99610

cachemeifyoucan mentioned this pull request Sep 9, 2024

[LTO] Fix a use-after-free in legacy LTO C APIs #107896

Merged

cachemeifyoucan mentioned this pull request Sep 9, 2024

[LTO] Fix a use-after-free in legacy LTO C APIs (#107896) swiftlang/llvm-project#9235

Merged

bader mentioned this pull request Sep 20, 2024

[RFC] thinLTO for SYCL intel/llvm#15083

Open
