[LLVM][LTO] Factor out RTLib calls and allow them to be dropped #98512

Merged
merged 2 commits into from
Jul 16, 2024


jhuber6
Contributor

@jhuber6 jhuber6 commented Jul 11, 2024

Summary:
The LTO pass and LLD linker have logic in them that forces extraction
and prevents internalization of needed runtime calls. However, these
currently take all RTLibcalls into account, even if the target does not
support them. The target opts out of a libcall by setting its name to
nullptr. This patch pulls this logic out into a class in the header so
that LTO / lld can use it to determine whether a symbol actually needs to be
kept.

This is important for targets like AMDGPU that want to be able to use
lld to perform the final link step, but do not want the overhead of
uncalled functions. (Keeping them easily adds about a second to the link time.)
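
The opt-out mechanism described above can be sketched in isolation. This is a minimal toy model, not the code from the patch: `ToyLibcallTable`, the three-entry `Libcall` enum, and `isLibcallSymbol` are hypothetical names, and the "amdgcn keeps only atomics" policy is a simplified assumption based on the diff in this PR.

```cpp
#include <array>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for RTLIB::Libcall; the real enum is far larger.
enum Libcall : std::size_t { MUL_I128, SQRT_F32, ATOMIC_LOAD, UNKNOWN_LIBCALL };

// Toy per-target name table. A null name means the target opted out.
struct ToyLibcallTable {
  std::array<const char *, UNKNOWN_LIBCALL> Names{};

  explicit ToyLibcallTable(const std::string &Arch) {
    // Defaults shared by every target.
    Names[MUL_I128] = "__multi3";
    Names[SQRT_F32] = "sqrtf";
    Names[ATOMIC_LOAD] = "__atomic_load";
    // Assumed policy for illustration: the GPU target keeps only atomics.
    if (Arch == "amdgcn") {
      Names[MUL_I128] = nullptr;
      Names[SQRT_F32] = nullptr;
    }
  }

  // True if Symbol must be treated as a runtime libcall for this target.
  bool isLibcallSymbol(const char *Symbol) const {
    for (const char *Name : Names)
      if (Name && std::strcmp(Name, Symbol) == 0)
        return true;
    return false;
  }
};
```

With this model, `ToyLibcallTable("x86_64").isLibcallSymbol("sqrtf")` is true while the same query on an `"amdgcn"` table is false, which is the distinction the linker previously could not make.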

@llvmbot
Collaborator

llvmbot commented Jul 11, 2024

@llvm/pr-subscribers-backend-arm
@llvm/pr-subscribers-backend-webassembly
@llvm/pr-subscribers-llvm-selectiondag
@llvm/pr-subscribers-debuginfo
@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-lto
@llvm/pr-subscribers-lld
@llvm/pr-subscribers-lld-coff
@llvm/pr-subscribers-lld-elf
@llvm/pr-subscribers-platform-windows

@llvm/pr-subscribers-lld-wasm

Author: Joseph Huber (jhuber6)

Changes

Summary:
The LTO pass and LLD linker have logic in them that forces extraction
and prevents internalization of needed runtime calls. However, these
currently take all RTLibcalls into account, even if the target does not
support them. The target opts out of a libcall if it sets its name to
nullptr. This patch pulls this logic out into a class in the header so
that LTO / lld can use it to determine if a symbol actually needs to be
kept.

This is important for targets like AMDGPU that want to be able to use
lld to perform the final link step, but cannot maintain the overhead
of several unused function calls.


Patch is 42.54 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/98512.diff

9 Files Affected:

  • (modified) lld/COFF/Driver.cpp (+5-2)
  • (modified) lld/ELF/Driver.cpp (+4-2)
  • (modified) lld/wasm/Driver.cpp (+4-2)
  • (modified) llvm/include/llvm/CodeGen/TargetLowering.h (+440-24)
  • (modified) llvm/include/llvm/LTO/LTO.h (+1-1)
  • (modified) llvm/lib/CodeGen/TargetLoweringBase.cpp (+2-371)
  • (modified) llvm/lib/LTO/LTO.cpp (+8-7)
  • (modified) llvm/lib/Object/IRSymtab.cpp (+13-7)
  • (modified) llvm/tools/lto/lto.cpp (+1-1)
diff --git a/lld/COFF/Driver.cpp b/lld/COFF/Driver.cpp
index cef6271e4c8f8..9e28b1c50be50 100644
--- a/lld/COFF/Driver.cpp
+++ b/lld/COFF/Driver.cpp
@@ -2428,9 +2428,12 @@ void LinkerDriver::linkerMain(ArrayRef<const char *> argsArr) {
       // file's symbol table. If any of those library functions are defined in a
       // bitcode file in an archive member, we need to arrange to use LTO to
       // compile those archive members by adding them to the link beforehand.
-      if (!ctx.bitcodeFileInstances.empty())
-        for (auto *s : lto::LTO::getRuntimeLibcallSymbols())
+      if (!ctx.bitcodeFileInstances.empty()) {
+        llvm::Triple TT(
+            ctx.bitcodeFileInstances.front()->obj->getTargetTriple());
+        for (auto *s : lto::LTO::getRuntimeLibcallSymbols(TT))
           ctx.symtab.addLibcall(s);
+      }
 
       // Windows specific -- if __load_config_used can be resolved, resolve it.
       if (ctx.symtab.findUnderscore("_load_config_used"))
diff --git a/lld/ELF/Driver.cpp b/lld/ELF/Driver.cpp
index abfa313bfef0e..5c7ff8dcda945 100644
--- a/lld/ELF/Driver.cpp
+++ b/lld/ELF/Driver.cpp
@@ -2883,9 +2883,11 @@ template <class ELFT> void LinkerDriver::link(opt::InputArgList &args) {
   // to, i.e. if the symbol's definition is in bitcode. Any other required
   // libcall symbols will be added to the link after LTO when we add the LTO
   // object file to the link.
-  if (!ctx.bitcodeFiles.empty())
-    for (auto *s : lto::LTO::getRuntimeLibcallSymbols())
+  if (!ctx.bitcodeFiles.empty()) {
+    llvm::Triple TT(ctx.bitcodeFiles.front()->obj->getTargetTriple());
+    for (auto *s : lto::LTO::getRuntimeLibcallSymbols(TT))
       handleLibcall(s);
+  }
 
   // Archive members defining __wrap symbols may be extracted.
   std::vector<WrappedSymbol> wrapped = addWrappedSymbols(args);
diff --git a/lld/wasm/Driver.cpp b/lld/wasm/Driver.cpp
index d099689911fc6..1d545420e182c 100644
--- a/lld/wasm/Driver.cpp
+++ b/lld/wasm/Driver.cpp
@@ -1319,9 +1319,11 @@ void LinkerDriver::linkerMain(ArrayRef<const char *> argsArr) {
   // We only need to add libcall symbols to the link before LTO if the symbol's
   // definition is in bitcode. Any other required libcall symbols will be added
   // to the link after LTO when we add the LTO object file to the link.
-  if (!ctx.bitcodeFiles.empty())
-    for (auto *s : lto::LTO::getRuntimeLibcallSymbols())
+  if (!ctx.bitcodeFiles.empty()) {
+    llvm::Triple TT(ctx.bitcodeFiles.front()->obj->getTargetTriple());
+    for (auto *s : lto::LTO::getRuntimeLibcallSymbols(TT))
       handleLibcall(s);
+  }
   if (errorCount())
     return;
 
diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index 55b60b01e5827..bda55ab1402be 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -189,6 +189,436 @@ struct MemOp {
   }
 };
 
+struct LibcallsInfo {
+  explicit LibcallsInfo(const Triple &TT) {
+    initLibcalls(TT);
+    initCmpLibcallCCs();
+  }
+
+  /// Rename the default libcall routine name for the specified libcall.
+  void setLibcallName(RTLIB::Libcall Call, const char *Name) {
+    LibcallRoutineNames[Call] = Name;
+  }
+
+  void setLibcallName(ArrayRef<RTLIB::Libcall> Calls, const char *Name) {
+    for (auto Call : Calls)
+      setLibcallName(Call, Name);
+  }
+
+  /// Get the libcall routine name for the specified libcall.
+  const char *getLibcallName(RTLIB::Libcall Call) const {
+    return LibcallRoutineNames[Call];
+  }
+
+  /// Override the default CondCode to be used to test the result of the
+  /// comparison libcall against zero.
+  void setCmpLibcallCC(RTLIB::Libcall Call, ISD::CondCode CC) {
+    CmpLibcallCCs[Call] = CC;
+  }
+
+  /// Get the CondCode that's to be used to test the result of the comparison
+  /// libcall against zero.
+  ISD::CondCode getCmpLibcallCC(RTLIB::Libcall Call) const {
+    return CmpLibcallCCs[Call];
+  }
+
+  /// Set the CallingConv that should be used for the specified libcall.
+  void setLibcallCallingConv(RTLIB::Libcall Call, CallingConv::ID CC) {
+    LibcallCallingConvs[Call] = CC;
+  }
+
+  /// Get the CallingConv that should be used for the specified libcall.
+  CallingConv::ID getLibcallCallingConv(RTLIB::Libcall Call) const {
+    return LibcallCallingConvs[Call];
+  }
+
+  iterator_range<const char **> getLibcallNames() {
+    return llvm::make_range(LibcallRoutineNames,
+                            LibcallRoutineNames + RTLIB::UNKNOWN_LIBCALL);
+  }
+
+private:
+  /// Stores the name each libcall.
+  const char *LibcallRoutineNames[RTLIB::UNKNOWN_LIBCALL + 1];
+
+  /// The ISD::CondCode that should be used to test the result of each of the
+  /// comparison libcall against zero.
+  ISD::CondCode CmpLibcallCCs[RTLIB::UNKNOWN_LIBCALL];
+
+  /// Stores the CallingConv that should be used for each libcall.
+  CallingConv::ID LibcallCallingConvs[RTLIB::UNKNOWN_LIBCALL];
+
+  static bool darwinHasSinCos(const Triple &TT) {
+    assert(TT.isOSDarwin() && "should be called with darwin triple");
+    // Don't bother with 32 bit x86.
+    if (TT.getArch() == Triple::x86)
+      return false;
+    // Macos < 10.9 has no sincos_stret.
+    if (TT.isMacOSX())
+      return !TT.isMacOSXVersionLT(10, 9) && TT.isArch64Bit();
+    // iOS < 7.0 has no sincos_stret.
+    if (TT.isiOS())
+      return !TT.isOSVersionLT(7, 0);
+    // Any other darwin such as WatchOS/TvOS is new enough.
+    return true;
+  }
+
+  /// Sets default libcall calling conventions.
+  void initCmpLibcallCCs() {
+    std::fill(CmpLibcallCCs, CmpLibcallCCs + RTLIB::UNKNOWN_LIBCALL,
+              ISD::SETCC_INVALID);
+    CmpLibcallCCs[RTLIB::OEQ_F32] = ISD::SETEQ;
+    CmpLibcallCCs[RTLIB::OEQ_F64] = ISD::SETEQ;
+    CmpLibcallCCs[RTLIB::OEQ_F128] = ISD::SETEQ;
+    CmpLibcallCCs[RTLIB::OEQ_PPCF128] = ISD::SETEQ;
+    CmpLibcallCCs[RTLIB::UNE_F32] = ISD::SETNE;
+    CmpLibcallCCs[RTLIB::UNE_F64] = ISD::SETNE;
+    CmpLibcallCCs[RTLIB::UNE_F128] = ISD::SETNE;
+    CmpLibcallCCs[RTLIB::UNE_PPCF128] = ISD::SETNE;
+    CmpLibcallCCs[RTLIB::OGE_F32] = ISD::SETGE;
+    CmpLibcallCCs[RTLIB::OGE_F64] = ISD::SETGE;
+    CmpLibcallCCs[RTLIB::OGE_F128] = ISD::SETGE;
+    CmpLibcallCCs[RTLIB::OGE_PPCF128] = ISD::SETGE;
+    CmpLibcallCCs[RTLIB::OLT_F32] = ISD::SETLT;
+    CmpLibcallCCs[RTLIB::OLT_F64] = ISD::SETLT;
+    CmpLibcallCCs[RTLIB::OLT_F128] = ISD::SETLT;
+    CmpLibcallCCs[RTLIB::OLT_PPCF128] = ISD::SETLT;
+    CmpLibcallCCs[RTLIB::OLE_F32] = ISD::SETLE;
+    CmpLibcallCCs[RTLIB::OLE_F64] = ISD::SETLE;
+    CmpLibcallCCs[RTLIB::OLE_F128] = ISD::SETLE;
+    CmpLibcallCCs[RTLIB::OLE_PPCF128] = ISD::SETLE;
+    CmpLibcallCCs[RTLIB::OGT_F32] = ISD::SETGT;
+    CmpLibcallCCs[RTLIB::OGT_F64] = ISD::SETGT;
+    CmpLibcallCCs[RTLIB::OGT_F128] = ISD::SETGT;
+    CmpLibcallCCs[RTLIB::OGT_PPCF128] = ISD::SETGT;
+    CmpLibcallCCs[RTLIB::UO_F32] = ISD::SETNE;
+    CmpLibcallCCs[RTLIB::UO_F64] = ISD::SETNE;
+    CmpLibcallCCs[RTLIB::UO_F128] = ISD::SETNE;
+    CmpLibcallCCs[RTLIB::UO_PPCF128] = ISD::SETNE;
+  }
+
+  /// Set default libcall names.
+  void initLibcalls(const Triple &TT) {
+    std::fill(std::begin(LibcallRoutineNames), std::end(LibcallRoutineNames),
+              nullptr);
+
+#define HANDLE_LIBCALL(code, name) setLibcallName(RTLIB::code, name);
+#include "llvm/IR/RuntimeLibcalls.def"
+#undef HANDLE_LIBCALL
+
+    // Initialize calling conventions to their default.
+    for (int LC = 0; LC < RTLIB::UNKNOWN_LIBCALL; ++LC)
+      setLibcallCallingConv((RTLIB::Libcall)LC, CallingConv::C);
+
+    // Use the f128 variants of math functions on x86_64
+    if (TT.getArch() == Triple::ArchType::x86_64 && TT.isGNUEnvironment()) {
+      setLibcallName(RTLIB::REM_F128, "fmodf128");
+      setLibcallName(RTLIB::FMA_F128, "fmaf128");
+      setLibcallName(RTLIB::SQRT_F128, "sqrtf128");
+      setLibcallName(RTLIB::CBRT_F128, "cbrtf128");
+      setLibcallName(RTLIB::LOG_F128, "logf128");
+      setLibcallName(RTLIB::LOG_FINITE_F128, "__logf128_finite");
+      setLibcallName(RTLIB::LOG2_F128, "log2f128");
+      setLibcallName(RTLIB::LOG2_FINITE_F128, "__log2f128_finite");
+      setLibcallName(RTLIB::LOG10_F128, "log10f128");
+      setLibcallName(RTLIB::LOG10_FINITE_F128, "__log10f128_finite");
+      setLibcallName(RTLIB::EXP_F128, "expf128");
+      setLibcallName(RTLIB::EXP_FINITE_F128, "__expf128_finite");
+      setLibcallName(RTLIB::EXP2_F128, "exp2f128");
+      setLibcallName(RTLIB::EXP2_FINITE_F128, "__exp2f128_finite");
+      setLibcallName(RTLIB::EXP10_F128, "exp10f128");
+      setLibcallName(RTLIB::SIN_F128, "sinf128");
+      setLibcallName(RTLIB::COS_F128, "cosf128");
+      setLibcallName(RTLIB::TAN_F128, "tanf128");
+      setLibcallName(RTLIB::SINCOS_F128, "sincosf128");
+      setLibcallName(RTLIB::POW_F128, "powf128");
+      setLibcallName(RTLIB::POW_FINITE_F128, "__powf128_finite");
+      setLibcallName(RTLIB::CEIL_F128, "ceilf128");
+      setLibcallName(RTLIB::TRUNC_F128, "truncf128");
+      setLibcallName(RTLIB::RINT_F128, "rintf128");
+      setLibcallName(RTLIB::NEARBYINT_F128, "nearbyintf128");
+      setLibcallName(RTLIB::ROUND_F128, "roundf128");
+      setLibcallName(RTLIB::ROUNDEVEN_F128, "roundevenf128");
+      setLibcallName(RTLIB::FLOOR_F128, "floorf128");
+      setLibcallName(RTLIB::COPYSIGN_F128, "copysignf128");
+      setLibcallName(RTLIB::FMIN_F128, "fminf128");
+      setLibcallName(RTLIB::FMAX_F128, "fmaxf128");
+      setLibcallName(RTLIB::LROUND_F128, "lroundf128");
+      setLibcallName(RTLIB::LLROUND_F128, "llroundf128");
+      setLibcallName(RTLIB::LRINT_F128, "lrintf128");
+      setLibcallName(RTLIB::LLRINT_F128, "llrintf128");
+      setLibcallName(RTLIB::LDEXP_F128, "ldexpf128");
+      setLibcallName(RTLIB::FREXP_F128, "frexpf128");
+    }
+
+    // For IEEE quad-precision libcall names, PPC uses "kf" instead of "tf".
+    if (TT.isPPC()) {
+      setLibcallName(RTLIB::ADD_F128, "__addkf3");
+      setLibcallName(RTLIB::SUB_F128, "__subkf3");
+      setLibcallName(RTLIB::MUL_F128, "__mulkf3");
+      setLibcallName(RTLIB::DIV_F128, "__divkf3");
+      setLibcallName(RTLIB::POWI_F128, "__powikf2");
+      setLibcallName(RTLIB::FPEXT_F32_F128, "__extendsfkf2");
+      setLibcallName(RTLIB::FPEXT_F64_F128, "__extenddfkf2");
+      setLibcallName(RTLIB::FPROUND_F128_F32, "__trunckfsf2");
+      setLibcallName(RTLIB::FPROUND_F128_F64, "__trunckfdf2");
+      setLibcallName(RTLIB::FPTOSINT_F128_I32, "__fixkfsi");
+      setLibcallName(RTLIB::FPTOSINT_F128_I64, "__fixkfdi");
+      setLibcallName(RTLIB::FPTOSINT_F128_I128, "__fixkfti");
+      setLibcallName(RTLIB::FPTOUINT_F128_I32, "__fixunskfsi");
+      setLibcallName(RTLIB::FPTOUINT_F128_I64, "__fixunskfdi");
+      setLibcallName(RTLIB::FPTOUINT_F128_I128, "__fixunskfti");
+      setLibcallName(RTLIB::SINTTOFP_I32_F128, "__floatsikf");
+      setLibcallName(RTLIB::SINTTOFP_I64_F128, "__floatdikf");
+      setLibcallName(RTLIB::SINTTOFP_I128_F128, "__floattikf");
+      setLibcallName(RTLIB::UINTTOFP_I32_F128, "__floatunsikf");
+      setLibcallName(RTLIB::UINTTOFP_I64_F128, "__floatundikf");
+      setLibcallName(RTLIB::UINTTOFP_I128_F128, "__floatuntikf");
+      setLibcallName(RTLIB::OEQ_F128, "__eqkf2");
+      setLibcallName(RTLIB::UNE_F128, "__nekf2");
+      setLibcallName(RTLIB::OGE_F128, "__gekf2");
+      setLibcallName(RTLIB::OLT_F128, "__ltkf2");
+      setLibcallName(RTLIB::OLE_F128, "__lekf2");
+      setLibcallName(RTLIB::OGT_F128, "__gtkf2");
+      setLibcallName(RTLIB::UO_F128, "__unordkf2");
+    }
+
+    // A few names are different on particular architectures or environments.
+    if (TT.isOSDarwin()) {
+      // For f16/f32 conversions, Darwin uses the standard naming scheme,
+      // instead of the gnueabi-style __gnu_*_ieee.
+      // FIXME: What about other targets?
+      setLibcallName(RTLIB::FPEXT_F16_F32, "__extendhfsf2");
+      setLibcallName(RTLIB::FPROUND_F32_F16, "__truncsfhf2");
+
+      // Some darwins have an optimized __bzero/bzero function.
+      switch (TT.getArch()) {
+      case Triple::x86:
+      case Triple::x86_64:
+        if (TT.isMacOSX() && !TT.isMacOSXVersionLT(10, 6))
+          setLibcallName(RTLIB::BZERO, "__bzero");
+        break;
+      case Triple::aarch64:
+      case Triple::aarch64_32:
+        setLibcallName(RTLIB::BZERO, "bzero");
+        break;
+      default:
+        break;
+      }
+
+      if (darwinHasSinCos(TT)) {
+        setLibcallName(RTLIB::SINCOS_STRET_F32, "__sincosf_stret");
+        setLibcallName(RTLIB::SINCOS_STRET_F64, "__sincos_stret");
+        if (TT.isWatchABI()) {
+          setLibcallCallingConv(RTLIB::SINCOS_STRET_F32,
+                                CallingConv::ARM_AAPCS_VFP);
+          setLibcallCallingConv(RTLIB::SINCOS_STRET_F64,
+                                CallingConv::ARM_AAPCS_VFP);
+        }
+      }
+
+      switch (TT.getOS()) {
+      case Triple::MacOSX:
+        if (TT.isMacOSXVersionLT(10, 9)) {
+          setLibcallName(RTLIB::EXP10_F32, nullptr);
+          setLibcallName(RTLIB::EXP10_F64, nullptr);
+        } else {
+          setLibcallName(RTLIB::EXP10_F32, "__exp10f");
+          setLibcallName(RTLIB::EXP10_F64, "__exp10");
+        }
+        break;
+      case Triple::IOS:
+      case Triple::TvOS:
+      case Triple::WatchOS:
+      case Triple::XROS:
+        if (!TT.isWatchOS() && (TT.isOSVersionLT(7, 0) ||
+                                (TT.isOSVersionLT(9, 0) && TT.isX86()))) {
+          setLibcallName(RTLIB::EXP10_F32, nullptr);
+          setLibcallName(RTLIB::EXP10_F64, nullptr);
+        } else {
+          setLibcallName(RTLIB::EXP10_F32, "__exp10f");
+          setLibcallName(RTLIB::EXP10_F64, "__exp10");
+        }
+
+        break;
+      default:
+        break;
+      }
+    } else {
+      setLibcallName(RTLIB::FPEXT_F16_F32, "__gnu_h2f_ieee");
+      setLibcallName(RTLIB::FPROUND_F32_F16, "__gnu_f2h_ieee");
+    }
+
+    if (TT.isGNUEnvironment() || TT.isOSFuchsia() ||
+        (TT.isAndroid() && !TT.isAndroidVersionLT(9))) {
+      setLibcallName(RTLIB::SINCOS_F32, "sincosf");
+      setLibcallName(RTLIB::SINCOS_F64, "sincos");
+      setLibcallName(RTLIB::SINCOS_F80, "sincosl");
+      setLibcallName(RTLIB::SINCOS_F128, "sincosl");
+      setLibcallName(RTLIB::SINCOS_PPCF128, "sincosl");
+    }
+
+    if (TT.isPS()) {
+      setLibcallName(RTLIB::SINCOS_F32, "sincosf");
+      setLibcallName(RTLIB::SINCOS_F64, "sincos");
+    }
+
+    if (TT.isOSOpenBSD()) {
+      setLibcallName(RTLIB::STACKPROTECTOR_CHECK_FAIL, nullptr);
+    }
+
+    if (TT.isOSWindows() && !TT.isOSCygMing()) {
+      setLibcallName(RTLIB::LDEXP_F32, nullptr);
+      setLibcallName(RTLIB::LDEXP_F80, nullptr);
+      setLibcallName(RTLIB::LDEXP_F128, nullptr);
+      setLibcallName(RTLIB::LDEXP_PPCF128, nullptr);
+
+      setLibcallName(RTLIB::FREXP_F32, nullptr);
+      setLibcallName(RTLIB::FREXP_F80, nullptr);
+      setLibcallName(RTLIB::FREXP_F128, nullptr);
+      setLibcallName(RTLIB::FREXP_PPCF128, nullptr);
+    }
+
+    if (TT.isAArch64()) {
+      if (TT.isOSMSVCRT()) {
+        // MSVCRT doesn't have powi; fall back to pow
+        setLibcallName(RTLIB::POWI_F32, nullptr);
+        setLibcallName(RTLIB::POWI_F64, nullptr);
+      }
+    }
+
+    // Disable most libcalls on AMDGPU.
+    if (TT.isAMDGPU()) {
+      for (int I = 0; I < RTLIB::UNKNOWN_LIBCALL; ++I) {
+        if (I < RTLIB::ATOMIC_LOAD || I > RTLIB::ATOMIC_FETCH_NAND_16)
+          setLibcallName(static_cast<RTLIB::Libcall>(I), nullptr);
+      }
+    }
+
+    if (TT.isARM() || TT.isThumb()) {
+      // These libcalls are not available in 32-bit.
+      setLibcallName(RTLIB::SHL_I128, nullptr);
+      setLibcallName(RTLIB::SRL_I128, nullptr);
+      setLibcallName(RTLIB::SRA_I128, nullptr);
+      setLibcallName(RTLIB::MUL_I128, nullptr);
+      setLibcallName(RTLIB::MULO_I64, nullptr);
+      setLibcallName(RTLIB::MULO_I128, nullptr);
+
+      if (TT.isOSMSVCRT()) {
+        // MSVCRT doesn't have powi; fall back to pow
+        setLibcallName(RTLIB::POWI_F32, nullptr);
+        setLibcallName(RTLIB::POWI_F64, nullptr);
+      }
+    }
+
+    if (TT.getArch() == Triple::ArchType::avr) {
+      // Division rtlib functions (not supported), use divmod functions instead
+      setLibcallName(RTLIB::SDIV_I8, nullptr);
+      setLibcallName(RTLIB::SDIV_I16, nullptr);
+      setLibcallName(RTLIB::SDIV_I32, nullptr);
+      setLibcallName(RTLIB::UDIV_I8, nullptr);
+      setLibcallName(RTLIB::UDIV_I16, nullptr);
+      setLibcallName(RTLIB::UDIV_I32, nullptr);
+
+      // Modulus rtlib functions (not supported), use divmod functions instead
+      setLibcallName(RTLIB::SREM_I8, nullptr);
+      setLibcallName(RTLIB::SREM_I16, nullptr);
+      setLibcallName(RTLIB::SREM_I32, nullptr);
+      setLibcallName(RTLIB::UREM_I8, nullptr);
+      setLibcallName(RTLIB::UREM_I16, nullptr);
+      setLibcallName(RTLIB::UREM_I32, nullptr);
+    }
+
+    if (TT.getArch() == Triple::ArchType::hexagon) {
+      // These cause problems when the shift amount is non-constant.
+      setLibcallName(RTLIB::SHL_I128, nullptr);
+      setLibcallName(RTLIB::SRL_I128, nullptr);
+      setLibcallName(RTLIB::SRA_I128, nullptr);
+    }
+
+    if (TT.isLoongArch()) {
+      if (!TT.isLoongArch64()) {
+        // Set libcalls.
+        setLibcallName(RTLIB::MUL_I128, nullptr);
+        // The MULO libcall is not part of libgcc, only compiler-rt.
+        setLibcallName(RTLIB::MULO_I64, nullptr);
+      }
+      // The MULO libcall is not part of libgcc, only compiler-rt.
+      setLibcallName(RTLIB::MULO_I128, nullptr);
+    }
+
+    if (TT.isMIPS32()) {
+      // These libcalls are not available in 32-bit.
+      setLibcallName(RTLIB::SHL_I128, nullptr);
+      setLibcallName(RTLIB::SRL_I128, nullptr);
+      setLibcallName(RTLIB::SRA_I128, nullptr);
+      setLibcallName(RTLIB::MUL_I128, nullptr);
+      setLibcallName(RTLIB::MULO_I64, nullptr);
+      setLibcallName(RTLIB::MULO_I128, nullptr);
+    }
+
+    if (TT.isPPC()) {
+      if (!TT.isPPC64()) {
+        // These libcalls are not available in 32-bit.
+        setLibcallName(RTLIB::SHL_I128, nullptr);
+        setLibcallName(RTLIB::SRL_I128, nullptr);
+        setLibcallName(RTLIB::SRA_I128, nullptr);
+        setLibcallName(RTLIB::MUL_I128, nullptr);
+        setLibcallName(RTLIB::MULO_I64, nullptr);
+      }
+      setLibcallName(RTLIB::MULO_I128, nullptr);
+    }
+
+    if (TT.isRISCV32()) {
+      // These libcalls are not available in 32-bit.
+      setLibcallName(RTLIB::SHL_I128, nullptr);
+      setLibcallName(RTLIB::SRL_I128, nullptr);
+      setLibcallName(RTLIB::SRA_I128, nullptr);
+      setLibcallName(RTLIB::MUL_I128, nullptr);
+      setLibcallName(RTLIB::MULO_I64, nullptr);
+    }
+
+    if (TT.isSPARC()) {
+      if (!TT.isSPARC64()) {
+        // These libcalls are not available in 32-bit.
+        setLibcallName(RTLIB::MULO_I64, nullptr);
+        setLibcallName(RTLIB::MUL_I128, nullptr);
+        setLibcallName(RTLIB::SHL_I128, nullptr);
+        setLibcallName(RTLIB::SRL_I128, nullptr);
+        setLibcallName(RTLIB::SRA_I128, nullptr);
+      }
+      setLibcallName(RTLIB::MULO_I128, nullptr);
+    }
+
+    if (TT.isSystemZ()) {
+      setLibcallName(RTLIB::SRL_I128, nullptr);
+      setLibcallName(RTLIB::SHL_I128, nullptr);
+      setLibcallName(RTLIB::SRA_I128, nullptr);
+    }
+
+    if (TT.isX86()) {
+      if (TT.getArch() == Triple::ArchType::x86) {
+        // These libcalls are not available in 32-bit.
+        setLibcallName(RTLIB::SHL_I128, nullptr);
+        setLibcallName(RTLIB::SRL_I128, nullptr);
+        setLibcallName(RTLIB::SRA_I128, nullptr);
+        setLibcallName(RTLIB::MUL_I128, nullptr);
+        // The MULO libcall is not part of libgcc, only compiler-rt.
+        setLibcallName(RTLIB::MULO_I64, nullptr);
+      }
+
+      // The MULO libcall is not part of libgcc, only compiler-rt.
+      setLibcallName(RTLIB::MULO_I128, nullptr);
+
+      if (TT.isOSMSVCRT()) {
+        // MSVCRT doesn't have powi; fall back to pow
+        setLibcallName(RTLIB::POWI_F32, nullptr);
+        setLibcallName(RTLIB::POWI_F64, nullptr);
+      }
+    }
+  }
+};
+
 /// This base class for TargetLowering contains the SelectionDAG-independent
 /// parts that can be used from the rest of CodeGen.
 class TargetLoweringBase {
@@ -3410,44 +3840,40 @@ class TargetLoweringBase {
     return nullptr;
   }
 
-  //===--...
[truncated]


github-actions bot commented Jul 11, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

llvm/lib/LTO/LTO.cpp
SmallVector<const char *> LibcallSymbols;
copy_if(Libcalls.getLibcallNames(), std::back_inserter(LibcallSymbols),
[](const char *Name) { return Name; });
return LibcallSymbols;
Contributor

Can we avoid making a copy of this giant vector every time it's used?

Contributor Author

I thought about that, but in practice this is only ever called once and the result is never reused. The alternative would need some dense map of triples to vectors, and I don't think it would add anything.

Collaborator

@sbc100 sbc100 left a comment

Can you explain what you mean by "but cannot maintain the overhead
of several unused function calls"?

My understanding is that these symbols are still only included in the final binary if they are actually required/referenced. The handleLibcall mechanism just forces them to be included/available during LTO compilation, it doesn't actually force them to be included in the output binary, does it?

@jhuber6
Contributor Author

jhuber6 commented Jul 11, 2024

Can you explain what you mean by "but cannot maintain the overhead of several unused function calls"?

Poorly worded. We don't want random unused functions sitting around in the binary. If I pass -lm, it adds about 1.2 seconds of link time, since it forces every single function to go through codegen. It also bloats the binary.

My understanding is that these symbols are still only included in the final binary if they are actually required/referenced. The handleLibcall mechanism just forces them to be included/available during LTO compilation, it doesn't actually force them to be included in the output binary, does it?

No, they are extracted unconditionally if they are found and not allowed to be internalized. That's what this patch is trying to solve.

@sbc100
Collaborator

sbc100 commented Jul 11, 2024

No, they are extracted unconditionally if they are found and not allowed to be internalized. That's what this patch is trying to solve.

Sorry, I'm still not quite sure what this means. Are you saying that they then take up space in the final output binary? Because that is not what we see in emscripten. There, the functions go through codegen but are then discarded by the linker since they are not referenced. I believe they are discarded during --gc-sections (which is on by default for wasm, which is perhaps why we don't see the effect of these symbols in emscripten).

@jhuber6
Copy link
Contributor Author

jhuber6 commented Jul 11, 2024

No, they are extracted unconditionally if they are found and not allowed to be internalized. That's what this patch is trying to solve.

Sorry, I'm still not quite sure what this means. Are you saying that they then take up space in the final output binary? Because that is not what we see in emscripten. There, the functions go through codegen but are then discarded by the linker since they are not referenced. I believe they are discarded during --gc-sections (which is on by default for wasm, which is perhaps why we don't see the effect of these symbols in emscripten).

The logic I modify in this patch controls the symbols. In IRSymtab.cpp we have logic that states:

if (PreservedSymbols.contains(Sym))
  Sym.used();

Then in LTO.cpp we have logic that prevents used symbols from being internalized.

The logic in LLD pretty much states:

if (sym.isLibcall)
  sym->extract();

So it will always be extracted from a static archive even if it's unused because it thinks the backend needs it.

Likely a combination of -ffunction-sections and -Wl,--gc-sections will remove the dead functions, but that doesn't stop lld from extracting them and forcing them through codegen, which adds over a second of link time as it stands if you include math definitions.
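
The extraction behavior described above can be modeled with a small sketch. This is a toy under stated assumptions, not lld's implementation: `ToyArchiveMember` and `extractedMembers` are hypothetical, and an archive member is reduced to the list of symbols it defines.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a lazy archive member: just the symbols it defines.
struct ToyArchiveMember {
  std::string Name;
  std::vector<std::string> Defines;
};

// Returns the members pulled into the link: any member defining a symbol
// that is either referenced by the program or listed as a runtime libcall.
std::vector<std::string>
extractedMembers(const std::vector<ToyArchiveMember> &Archive,
                 const std::set<std::string> &Referenced,
                 const std::set<std::string> &Libcalls) {
  std::vector<std::string> Extracted;
  for (const ToyArchiveMember &M : Archive) {
    for (const std::string &Sym : M.Defines) {
      if (Referenced.count(Sym) || Libcalls.count(Sym)) {
        Extracted.push_back(M.Name);
        break; // One matching symbol is enough to extract the member.
      }
    }
  }
  return Extracted;
}
```

In this model, a member defining only `sqrtf` is extracted whenever `sqrtf` is in the libcall set, even if nothing references it. Passing a target-filtered libcall set is what lets such a member stay in the archive and skip codegen entirely.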

@sbc100
Collaborator

sbc100 commented Jul 11, 2024

Likely a combination of -ffunction-sections and -Wl,--gc-sections will remove the dead functions, but that doesn't stop lld from extracting them and forcing them through codegen, which adds over a second of link time as it stands if you include math definitions.

Ok that would explain it. -ffunction-sections and -Wl,--gc-sections are both enabled by default for the Wasm target. But if there are link time savings due to less codegen that would still benefit us. Thanks.

@jhuber6
Copy link
Contributor Author

jhuber6 commented Jul 11, 2024

Likely a combination of -ffunction-sections and -Wl,--gc-sections will remove the dead functions, but that doesn't stop lld from extracting them and forcing them through codegen, which adds over a second of link time as it stands if you include math definitions.

Ok that would explain it. -ffunction-sections and -Wl,--gc-sections are both enabled by default for the Wasm target. But if there are link time savings due to less codegen that would still benefit us. Thanks.

Yeah, it'll make a difference if you're trying to do LTO with a static library that contains libcalls, which is a little niche but important for my use-case. I also wonder if we should do this by default for AMDGPU / NVPTX as well, @arsenm WDYT?

if (!ctx.bitcodeFileInstances.empty())
for (auto *s : lto::LTO::getRuntimeLibcallSymbols())
if (!ctx.bitcodeFileInstances.empty()) {
llvm::Triple TT(
Contributor Author

@MaskRay It's safe to assume that obj is always non-null at this point and that all IR files have the same triple, right?

@efriedma-quic
Collaborator

So it will always be extracted from a static archive even if it's unused because it thinks the backend needs it.

If the function in the archive is compiled code, this should be cheap, so it doesn't really matter. If it's bitcode, LTO isn't really architected to handle it. I don't think we'd want to modify the way symbol lookup works. Maybe we can pull the bitcode files into the link, but delay actually compiling them until we know whether they're actually necessary. Or we can examine the bitcode inputs in more detail to determine which functions they can actually call.

Some examples of heuristics we could use:

  • query target features for whether some specific operation needs to be emulated
  • scan the module for declarations of math intrinsics

This patch doesn't address any of that, though; it just ensures that the linker agrees with the LLVM backend about which functions should actually be treated as "runtime functions" for a given triple (i.e. functions which can be called by the LLVM backend even if they aren't referenced in the IR), as opposed to using the same set for every triple.

aaryanshukla pushed a commit to aaryanshukla/llvm-project that referenced this pull request Jul 16, 2024
Summary:
The target information needs to configure that the platform has a
maximum integer size of 64 in order for it to enable i128 support. The
motivation behind this patch is that the i128 libcalls seem to be the
only ones used by the NVPTX backend and it would be ideal to disable
those completely. That would allow LTO to optimize libcalls properly
after llvm#98512.
@vitalybuka
Collaborator

Is anyone looking at https://lab.llvm.org/buildbot/#/builders/66/builds/1669 ?

@jhuber6
Contributor Author

jhuber6 commented Jul 16, 2024

Is anyone looking at https://lab.llvm.org/buildbot/#/builders/66/builds/1669 ?

Unsure if this is related... I guess you could try bisecting it.

@vitalybuka
Collaborator

This happens when compiling RuntimeLibcalls.cpp, which was introduced in this patch.

@jhuber6
Contributor Author

jhuber6 commented Jul 16, 2024

This happens when compiling RuntimeLibcalls.cpp, which was introduced in this patch.

You're right. Maybe there's some configuration that causes ISDOpcodes to not be available?

@vitalybuka
Collaborator

Note: this is not an incremental bot; it removes the build dir every time, so this is not an incremental CMake issue.

I believe

DEPENDS
  vt_gen
  intrinsics_gen

I'll try and patch

vitalybuka added a commit that referenced this pull request Jul 16, 2024
Fixes 'llvm/CodeGen/GenVT.inc' file not found.

Follow up to #98512
@fhahn
Contributor

fhahn commented Jul 16, 2024

@vitalybuka at least for me locally bb604ae fixed the issue, thanks!

sayhaan pushed a commit to sayhaan/llvm-project that referenced this pull request Jul 16, 2024
…#98512)

@jhuber6
Contributor Author

jhuber6 commented Jul 16, 2024

> Note: this is not an incremental bot; it removes the build directory every time, so this is not an incremental CMake issue.
>
> I believe
>
> DEPENDS
>   vt_gen
>   intrinsics_gen
>
> I'll try and patch

Good catch, appreciate the fix!

Contributor

@chapuni chapuni left a comment

Likely layering violation.

#define LLVM_IR_RUNTIME_LIBCALLS_H

#include "llvm/ADT/ArrayRef.h"
#include "llvm/CodeGen/ISDOpcodes.h"
Contributor

llvm/IR should not depend on llvm/CodeGen

Contributor

This should be unused here

Contributor

Oh, I see it's used for the cond code actions. Those should probably be in terms of the IR compare types

@jhuber6
Contributor Author

jhuber6 commented Jul 17, 2024

> Likely layering violation.

I thought about that, but because the parts we needed were header only I figured it was legal. It's needed for the default calling conventions, which we could move somewhere else if needed.

@chapuni
Contributor

chapuni commented Jul 19, 2024

@jhuber6 I've filed #99610.

> but because the parts we needed were header only I figured it was legal.

It would be problematic for modularizing header units.

chapuni added a commit that referenced this pull request Jul 20, 2024
…ed (#98512)"

This reverts commit c05126b.
(llvmorg-19-init-17714-gc05126bdfc3b)
See #99610
jhuber6 added a commit that referenced this pull request Jul 20, 2024
…ped (#98512)"

This reverts commit 740161a.

I moved the `ISD` dependencies into the CodeGen portion of the handling,
it's a little awkward but it's the easiest solution I can think of for
now.
sgundapa pushed a commit to sgundapa/upstream_effort that referenced this pull request Jul 23, 2024
Fixes 'llvm/CodeGen/GenVT.inc' file not found.

Follow up to llvm#98512
sgundapa pushed a commit to sgundapa/upstream_effort that referenced this pull request Jul 23, 2024
…ed (llvm#98512)"

This reverts commit c05126b.
(llvmorg-19-init-17714-gc05126bdfc3b)
See llvm#99610
sgundapa pushed a commit to sgundapa/upstream_effort that referenced this pull request Jul 23, 2024
…ped (llvm#98512)"

This reverts commit 740161a.

I moved the `ISD` dependencies into the CodeGen portion of the handling,
it's a little awkward but it's the easiest solution I can think of for
now.
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
Summary:
Fixes 'llvm/CodeGen/GenVT.inc' file not found.

Follow up to #98512

yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
…ed (#98512)"

This reverts commit c05126b.
(llvmorg-19-init-17714-gc05126bdfc3b)
See #99610
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
…ped (#98512)"

This reverts commit 740161a.

I moved the `ISD` dependencies into the CodeGen portion of the handling,
it's a little awkward but it's the easiest solution I can think of for
now.
cachemeifyoucan added a commit that referenced this pull request Sep 9, 2024
Fix a bug where `lto_runtime_lib_symbols_list` returns the address
of a local variable that is freed when it goes out of scope. This
is a regression from #98512, which rewrote the runtime libcall function
lists into a SmallVector.

rdar://135559037
cachemeifyoucan added a commit to cachemeifyoucan/llvm-project that referenced this pull request Sep 9, 2024
(cherry picked from commit 66e9078)
cachemeifyoucan added a commit to swiftlang/llvm-project that referenced this pull request Sep 9, 2024
(cherry picked from commit 66e9078)
VitaNuo pushed a commit to VitaNuo/llvm-project that referenced this pull request Sep 12, 2024