Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

llvmPackages_*.clang: bind linker on compiling clang #220595

Draft
wants to merge 1 commit into
base: staging
Choose a base branch
from

Conversation

SharzyL
Copy link
Contributor

@SharzyL SharzyL commented Mar 11, 2023

When using clang for cross compilation, clang cannot find the linker.

Example (works for clang 12 and later):

$ nix build nixpkgs#pkgsCross.aarch64-multiplatform.buildPackages.llvmPackages_14.clang
$ ./result/bin/aarch64-unknown-linux-gnu-cc foo.c
clang-14: error: unable to execute command: Executable "ld" doesn't exist!
clang-14: error: linker command failed with exit code 1 (use -v to see invocation)

Let me explain why this happens. For consistency with gcc, given a linker name (ld), clang finds the linker in the following order:

  1. in prefix dirs, which is specified with COMPILER_PATH environment or -B command line args.
  2. in ProgramPaths, which is toolchain specific.
  3. in PATH environment.

The step 2 and 3 considers all possible target triple prefixes, but the step 1 does not. In nixpkgs, all binaries in cross toolchains are prefixed by the target triple, thus clang cannot find them.

Related LLVM source: https://github.com/llvm/llvm-project/blob/8dfdcc7b7bf66834a761bd8de445840ef68e4d1a/clang/lib/Driver/Driver.cpp#L5857-L5897

Related LLVM commit: llvm/llvm-project@3452a0d

Related gcc doc: https://gcc.gnu.org/onlinedocs/gccint/Collect2.html

For gcc in nixpkgs, the linker path is hardcoded in configuration phase (as is shown below). This commit copies this behavior to clang.

"--with-ld=${targetPackages.stdenv.cc.bintools}/bin/${targetPlatform.config}-ld"

Description of changes
Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.05 Release Notes (or backporting 22.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

@SharzyL SharzyL requested a review from matthewbauer as a code owner March 11, 2023 03:28
@SharzyL
Copy link
Contributor Author

SharzyL commented Mar 11, 2023

seems like a loop occurs, let me find out how to fix

@rrbutani rrbutani self-requested a review March 26, 2023 17:36
@SharzyL SharzyL requested a review from RaitoBezarius as a code owner May 3, 2023 04:52
@SharzyL SharzyL force-pushed the clang_bind_ld branch 2 times, most recently from 18dd5e1 to 442b21a Compare May 3, 2023 05:15
@SharzyL
Copy link
Contributor Author

SharzyL commented May 3, 2023

The infinite recursion is caused by wasi libc introduced by firefox. But I find it hard to figure out how the recursion occurs. Is there someone familier with bootstrapping to offer a help?

@SharzyL SharzyL marked this pull request as draft May 3, 2023 08:28
@NickCao NickCao requested a review from trofi May 3, 2023 11:21
@trofi
Copy link
Contributor

trofi commented May 3, 2023

$ nix build -f. pkgsCross.wasi32.stdenv.cc.bintools --show-trace |& fgrep attribute

       … while evaluating the attribute 'llvmPackages.clangUseLLVM'
         whose name attribute is located at /home/slyfox/dev/git/nixpkgs-master/pkgs/stdenv/generic/make-derivation.nix:303:7
       … while evaluating attribute 'stdenv' of derivation 'wasilibc-static-wasm32-unknown-wasi-19'
         whose name attribute is located at /home/slyfox/dev/git/nixpkgs-master/pkgs/stdenv/generic/default.nix:93:14
       … while evaluating attribute 'defaultNativeBuildInputs' of derivation 'stdenv-linux'
         whose name attribute is located at /home/slyfox/dev/git/nixpkgs-master/pkgs/stdenv/generic/make-derivation.nix:303:7
       … while evaluating attribute 'cc' of derivation 'wasm32-unknown-wasi-clang-wrapper-12.0.1'
         whose name attribute is located at /home/slyfox/dev/git/nixpkgs-master/pkgs/stdenv/generic/make-derivation.nix:303:7
       … while evaluating attribute 'cmakeFlags' of derivation 'clang-12.0.1'

wasi32 is a useLLVM = true target.

It's a recursion in bintools (llvm) <-> clang toolchain (also llvm). pkgs/development/compilers/llvm/12/default.nix has a bit of logic around untangling interdependencies between the two. I don't pretend to understand it. Maybe somebody from llvm team knows.

@SharzyL
Copy link
Contributor Author

SharzyL commented Oct 31, 2023

It has been a while. Now I finally figured out how the circular dependency occurs.

Recall that in our original implementation, we imitate gcc

"--with-ld=${targetPackages.stdenv.cc.bintools}/bin/${targetPlatform.config}-ld"

and add the following flags to clang

diff --git a/pkgs/development/compilers/llvm/12/clang/default.nix b/pkgs/development/compilers/llvm/12/clang/default.nix
index 7ecd4efc0837..d8f6e50b348c 100644
--- a/pkgs/development/compilers/llvm/12/clang/default.nix
+++ b/pkgs/development/compilers/llvm/12/clang/default.nix
@@ -2,6 +2,7 @@
 , buildLlvmTools
 , fixDarwinDylibNames
 , enableManpages ? false
+, targetPackages
 }:

 let
@@ -40,6 +41,8 @@ let
     ] ++ lib.optionals (stdenv.hostPlatform != stdenv.buildPlatform) [
       "-DLLVM_TABLEGEN_EXE=${buildLlvmTools.llvm}/bin/llvm-tblgen"
       "-DCLANG_TABLEGEN=${buildLlvmTools.libclang.dev}/bin/clang-tblgen"
+    ] ++ lib.optionals (stdenv.hostPlatform != stdenv.targetPlatform) [
+      "-DCLANG_DEFAULT_LINKER=${targetPackages.stdenv.cc.bintools}/bin/${stdenv.targetPlatform.config}-ld"
     ];

     patches = [

This change introduced two circles:

Circle 1

  1. pkgsCross.wasi32.wasilibc

  2. depends on pkgsCross.wasi32.buildPackages.llvmPackages.clangNoLibc

wasilibc is built with crossLibcStdenv (ref), whose compiler is builcPackages.llvmPackages.clangNoLibc for wasi32 (ref)

  1. depends on pkgsCross.wasi32.buildPackages.llvmPackages.tools.clang-unwrapped

  2. depends on pkgsCross.wasi32.(buildPackages.targetPackages.)stdenv.cc.bintools

via our newly added cmakeFlags

  1. depends on pkgsCross.wasi32.wasilibc (bintools-wrapper)

pkgsCross.wasi32.stdenv.cc is pkgsCross.wasi32.buildPackages.llvmPackages.clang (ref), which is pkgsCross.wasi32.buildPackages.llvmPackages.clangUseLLVM (ref).

Hence pkgsCross.wasi32.stdenv.cc.bintools is pkgsCross.wasi32.buildPackages.llvmPackages.bintools,
which is a wrapper created by wrapBintoolsWith, introducing pkgsCross.wasi32.buildPackages.libcCross (ref),
which is pkgsCross.wasi.(buildPackages.targetPackages.)wasilibc (ref)

How does gcc avoid this circle? The magic occurs in the last dependency step. For gcc-based pkgsCross, libcCross is glibcCross (ref), built with gccCrossLibcStdenv.cc (ref), whose compiler is gccWithoutTargetLibc (ref). gccWithoutTargetLibc is built with override targetPackages.stdenv.cc.bintools = binutilsNoLibc (ref), thus not depending on target libc.

Now we do the same thing to clang, we want pkgsCross.wasi32.wasilibc not depending on a bintools containing libc. Since pkgsCross.wasi32.wasilibc is built with pkgsCross.wasi32.buildPackages.clangNoLibc, we use the following overrides to cut the circle on dependency 4:

diff --git a/pkgs/development/compilers/llvm/12/default.nix b/pkgs/development/compilers/llvm/12/default.nix
index b976dd2ee67a..81bed7d9db96 100644
--- a/pkgs/development/compilers/llvm/12/default.nix
+++ b/pkgs/development/compilers/llvm/12/default.nix
@@ -210,7 +210,9 @@ let
     };

     clangNoLibc = wrapCCWith rec {
-      cc = tools.clang-unwrapped;
+      cc = tools.clang-unwrapped.override {
+        targetPackages.stdenv.cc.bintools = bintoolsNoLibc';
+      };
       libcxx = null;
       bintools = bintoolsNoLibc';
       extraPackages = [
@@ -223,7 +225,9 @@ let
     };

Circle 2

But the story is not coming to the end. There is another circle:

  1. pkgsCross.wasi32.wasilibc
  2. depends on pkgsCross.wasi32.buildPackages.llvmPackages.clangNoLibc (same as circle 1)
  3. depends on pkgsCross.wasi32.llvmPackages.compiler-rt (ref)
  4. depends on pkgsCross.wasi32.buildPackages.llvmPackages.clangNoCompilerRt (ref)
  5. depends on pkgsCross.wasi32.buildPackages.llvmPackages.clang-unwrapped (ref)
  6. depends on pkgsCross.wasi32.(buildPackages.targetPackages.)stdenv.cc.bintools (via our newly added cmakeFlags)
  7. … (follow the steps 4 and 5 in circle 1)
  8. depends on pkgsCross.wasi32.wasilibc

Luckily similar solution works:

diff --git a/pkgs/development/compilers/llvm/12/default.nix b/pkgs/development/compilers/llvm/12/default.nix
index b976dd2ee67a..81bed7d9db96 100644
--- a/pkgs/development/compilers/llvm/12/default.nix
+++ b/pkgs/development/compilers/llvm/12/default.nix
@@ -223,7 +225,9 @@ let
     };

     clangNoCompilerRt = wrapCCWith rec {
-      cc = tools.clang-unwrapped;
+      cc = tools.clang-unwrapped.override {
+        targetPackages.stdenv.cc.bintools = bintoolsNoLibc';
+      };
       libcxx = null;
       bintools = bintoolsNoLibc';
       extraPackages = [ ];

@SharzyL
Copy link
Contributor Author

SharzyL commented Oct 31, 2023

Now the circle occurs in darwin. Yet another odyssey to figure out why.

@RaitoBezarius
Copy link
Member

Thank you for this amazing PR, I was going to embark on the journey too.
cc @reckenrode for the Darwin bits.

@RaitoBezarius RaitoBezarius requested review from Ericson2314 and alyssais and removed request for trofi October 31, 2023 17:00
@RaitoBezarius RaitoBezarius assigned ghost and unassigned ghost Oct 31, 2023
@RaitoBezarius RaitoBezarius requested a review from a user October 31, 2023 17:01
@SharzyL
Copy link
Contributor Author

SharzyL commented Oct 31, 2023

To reproduce the error locally:

nix eval .#legacyPackages.aarch64-darwin.nixStatic --show-trace

@reckenrode
Copy link
Contributor

reckenrode commented Oct 31, 2023

Now the circle occurs in darwin. Yet another odyssey to figure out why.

You may want to look at #256590. Darwin cross is messy because the source-based SDK is prone infinite recursions if you’re not careful. The stdenv bootstrap is able to do that, but cross-compilation doesn’t allow that level of control.

#260543 is another PR that fixes cross-compilation of LLVM itself, which I found was necessary for building some things after I got x86_64-darwin cross-compilation working. The merge conflict is probably due to #260963, which made curl build like a normal package on Darwin, but it required stdenv bootstrap changes to break more infinite recursions.

I was planning to fix #256590 after this staging-next (where my focus is on getting the default LLVM update through), but I can fix the merge conflict and re-push the PR if that would be helpful or there’s interest in getting it into staging for the next staging-next cycle.

Comment on lines +41 to +42
] ++ lib.optionals (stdenv.hostPlatform != stdenv.targetPlatform) [
"-DCLANG_DEFAULT_LINKER=${targetPackages.stdenv.cc.bintools}/bin/${stdenv.targetPlatform.config}-ld"
Copy link

@ghost ghost Nov 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure targetPackages is the right thing to use here?

targetPackages means pkgsTargetTarget, which is the packages which run on the targetPlatform and emit binaries for the targetPlatform.

When you're building a (build==host)!=target compiler, its linker will run on the hostPlatform

But I don't know clang very well so maybe I am misreading the meaning of -DCLANG_DEFAULT_LINKER. @Ericson2314 would know for sure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I agree with @amjoseph-nixpkgs here.

I would expect rather pkgsBuildTarget to be used here, but maybe there's some splicing surprises… ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might look suprising. Yes targetPackages is the packages that run on targetPlatform and emit binaries for the targetPlatform, but targetPackages.stdenv is something else. It is the environment that produces targetPackages. Now that targetPackages is the cross packages, targetPackages.stdenv.cc.bintools is the bintools that runs on host platform and emit target platform binaries.

@ghost
Copy link

ghost commented Nov 2, 2023

Holy smokes these treewide operations on clang are hard to review... the same diff repeated 12 times, and you just sorta cross your fingers and hope they aren't slightly different from each other. Y'all really should consider pulling a #249707 on clang.

@RaitoBezarius
Copy link
Member

Holy smokes these treewide operations on clang are hard to review... the same diff repeated 12 times, and you just sorta cross your fingers and hope they aren't slightly different from each other. Y'all really should consider pulling a #249707 on clang.

@rrbutani tried… It was hard because we need to unify all changes first :(

@reckenrode
Copy link
Contributor

@Artturin has been working on #253671 to consolidate the LLVM derivations.

When using clang for cross compilation, clang cannot find the linker.
Because, for consistency with gcc, given a linker name, clang finds the
linker in the following order:

1. in prefix dirs, which contains toolchain specific paths and paths
   specified by `-B`
2. in ProgramPaths
3. in PATH environment

(refer to `Driver::GetProgramPath` in `clang/lib/Driver/Driver.cpp`)
(also refer to https://gcc.gnu.org/onlinedocs/gccint/Collect2.html)

The step 2 and 3 considers all possible target triple prefixes, but
the step 1 does not. In nixpkgs, all binaries in cross toolchains are
prefixed by the target triple, thus clang cannot find them.

For gcc in nixpkgs, the linker path is hardcoded in configuration
phase. This commit copies this behavior to clang.

But the added cmakeFlags also introduce circular dependency. The
circle comes from the fact that clangNoLibc and clangNoCompilerRT
introduce dependency of libc unexpectedly via cmakeFlags of their
underlying clang. We need to override bintools for them. Refer to

NixOS#220595 (comment)

for details.
@SharzyL
Copy link
Contributor Author

SharzyL commented Nov 7, 2023

Added fixes for darwin packages. Let's see if ofborg likes it.

@wegank wegank added the 2.status: merge conflict This PR has merge conflicts with the target branch label Mar 20, 2024
@rrbutani rrbutani added the 6.topic: llvm/clang Issues related to llvmPackages, clangStdenv and related label May 27, 2024
@wegank wegank added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jul 4, 2024
@pwaller
Copy link
Contributor

pwaller commented Nov 8, 2024

Sorry I'm late for the party, but I want to advocate for rejecting this PR.

Rationale:

  • Clang is a cross compiler by default.
  • Clang is expensive to rebuild (in compute and storage).
  • Compiler wrappers are exceedingly cheap (instant to rebuild and negligible in storage).

Therefore, it's so much better to handle this in a wrapper, we should do so even if it means a little bit of pain for some use cases. Note that I'm not saying other use cases should be harder: it's just that they probably need to be enabled in a different way.

One observation is that developing the alternative way should be much easier, since it will only involve a rebuild of the compiler wrapper, and not the compiler.

Another observation is that if you do this right, you never have to rebuild clang, because you can simply use one which is already a cache hit. The effect is that you can get to cross compiling your desired workload much faster.

See for example a case where I eliminated a rebuild of clang, enabling reuse of the one in cache.nixos.org: #351685

There has also been some discussion about tweaking LLVM_TARGETS_TO_BUILD so that we only build the target platform in. This would lead to smaller clang builds, which is great. If we do this, I hope we could also find a way to have an llvmPackagesWithAllTargets, and use this one for cross compilation, so that only a single extra thing needs to be built/cached to enable all cross compilation for a given buildPlatform.

@SharzyL
Copy link
Contributor Author

SharzyL commented Nov 8, 2024

I agree that the method in this PR maybe came to a deadend. I found it even more difficult to eliminate the cyclic dependency in the darwin toolchain. That is also why this PR has been stagnating for so long. The initial idea is to make clang work the same way as gcc, which is exactly how this PR (not) works. Modifying cc-wrapper might be more feasible. But the idea of modifying cc-wrapper also urges us to rethink whether gcc is doing in the right way. Should we also change the way gcc works?

Anyway I do not think that rebuilding clang is a major problem because both modifying clang and modifying cc-wrapper will trigger a mass rebuild in nixpkgs.

@pwaller
Copy link
Contributor

pwaller commented Nov 8, 2024

gcc, unlike clang, is not a cross compiler by default. So it's already rebuilding if you need a different target. I don't see the sense of trying to make these have an equivalent implementation.

By "Avoiding clang rebuilds", I don't mean over time, I mean, between targets. So, if I want to cross compile to an exotic target in an exotic configuration, if you have things like this in the description of the clang build, it is necessary to rebuild clang before you even start. If you just have to rebuild a clang wrapper, it is instant. So the user experience for building for their exotic target is substantially better, and it means they can use an existing compiler build from cache.nixos.org.

And it's not just the rebuild cost, but it's also the storage cost. If I'm building for multiple targets, I'd much rather have one clang build than many lying around.

@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Nov 8, 2024
@reckenrode
Copy link
Contributor

reckenrode commented Nov 8, 2024

Darwin should be a lot easier to handle since #346043 was merged, but I would also advocate for rejecting this PR. As part of that work, I made Darwin mostly build LLVM using the LLVM bootstrap. Once @emilazy’s and @paparodeo’s work to increase the minimum SDK and update Darwin to clang 19 lands, I plan to finish that work on the LLVM bootstrap. Darwin will build LLVM using LLVM bintools including the linker (LLD). Having to build clang again to use ld64 for the rest of the Darwin bootstrap doesn’t make sense.

One change I would support (because it affects Darwin cross-compilation) is making clang try <target-prefix>-ld first before trying ld unprefixed, especially when -target is specified. That would allow 3e5acda to be reverted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.status: merge conflict This PR has merge conflicts with the target branch 6.topic: llvm/clang Issues related to llvmPackages, clangStdenv and related 10.rebuild-darwin: 1-10 10.rebuild-darwin: 1 10.rebuild-linux: 11-100
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants