Cross-compiling requires a "toolchain".
A typical toolchain contains:
- header files for the target platform's "default" libraries, such as libc and the C++ standard library;
- header files for the libraries related to compiler builtins (known as `compiler-rt` or `libgcc`, sometimes including a library for exception handling support);
- binaries (`.a`, `.so` and similar) for the "default" libraries;
- binaries for "startfiles" such as `crt1.o`, which contain the entry point and the initialization and deinitialization routines relevant to libc;
- the binaries of the cross-compiler, cross-linker, cross-assembler, `ar`, `ranlib` and possibly other `binutils`, that is, tools that run on the host platform but generate artifacts for the target platform.
A toolchain is usually distributed as a tarball and is quite large, on the order of hundreds of megabytes. It bundles tools, libraries and binaries for every need: C, C++, Fortran, CUDA...
We don't need most of this bundle, for the following reasons:
- we don't need a cross-compiler and other tools, because we use the LLVM infrastructure (clang, lld, llvm-ar, ...), which supports cross-compilation out of the box;
- we don't need C++ headers and libraries, because we include libc++, libc++abi and LLVM's libunwind as source code and compile them during the build;
- we definitely don't need Fortran headers.
The idea is to strip the "toolchain" down as much as possible and provide it as a submodule instead of a tarball. Strictly speaking, it is no longer a "toolchain". It is just a collection of libc-related libraries and a few files for compiler builtins.
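With such a stripped-down sysroot, the LLVM tools only need to be told the target triple and where the sysroot lives. The following is a minimal sketch under stated assumptions. The helper `cross_flags` is hypothetical, and the sysroot path `contrib/sysroot/linux-aarch64` is an illustrative assumption, not this repository's actual layout. Only standard clang options (`--target`, `--sysroot`, `-fuse-ld=lld`) are used:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the clang flags needed to cross-compile
# against a stripped-down sysroot. Only standard clang options are used:
#   --target      selects the target triple (clang can target any supported arch)
#   --sysroot     makes header and library lookup start at the sysroot directory
#   -fuse-ld=lld  links with LLVM's lld instead of a target-specific binutils ld
cross_flags() {
    local triple="$1" sysroot="$2"
    if [ -z "$triple" ] || [ -z "$sysroot" ]; then
        echo "usage: cross_flags <triple> <sysroot>" >&2
        return 1
    fi
    printf '%s\n' "--target=$triple --sysroot=$sysroot -fuse-ld=lld"
}

# Example with an assumed path; the output would be spliced into a clang call:
#   clang $(cross_flags aarch64-linux-gnu contrib/sysroot/linux-aarch64) hello.c -o hello
cross_flags aarch64-linux-gnu contrib/sysroot/linux-aarch64
```

The same flags also work for a native build when the triple matches the host. This is one way a custom sysroot can keep even the default build hermetic. A real C++ build would presumably also add include paths for the bundled libc++ sources, which this sketch leaves out.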
This gives us the following advantages:
- it is easier to add new platforms (no need to search for a complete toolchain; just copy the relevant files from the OS image);
- it is easier to understand what's going on, since only the relevant files are included;
- it reduces the risk of supply-chain attacks;
- it allows using a custom sysroot even for the default (non-cross) build, which gives reproducible, hermetic builds;
- it opens the way to experimenting with building libc from sources;
- it simplifies using musl libc instead of glibc.
This repository contains some blobs, such as `libc.so`.
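Their sources:
- for `x86_64`, they are from the Ubuntu 20.04 image;
- for `aarch64`, they are from developer.arm.com;
- for `s390x`, they are extracted from a Docker image:

  ```
  # start the container and install gcc (pulls in libc6-dev)
  docker run -it s390x/ubuntu:18.04
  apt update
  apt install gcc
  # on the host: export the container's filesystem
  # (b38a367a8a05 is the ID of the container started above, see `docker ps`)
  docker export b38a367a8a05 > s390x.tar
  ```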
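- for `powerpc64le`, they are extracted from a Docker image:

  ```
  # start the container and install gcc (pulls in libc6-dev)
  docker run -it ppc64le/ubuntu:14.04
  apt update
  apt install gcc
  # on the host: export the container's filesystem
  # (b38a367a8a05 is the ID of the container started above, see `docker ps`)
  docker export b38a367a8a05 > ppc64.tar
  ```

  Ubuntu 14.04 is chosen for better compatibility.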
- for `x86_64-musl`, some headers and libraries come from the Ubuntu image, and others are built from our musl fork: github.com/ClickHouse/musl (see #28);
- for `riscv`, they are from the Debian Unstable `libc6-dev` package;
- for `loongarch64`, they are from the Debian Unstable image.
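After a Docker image has been exported, only a small subset of its root filesystem ends up in the sysroot. The following is a minimal sketch of that copy step. It assumes the Debian/Ubuntu multiarch layout (`usr/include`, `usr/lib/<triple>/`), and both the helper name `copy_libc_files` and its file list are hypothetical. Note that on Debian/Ubuntu, `usr/lib/<triple>/libc.so` is a linker script that refers to `libc.so.6` and the dynamic loader, which usually live under `lib/<triple>`. A complete sysroot would need those files as well.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the libc-related subset of an extracted root
# filesystem into a sysroot directory. Assumes the Debian/Ubuntu multiarch
# layout; the file list is illustrative, not exhaustive.
#
# The rootfs can be obtained from an exported container, e.g.:
#   mkdir rootfs && tar -xf s390x.tar -C rootfs
copy_libc_files() {
    local rootfs="$1" dest="$2" triple="$3"
    local libdir="usr/lib/$triple"
    local f

    # Copy the whole usr/include tree: libc headers, per-architecture headers
    # under usr/include/<triple>, and headers of any other installed -dev
    # packages (which could be pruned afterwards).
    mkdir -p "$dest/usr"
    cp -R "$rootfs/usr/include" "$dest/usr/"

    # Startfiles and libc link-time files; a missing file is reported, not fatal.
    mkdir -p "$dest/$libdir"
    for f in crt1.o crti.o crtn.o libc.so libc_nonshared.a; do
        if [ -e "$rootfs/$libdir/$f" ]; then
            cp -P "$rootfs/$libdir/$f" "$dest/$libdir/"
        else
            echo "warning: $libdir/$f not found in $rootfs" >&2
        fi
    done
}
```

For example, after extracting `s390x.tar` into `rootfs`, `copy_libc_files rootfs sysroot-s390x s390x-linux-gnu` would populate `sysroot-s390x` (both directory names are illustrative).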
FreeBSD:
- https://clickhouse-datasets.s3.yandex.net/toolchains/toolchains/freebsd-11.3-toolchain.tar.xz
- http://distcache.FreeBSD.org/local-distfiles/mikael/freebsd-12.2-aarch64-toolchain.tar.xz
TODO:
- build `compiler-rt` from sources and remove `libgcc.a` from here;
- simplify the directory structure even more.