
How to achieve better build performance? #85

Closed
tetsuok opened this issue Aug 16, 2022 · 11 comments

@tetsuok

tetsuok commented Aug 16, 2022

It seems the build performance of this toolchain needs to be improved compared to the host compiler. I noticed the issue on an x86-64 Linux VM on GCP when building C++ libraries (e.g., protobuf, gRPC) with this toolchain: the build with gcc-toolchain takes longer than building the same libraries with the host compiler. As far as I investigated, the cause appears to be the cost of sandbox setup and deletion for every compile action. The slowdown can be mitigated by adding the --experimental_reuse_sandbox_directories option on the CLI or in .bazelrc, but since that option is experimental, I'm wondering if there is any other way to improve build performance with gcc-toolchain.

For the record, below is the result of performance profiling on that VM. I measured the build time of the //examples/protobuf:hello_world_proto target in this repo in three different settings:

  1. build with host compiler (by turning off --action_env=BAZEL_DO_NOT_DETECT_CPP_TOOLCHAIN=1, --incompatible_strict_action_env=true, --incompatible_enable_cc_toolchain_resolution in .bazelrc)
  2. build with gcc-toolchain
  3. build with gcc-toolchain, enabling --experimental_reuse_sandbox_directories in .bazelrc (see the sketch after this list).
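
For reference, a minimal sketch of how the settings differ in .bazelrc (the grouping below is illustrative; the flag names are the ones already present in this repo's .bazelrc):

# Setting 1: host compiler -- comment out the toolchain-related flags:
# build --action_env=BAZEL_DO_NOT_DETECT_CPP_TOOLCHAIN=1
# build --incompatible_strict_action_env=true
# build --incompatible_enable_cc_toolchain_resolution

# Setting 3: gcc-toolchain with sandbox reuse -- add:
build --experimental_reuse_sandbox_directories
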
Environment Information
$ uname -a
Linux ubuntu-0 5.15.0-1016-gcp #21~20.04.1-Ubuntu SMP Fri Aug 5 12:53:07 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/9/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:hsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 9.4.0-1ubuntu1~20.04.1' --with-bugurl=file:///usr/share/doc/gcc-9/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-9 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-9-Av3uEd/gcc-9-9.4.0/debian/tmp-nvptx/usr,hsa --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
$ bazel version
Build label: 5.2.0
Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Tue Jun 7 16:02:26 2022 (1654617746)
Build timestamp: 1654617746
Build timestamp as int: 1654617746

Repro steps

git clone https://github.com/aspect-build/gcc-toolchain.git
cd gcc-toolchain
git checkout 381975950d0909e1a1608c8a90858536562e4b1d # latest as of 2022-08-16
bazel build --profile=/tmp/prof //examples/protobuf:hello_world_proto
bazel analyze-profile /tmp/prof
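
To collect the profile for the third setting, the reuse flag can also be passed on the command line (a sketch; the profile path /tmp/prof_reuse is arbitrary):

bazel build --experimental_reuse_sandbox_directories --profile=/tmp/prof_reuse //examples/protobuf:hello_world_proto
bazel analyze-profile /tmp/prof_reuse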

Results

The following table shows the output of the above bazel analyze-profile /tmp/prof command for the three settings. A few observations:

  • The total run time of gcc-toolchain is about 1.7 times longer than with the host compiler.
  • gcc-toolchain spends about 1.8 times longer in the execution phase than the host compiler.
  • Reusing the sandbox helps: the total run time of gcc-toolchain w/ reuse sandbox is close to that of the host compiler.
| Phase | host compiler | gcc-toolchain | gcc-toolchain w/ reuse sandbox |
|---|---|---|---|
| Total launch phase time | 1.418 s (0.68%) | 1.415 s (0.40%) | 1.308 s (0.57%) |
| Total init phase time | 0.671 s (0.32%) | 0.673 s (0.19%) | 0.748 s (0.33%) |
| Total target pattern evaluation phase time | 1.817 s (0.87%) | 5.008 s (1.40%) | 1.821 s (0.80%) |
| Total interleaved loading-and-analysis phase time | 71.166 s (34.11%) | 107.429 s (30.10%) | 71.944 s (31.43%) |
| Total preparation phase time | 0.026 s (0.01%) | 0.017 s (0.00%) | 0.025 s (0.01%) |
| Total execution phase time | 133.468 s (63.98%) | 242.359 s (67.90%) | 152.983 s (66.84%) |
| Total finish phase time | 0.045 s (0.02%) | 0.054 s (0.02%) | 0.045 s (0.02%) |
| Total run time | 208.613 s (100.00%) | 356.958 s (100.00%) | 228.876 s (100.00%) |

Tracing

I analyzed the collected profile files in chrome://tracing for each setting, following https://bazel.build/rules/performance. The following screenshots show the execution phase for gcc-toolchain and gcc-toolchain w/ reuse sandbox, respectively.

[Screenshot: gcc-toolchain (default), execution phase trace]

[Screenshot: gcc-toolchain w/ reuse sandbox, execution phase trace]

Observations

  • The total CPU usage of gcc-toolchain is "jagged" compared to that of gcc-toolchain w/ reuse sandbox, which keeps the CPU busy.
  • When the CPU usage of gcc-toolchain drops, Bazel is spending time on sandbox creation and deletion, visible as green boxes before and after the purple "subprocess.run" box.
@f0rmiga
Owner

f0rmiga commented Aug 18, 2022

Thank you very much for your detailed report.

The first thing that comes to my mind after reading it is that the host compiler won't use the sysroot from this repo. The sysroot by itself is more than 7k tiny files, so there's an extra cost to set up the sandbox for them, which I believe you captured well in your debugging.

The only thing I can do on our side (other than suggesting --experimental_reuse_sandbox_directories) is to prune the sysroot further, but that alone won't bring the times close to those of disabling the sandbox or setting --experimental_reuse_sandbox_directories.

@alexeagle
Contributor

From a bazel aquery on a single-file cc_library I came up with this list of action inputs:
https://gist.github.com/alexeagle/4158dc78ac0071fb140683b50ab034aa
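
The exact invocation isn't included in the gist; an aquery along these lines (the target label is a placeholder) produces such a list of compile-action inputs for a single cc_library:

bazel aquery 'mnemonic("CppCompile", //some/pkg:single_file_lib)' --output=text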

@f0rmiga
Owner

f0rmiga commented Aug 18, 2022

Another thing we can do is to ditch the bootlin files completely and rely on the binaries we already build to create the sysroots. I have some work in progress for this locally already.

@alexeagle
Contributor

Ultimately it's the same Bazel problem as the nodejs ecosystem has reported for a long time, where thousands of inputs are slow to sandbox. bazelbuild/bazel#8230 is one issue I've followed.

It feels to me like the ideal solution would be for tools like gcc and clang to be able to use an archive file as a sysroot, then read inside that archive as a virtual filesystem. That decreases it to a single input from Bazel's perspective.

Another option here is to turn off sandboxing for the typical developer workflow, and do something less frequent to ensure that all srcs/deps are actually declared.
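
A sketch of that idea (just an illustration, not something this repo ships): developers run unsandboxed by default, while a separate config kept for CI stays sandboxed to verify that all srcs/deps are declared.

# .bazelrc -- developer default: skip the sandbox for speed
build --spawn_strategy=local

# --config=ci (used only on CI): keep the sandbox to catch undeclared inputs
build:ci --spawn_strategy=sandboxed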

@tetsuok
Author

tetsuok commented Aug 19, 2022

Thanks for the detailed comments! I appreciate that.

It feels to me like the ideal solution would be for tools like gcc and clang to be able to use an archive file as a sysroot, then read inside that archive as a virtual filesystem. That decreases it to a single input from Bazel's perspective.

I think this sounds great, because projects that build LLVM from source with Bazel could benefit from it as well. But I'm not sure how much effort would be needed to implement that functionality inside Bazel; it seems FUSE would be necessary.
(I had hoped that sandboxfs would be a solution to sandbox performance issues, but it doesn't seem to be actively developed.)

@tetsuok
Author

tetsuok commented Aug 19, 2022

By the way, when I built the same target with --sandbox_base=/dev/shm, the runtime was similar to that of --experimental_reuse_sandbox_directories, as shown below.
(I found that option by accident while looking at a Google project, https://github.com/google/crubit/blob/main/.bazelrc#L12, which builds LLVM from source with Bazel.)
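
For reference, the line in that .bazelrc is roughly the following (a single flag, relying on /dev/shm being a tmpfs mount on most Linux distributions):

build --sandbox_base=/dev/shm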

=== PHASE SUMMARY INFORMATION ===

Total launch phase time         1.409 s    0.61%
Total init phase time           0.752 s    0.32%
Total target pattern evaluation phase time    1.743 s    0.75%
Total interleaved loading-and-analysis phase time   74.668 s   32.23%
Total preparation phase time    0.021 s    0.01%
Total execution phase time    153.050 s   66.06%
Total finish phase time         0.040 s    0.02%
------------------------------------------------
Total run time                231.685 s  100.00%

Critical path (22.693 s):
       Time Percentage   Description
     132 ms    0.58%   action 'Executing genrule @zlib//:copy_public_headers'
    0.18 ms    0.00%   BazelCppSemantics_build_arch_k8-opt-exec-2B5CBBC6 for @zlib//:zlib
    0.17 ms    0.00%   BazelCppSemantics_build_arch_k8-opt-exec-2B5CBBC6 for @com_google_protobuf//:protobuf
   21.891 s   96.47%   action 'Compiling src/google/protobuf/descriptor.cc'
    97.1 ms    0.43%   action 'Linking external/com_google_protobuf/libprotobuf.a'
     560 ms    2.47%   action 'Linking external/com_google_protobuf/protoc'
    0.23 ms    0.00%   runfiles for @com_google_protobuf//:protoc
    12.4 ms    0.05%   action 'Generating Descriptor Set proto_library //examples/protobuf:hello_world_proto'

@f0rmiga
Owner

f0rmiga commented Aug 19, 2022

Using tmpfs will always be a good performance improvement, but at the cost of your precious machine memory. Depending on how much memory you have to spend, that may be a suitable alternative, but I don't think the broader community should follow this path or that we should advise it. Especially when calculating cloud costs to run CI agents, and even more so on large projects, tmpfs may be totally unfeasible. Having said that, --experimental_reuse_sandbox_directories may be better.

@alexeagle
Contributor

Created that linked issue to see if someone from the Bazel team can explain why sandbox reuse is still experimental.

@tetsuok
Author

tetsuok commented Aug 20, 2022

@f0rmiga Thanks for the explanation. That really makes sense to me. I agree with using --experimental_reuse_sandbox_directories.

@tetsuok
Author

tetsuok commented Aug 20, 2022

@alexeagle Thanks for creating the issue!

@cgrindel added the "question" and "need: discussion" labels on Sep 28, 2022
@f0rmiga
Owner

f0rmiga commented Oct 4, 2022

According to bazelbuild/bazel#16138 (comment), experimental_reuse_sandbox_directories could be promoted to stable, meaning it's probably safe to use it.

@tetsuok please reopen if you want to discuss any further.
