
How to achieve better build performance? #85

Closed
tetsuok opened this issue Aug 16, 2022 · 11 comments

@tetsuok

tetsuok commented Aug 16, 2022

It seems the build performance of this toolchain needs to be improved compared to the host compiler. I noticed the issue on an x86-64 Linux VM on GCP when building C++ libraries (e.g., protobuf, gRPC) with this toolchain: the build with gcc-toolchain takes longer than building the same libraries with the host compiler. As far as I investigated, the cause appears to be the cost of sandbox setup and deletion for every compile action. The slowdown can be mitigated by adding the --experimental_reuse_sandbox_directories option on the CLI or in .bazelrc, but since that option is experimental, I'm wondering if there is any other way to improve build performance with gcc-toolchain.

For the record, below is the result of performance profiling on that VM. I measured the build time of the //examples/protobuf:hello_world_proto target in this repo in three different settings:

  1. build with host compiler (by turning off --action_env=BAZEL_DO_NOT_DETECT_CPP_TOOLCHAIN=1, --incompatible_strict_action_env=true, --incompatible_enable_cc_toolchain_resolution in .bazelrc)
  2. build with gcc-toolchain
  3. build with gcc-toolchain, enabling --experimental_reuse_sandbox_directories in .bazelrc (see the sketch after this list).
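
For reference, a minimal sketch of how the settings differ in .bazelrc (the grouping below is illustrative; the flag names are the ones already present in this repo's .bazelrc):

# Setting 1: host compiler -- comment out the toolchain-related flags:
# build --action_env=BAZEL_DO_NOT_DETECT_CPP_TOOLCHAIN=1
# build --incompatible_strict_action_env=true
# build --incompatible_enable_cc_toolchain_resolution

# Setting 3: gcc-toolchain with sandbox reuse -- add:
build --experimental_reuse_sandbox_directories
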
Environment Information
$ uname -a
Linux ubuntu-0 5.15.0-1016-gcp #21~20.04.1-Ubuntu SMP Fri Aug 5 12:53:07 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/9/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:hsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 9.4.0-1ubuntu1~20.04.1' --with-bugurl=file:///usr/share/doc/gcc-9/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-9 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-9-Av3uEd/gcc-9-9.4.0/debian/tmp-nvptx/usr,hsa --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
$ bazel version
Build label: 5.2.0
Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Tue Jun 7 16:02:26 2022 (1654617746)
Build timestamp: 1654617746
Build timestamp as int: 1654617746

Repro steps

git clone https://github.com/aspect-build/gcc-toolchain.git
cd gcc-toolchain
git checkout 381975950d0909e1a1608c8a90858536562e4b1d # latest as of 2022-08-16
bazel build --profile=/tmp/prof //examples/protobuf:hello_world_proto
bazel analyze-profile /tmp/prof
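
To collect the profile for the third setting, the reuse flag can also be passed on the command line (a sketch; the profile path /tmp/prof_reuse is arbitrary):

bazel build --experimental_reuse_sandbox_directories --profile=/tmp/prof_reuse //examples/protobuf:hello_world_proto
bazel analyze-profile /tmp/prof_reuse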

Results

The following table shows the output of the above bazel analyze-profile /tmp/prof command for the three settings. A few observations:

  • The total run time of gcc-toolchain is about 1.7 times longer than with the host compiler.
  • gcc-toolchain spends about 1.8 times longer in the execution phase than the host compiler.
  • Reusing the sandbox helps: the total run time of gcc-toolchain w/ reuse sandbox is close to that of the host compiler.
| Phase | host compiler | gcc-toolchain | gcc-toolchain w/ reuse sandbox |
|---|---|---|---|
| Total launch phase time | 1.418 s (0.68%) | 1.415 s (0.40%) | 1.308 s (0.57%) |
| Total init phase time | 0.671 s (0.32%) | 0.673 s (0.19%) | 0.748 s (0.33%) |
| Total target pattern evaluation phase time | 1.817 s (0.87%) | 5.008 s (1.40%) | 1.821 s (0.80%) |
| Total interleaved loading-and-analysis phase time | 71.166 s (34.11%) | 107.429 s (30.10%) | 71.944 s (31.43%) |
| Total preparation phase time | 0.026 s (0.01%) | 0.017 s (0.00%) | 0.025 s (0.01%) |
| Total execution phase time | 133.468 s (63.98%) | 242.359 s (67.90%) | 152.983 s (66.84%) |
| Total finish phase time | 0.045 s (0.02%) | 0.054 s (0.02%) | 0.045 s (0.02%) |
| Total run time | 208.613 s (100.00%) | 356.958 s (100.00%) | 228.876 s (100.00%) |

Tracing

I analyzed the collected profile files in chrome://tracing for each setting, following https://bazel.build/rules/performance. The following screenshots show the execution phase for gcc-toolchain and gcc-toolchain w/ reuse sandbox, respectively.

[Screenshot: gcc-toolchain (default), execution phase trace]

[Screenshot: gcc-toolchain w/ reuse sandbox, execution phase trace]

Observations

  • The total CPU usage of gcc-toolchain is "jagged" compared to that of gcc-toolchain w/ reuse sandbox, which keeps the CPU busy.
  • When the CPU usage of gcc-toolchain drops, Bazel is spending time on sandbox creation and deletion, visible as green boxes before and after the purple "subprocess.run" box.
@f0rmiga
Owner

f0rmiga commented Aug 18, 2022

Thank you very much for your detailed report.

The first thing that comes to my mind after reading it is that the host compiler won't use the sysroot from this repo. The sysroot by itself is more than 7k tiny files, so there's an extra cost to set up the sandbox for them, which I believe you captured well in your debugging.

The only thing I can do on our side (other than suggesting --experimental_reuse_sandbox_directories) is to prune the sysroot further, but that alone won't bring the times close to those of disabling the sandbox or setting --experimental_reuse_sandbox_directories.

@alexeagle
Contributor

From a bazel aquery on a single-file cc_library I came up with this list of action inputs:
https://gist.github.com/alexeagle/4158dc78ac0071fb140683b50ab034aa
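
The exact invocation isn't included in the gist; an aquery along these lines (the target label is a placeholder) produces such a list of compile-action inputs for a single cc_library:

bazel aquery 'mnemonic("CppCompile", //some/pkg:single_file_lib)' --output=text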

@f0rmiga
Owner

f0rmiga commented Aug 18, 2022

Another thing we can do is to ditch the bootlin files completely and rely on the binaries we already build to create the sysroots. I have some work in progress for this locally already.

@alexeagle
Contributor

Ultimately it's the same Bazel problem as the nodejs ecosystem has reported for a long time, where thousands of inputs are slow to sandbox. bazelbuild/bazel#8230 is one issue I've followed.

It feels to me like the ideal solution would be for tools like gcc and clang to be able to use an archive file as a sysroot, then read inside that archive as a virtual filesystem. That decreases it to a single input from Bazel's perspective.

Another option here is to turn off sandboxing for the typical developer workflow, and do something less frequent to ensure that all srcs/deps are actually declared.
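
A sketch of that idea (just an illustration, not something this repo ships): developers run unsandboxed by default, while a separate config kept for CI stays sandboxed to verify that all srcs/deps are declared.

# .bazelrc -- developer default: skip the sandbox for speed
build --spawn_strategy=local

# --config=ci (used only on CI): keep the sandbox to catch undeclared inputs
build:ci --spawn_strategy=sandboxed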

@tetsuok
Author

tetsuok commented Aug 19, 2022

Thanks for the detailed comments! I appreciate that.

It feels to me like the ideal solution would be for tools like gcc and clang to be able to use an archive file as a sysroot, then read inside that archive as a virtual filesystem. That decreases it to a single input from Bazel's perspective.

I think this sounds great, because projects that build LLVM from source with Bazel could benefit from it as well. But I'm not sure how much effort would be needed to implement that functionality inside Bazel; it seems FUSE would be necessary.
(I had hoped that sandboxfs would be a solution to sandbox performance issues, but it doesn't seem to be actively developed.)

@tetsuok
Author

tetsuok commented Aug 19, 2022

By the way, when I built the same target with --sandbox_base=/dev/shm, the runtime was similar to that of --experimental_reuse_sandbox_directories, as shown below.
(I found that option by accident while looking at a Google project, https://github.com/google/crubit/blob/main/.bazelrc#L12, which builds LLVM from source with Bazel.)
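
For reference, the line in that .bazelrc is roughly the following (a single flag, relying on /dev/shm being a tmpfs mount on most Linux distributions):

build --sandbox_base=/dev/shm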

=== PHASE SUMMARY INFORMATION ===

Total launch phase time         1.409 s    0.61%
Total init phase time           0.752 s    0.32%
Total target pattern evaluation phase time    1.743 s    0.75%
Total interleaved loading-and-analysis phase time   74.668 s   32.23%
Total preparation phase time    0.021 s    0.01%
Total execution phase time    153.050 s   66.06%
Total finish phase time         0.040 s    0.02%
------------------------------------------------
Total run time                231.685 s  100.00%

Critical path (22.693 s):
       Time Percentage   Description
     132 ms    0.58%   action 'Executing genrule @zlib//:copy_public_headers'
    0.18 ms    0.00%   BazelCppSemantics_build_arch_k8-opt-exec-2B5CBBC6 for @zlib//:zlib
    0.17 ms    0.00%   BazelCppSemantics_build_arch_k8-opt-exec-2B5CBBC6 for @com_google_protobuf//:protobuf
   21.891 s   96.47%   action 'Compiling src/google/protobuf/descriptor.cc'
    97.1 ms    0.43%   action 'Linking external/com_google_protobuf/libprotobuf.a'
     560 ms    2.47%   action 'Linking external/com_google_protobuf/protoc'
    0.23 ms    0.00%   runfiles for @com_google_protobuf//:protoc
    12.4 ms    0.05%   action 'Generating Descriptor Set proto_library //examples/protobuf:hello_world_proto'

@f0rmiga
Owner

f0rmiga commented Aug 19, 2022

Using tmpfs will always be a good performance improvement, but at the cost of your precious machine memory. Depending on how much memory you have to spend, that may be a suitable alternative, but I don't think the broader community should follow this path or that we should advise it. Especially when calculating cloud costs to run CI agents, and even more so on large projects, tmpfs may be totally unfeasible. Having said that, --experimental_reuse_sandbox_directories may be better.

@alexeagle
Contributor

Created that linked issue to see if someone from the Bazel team can explain why sandbox reuse is still experimental.

@tetsuok
Author

tetsuok commented Aug 20, 2022

@f0rmiga Thanks for the explanation. That really makes sense to me. I agree with using --experimental_reuse_sandbox_directories.

@tetsuok
Author

tetsuok commented Aug 20, 2022

@alexeagle Thanks for creating the issue!

@cgrindel added the "question" and "need: discussion" labels on Sep 28, 2022
@f0rmiga
Owner

f0rmiga commented Oct 4, 2022

According to bazelbuild/bazel#16138 (comment), experimental_reuse_sandbox_directories could be promoted to stable, meaning it's probably safe to use it.

@tetsuok please reopen if you want to discuss any further.
