Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-90536: Add support for the BOLT post-link binary optimizer #95908

Merged
merged 11 commits into from
Aug 18, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 20 additions & 1 deletion Doc/using/configure.rst
Original file line number Diff line number Diff line change
Expand Up @@ -191,7 +191,8 @@ Performance options
-------------------

Configuring Python using ``--enable-optimizations --with-lto`` (PGO + LTO) is
recommended for best performance.
recommended for best performance. The experimental ``--enable-bolt`` flag can
also be used to improve performance.

.. cmdoption:: --enable-optimizations

Expand Down Expand Up @@ -231,6 +232,24 @@ recommended for best performance.
.. versionadded:: 3.11
To use ThinLTO feature, use ``--with-lto=thin`` on Clang.

.. cmdoption:: --enable-bolt

Enable usage of the `BOLT post-link binary optimizer
<https://github.com/llvm/llvm-project/tree/main/bolt>` (disabled by
default).

BOLT is part of the LLVM project but is not always included in their binary
distributions. This flag requires that ``llvm-bolt`` and ``merge-fdata``
are available.

BOLT is still a fairly new project so this flag should be considered
experimental for now. Because this tool operates on machine code its success
is dependent on a combination of the build environment + the other
optimization configure args + the CPU architecture, and not all combinations
are supported.

.. versionadded:: 3.12

.. cmdoption:: --with-computed-gotos

Enable computed gotos in evaluation loop (enabled by default on supported
Expand Down
4 changes: 4 additions & 0 deletions Doc/whatsnew/3.12.rst
Original file line number Diff line number Diff line change
Expand Up @@ -133,6 +133,10 @@ Optimizations
It reduces object size by 8 or 16 bytes on 64bit platform. (:pep:`623`)
(Contributed by Inada Naoki in :gh:`92536`.)

* Added experimental support for using the BOLT binary optimizer in the build
process, which improves performance by 1-5%.
(Contributed by Kevin Modzelewski in :gh:`90536`.)


CPython bytecode changes
========================
Expand Down
10 changes: 10 additions & 0 deletions Makefile.pre.in
Original file line number Diff line number Diff line change
Expand Up @@ -640,6 +640,16 @@ profile-opt: profile-run-stamp
-rm -f profile-clean-stamp
$(MAKE) @DEF_MAKE_RULE@ CFLAGS_NODIST="$(CFLAGS_NODIST) $(PGO_PROF_USE_FLAG)" LDFLAGS_NODIST="$(LDFLAGS_NODIST)"

bolt-opt: @PREBOLT_RULE@
rm -f *.fdata
@LLVM_BOLT@ ./$(BUILDPYTHON) -instrument -instrumentation-file-append-pid -instrumentation-file=$(abspath $(BUILDPYTHON).bolt) -o $(BUILDPYTHON).bolt_inst
./$(BUILDPYTHON).bolt_inst $(PROFILE_TASK) || true
@MERGE_FDATA@ $(BUILDPYTHON).*.fdata > $(BUILDPYTHON).fdata
@LLVM_BOLT@ ./$(BUILDPYTHON) -o $(BUILDPYTHON).bolt -data=$(BUILDPYTHON).fdata -update-debug-sections -reorder-blocks=ext-tsp -reorder-functions=hfsort+ -split-functions=3 -icf=1 -inline-all -split-eh -reorder-functions-use-hot-size -peepholes=all -jump-tables=aggressive -inline-ap -indirect-call-promotion=all -dyno-stats -use-gnu-stack -frame-opt=hot
rm -f *.fdata
rm -f $(BUILDPYTHON).bolt_inst
mv $(BUILDPYTHON).bolt $(BUILDPYTHON)
kmod marked this conversation as resolved.
Show resolved Hide resolved

# Compile and run with gcov
.PHONY=coverage coverage-lcov coverage-report
coverage:
Expand Down
1 change: 1 addition & 0 deletions Misc/ACKS
Original file line number Diff line number Diff line change
Expand Up @@ -1212,6 +1212,7 @@ Gideon Mitchell
Tim Mitchell
Zubin Mithra
Florian Mladitsch
Kevin Modzelewski
Doug Moen
Jakub Molinski
Juliette Monsel
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
Use the BOLT post-link optimizer to improve performance, particularly on
medium-to-large applications.
261 changes: 261 additions & 0 deletions configure

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading