Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: don't require system Python to perform bootstrapping #1929

Merged
merged 2 commits into from
Jun 2, 2024

Conversation

rickeylev
Copy link
Contributor

@rickeylev rickeylev commented May 29, 2024

This is a pretty major, but surprisingly not that invasive, overhaul of how binaries
are started. It fixes several issues and lays ground work for future improvements.

In brief:

  • A system Python is no longer needed to perform bootstrapping.
  • Errors due to PYTHONPATH exceeding environment variable size limits is no
    longer an issue.
  • Coverage integration is now cleaner and more direct.
  • The zipapp __main__.py entry point generation is separate from the Bazel
    binary bootstrap generation.
  • Self-executable zips now have actual bootstrap logic.

The way all of this is accomplished is using a two stage bootstrap process. The first
stage is responsible for locating the interpreter, and the second stage is responsible
for configuring the runtime environment (e.g. import paths). This allows the first
stage to be relatively simple (basically find a file in runfiles), so implementing it
in cross-platform shell is feasible. The second stage, because it's running under the
desired interpreter, can then do things like setting up import paths, and use the
runpy module to call the program's real main.

This also fixes the issue of long PYTHONPATH environment variables causing an error.
Instead of passing the import paths using an environment variable, they are embedded
into the second stage bootstrap, which can then add them to sys.path.

This also switches from running coverage as a subprocess to using its APIs directly.
This is possible because of the second stage bootstrap, which can rely on
import coverage occurring in the correct environment.

This new bootstrap method is disabled by default. It can be enabled by setting
--@rules_python//python/config_settings:bootstrap_impl=two_stage. Once the new APIs
are released, a subsequent release will make it the default. This is to allow easier
upgrades for people defining their own toolchains.

The two-stage bootstrap ignores errors during lcov report generation, which
partially addresses #1434

Fixes #691

  • Also fixes some doc cross references.
  • Also fixes the autodetecting toolchain and directs our alias to it

Copy link
Collaborator

@aignas aignas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

python/config_settings/BUILD.bazel Show resolved Hide resolved
python/private/BUILD.bazel Show resolved Hide resolved
python/private/common/providers.bzl Outdated Show resolved Hide resolved
python/private/stage1_bootstrap_template.sh Show resolved Hide resolved
python/private/stage2_bootstrap_template.py Show resolved Hide resolved
@rickeylev
Copy link
Contributor Author

Well, it's about my EOD, so I thought I'd post the latest laugh/cry/wtf moment from fixing CI failures. It's kind of an amusing regression that actually looks like a promising fix of another undesirable behavior that has long been a plague.

The new bootstrap doesn't suffer from the sys.path[0] problem where the directory of main is added to sys.path. This is even if safepath is disabled or running in a pre 3.11 version. Yay!

Why? Because the e.g. python foo/bar.py path passed resolves to e.g. bazel-out/bin/.../foo, which only contains other generated files, thus preventing most imports for that sys.path entry.

But, ironically, fixing that causes another problem. Boo! Without that entry, the runfiles root is then the first sys.path entry. One of our tests performs import __init__ (odd, but allowed), because it assumes that will come from the repo root. What else has __init__ ? The runfiles root has an empty one from being auto-generated. So it was actually the undesirable non-safepath behavior making this work.

But, good news: because we can perform more arbitrary sys.path manipulations, we can manipulate sys.path as we need. Stated another way: we can implement SAFEPATH=True|False behavior in our bootstrap regardless of Python version. Yay!

@rickeylev rickeylev requested a review from f0rmiga as a code owner May 31, 2024 07:23
@rickeylev rickeylev force-pushed the free.bootstrap branch 2 times, most recently from 6f58637 to 0251e5b Compare May 31, 2024 23:00
@rickeylev rickeylev enabled auto-merge May 31, 2024 23:03
@rickeylev rickeylev added this pull request to the merge queue Jun 1, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jun 1, 2024
@rickeylev rickeylev added this pull request to the merge queue Jun 2, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jun 2, 2024
@aignas aignas added this pull request to the merge queue Jun 2, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jun 2, 2024
This is a pretty major, but surprisingly non-invasive, overhaul of how
binaries are started. It fixes several issues and lays ground work for
future improvements.

In brief:

* A system Python is no longer needed to perform bootstrapping.
* Errors due to `PYTHONPATH` exceeding environment variable size limits
  is no longer an issue.
* Coverage integration is now cleaner and more direct.
* The zipapp `__main__.py` entry point generation is separate from the
  Bazel binary bootstrap generation.
* Self-executable zips now have actual bootstrap logic.

The way all of this is accomplished is using a two stage bootstrap
process. The first stage is responsible for locating the interpreter,
and the second stage is responsible for configuring the runtime
environment (e.g. import paths). This allows the first stage to be
relatively simple (basically find a file in runfiles), so implementing
it in cross-platform shell is feasible. The second stage, because it's
running under the desired interpreter, can then do things like setting
up import paths, and use the `runpy` module to call the program's real
main.

This also fixes the issue of long `PYTHONPATH` environment variables
causing an error. Instead of passing the import paths using an
environment variable, they are embedded into the second stage bootstrap,
which can then add them to sys.path.

This also switches from running coverage as a subprocess to using its
APIs directly. This is possible because of the second stage bootstrap,
which can rely on `import coverage` occurring in the correct
environment.

This new bootstrap method is disabled by default. It can be enabled
by setting
`--@rules_python//python/config_settings:bootstrap_impl=two_stage`.
Once the new APIs are released, a subsequent release will make it the
default. This is to allow easier upgrades for people defining their
own toolchains.
@rickeylev rickeylev added this pull request to the merge queue Jun 2, 2024
Merged via the queue into bazelbuild:main with commit f5b19dc Jun 2, 2024
4 checks passed
@rickeylev rickeylev deleted the free.bootstrap branch June 2, 2024 15:21
CHANGELOG.md Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

py_binary with hermetic toolchain requires a system interpreter
3 participants