
Add 'Design for a Python Toolchain' proposal #161

Merged: 3 commits from `pytoolchain` into `master` on Feb 12, 2019
Conversation

@brandjon (Member) commented:

This PR adds a design doc on making the native Python rules use the toolchain framework to find their runtimes. Follow bazelbuild/bazel#7375 for discussion and design review.

@katre Please do a quick validation-review that the doc is publishable for review.

Reviewer roles:

  • @katre for Bazel team and platform/toolchain domain expertise
  • @mrovner for Python rules expertise
  • @nlopezgi for remote execution rule logic expertise

@brandjon brandjon requested a review from katre February 12, 2019 16:39
@brandjon brandjon merged commit 7e167d8 into master Feb 12, 2019
@brandjon brandjon deleted the pytoolchain branch February 12, 2019 17:42
@nlopezgi left a comment:
Thanks a lot for this work Jon; from the perspective of remote execution this all looks good to me. One comment, though, from the perspective of rule owners (and it might be something to do in a later PR): for rules_docker we have a section on how to use the toolchain rules properly from rules that extend rules_docker (https://github.com/bazelbuild/rules_docker/tree/master/toolchains/docker#overview).
In this case I think you need something similar, since there are rules that execute the native py_binary (e.g., rules_docker does so here: https://github.com/bazelbuild/rules_docker/blob/master/python/image.bzl#L84), and there might be something special that these types of users need to do to work with the new py toolchains properly. I'd be happy to work with you to try out the changes (whenever they are ready to try out) to see if we can use them effectively in rules_docker.


`PyRuntimeInfo` is a newly-exposed provider returned by the native [`py_runtime`](https://docs.bazel.build/versions/master/be/python.html#py_runtime) rule. It can be loaded from `@bazel_tools//tools/python:toolchain.bzl`. It has the following fields, each of which corresponds to an attribute on `py_runtime`. (The last one, `python_version`, is newly added in this doc.)

* `interpreter_path`: An absolute filesystem path to a Python interpreter (or a script that launches a Python interpreter, forwarding any command-line arguments) available on the target platform. Must be `None` if `interpreter` is non-`None`.


Review comment: s/target platform/execution platform

@brandjon (Member Author) replied:

For Python it's actually the target platform, because it's an interpreter executed at runtime as opposed to a compiler executed at build time. Bazel's "building" of a py_binary is really just laying out the runfiles and generating the stub script that'll be the entry point to its execution later.

(In the case where you're running a tool written in Python on the execution platform of another target, that second target's execution platform is the Python binary's "target" platform, so the description is still accurate.)
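Under the proposed design, a Starlark rule would obtain the runtime through toolchain resolution roughly like this. The toolchain type label and the `PyRuntimeInfo` field names (`interpreter_path`, `interpreter`) follow the doc; the rule itself and the `py3_runtime` accessor are illustrative, not a final API:

```python
# Sketch: a rule that looks up the Python runtime via toolchain resolution.
# The rule name and implementation details are hypothetical.
def _my_py_tool_impl(ctx):
    toolchain = ctx.toolchains["@bazel_tools//tools/python:toolchain_type"]
    runtime = toolchain.py3_runtime  # a PyRuntimeInfo, or None if not configured
    if runtime == None:
        fail("No Python 3 runtime registered for the target platform")
    if runtime.interpreter_path != None:
        # Platform runtime: an absolute path on the target platform.
        interp = runtime.interpreter_path
    else:
        # In-build runtime: a checked-in or generated interpreter artifact.
        interp = runtime.interpreter.path
    # ... use `interp` when generating the stub script ...
    return []

my_py_tool = rule(
    implementation = _my_py_tool_impl,
    toolchains = ["@bazel_tools//tools/python:toolchain_type"],
)
```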


and thanks to toolchain resolution, the resulting executable will automatically know to use the interpreter located at `/usr/weirdpath/python3`.

If we had not defined a custom toolchain, then we'd be stuck with `autodetecting_python_toolchain`, which would fail at execution time if `/usr/weirdpath` were not on `PATH`. (It would also be slightly slower since it requires an extra invocation of the interpreter at execution time to confirm its version.)
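Concretely, the custom toolchain for the `/usr/weirdpath/python3` example could be declared along these lines. This is a sketch based on the definitions in this design doc; the target names are made up, and `py_runtime_pair` is the wrapper rule the doc proposes in `@bazel_tools//tools/python:toolchain.bzl`:

```python
# BUILD file (sketch; target names are hypothetical)
load("@bazel_tools//tools/python:toolchain.bzl", "py_runtime_pair")

py_runtime(
    name = "weirdpath_py3_runtime",
    interpreter_path = "/usr/weirdpath/python3",
    python_version = "PY3",
)

py_runtime_pair(
    name = "weirdpath_runtimes",
    py3_runtime = ":weirdpath_py3_runtime",
)

toolchain(
    name = "weirdpath_toolchain",
    toolchain = ":weirdpath_runtimes",
    toolchain_type = "@bazel_tools//tools/python:toolchain_type",
)
```

The toolchain would then be made visible to resolution with `register_toolchains("//path/to:weirdpath_toolchain")` in the WORKSPACE file (or via the `--extra_toolchains` flag).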


Regarding "slightly slower": is this a per-target penalty, or a one-time penalty paid when the toolchain is configured?

@brandjon (Member Author) replied:

It's a per-target penalty, and it's paid at execution time, not build time. So each time you run a py_binary artifact (the stub script) built using this autodetecting toolchain, it will pay the cost of exec-ing to the extra wrapper script (as opposed to directly to the interpreter), searching directories on PATH for the system interpreter, and verifying its version by invoking it with the -V flag.

Fortunately Python does in fact have a -V flag, so we don't need to actually start up the true interpreter just to print sys.version. I didn't realize that when I wrote this parenthetical.
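As a rough illustration of the per-run work described above: the real autodetecting wrapper is a shell stub, but its two costly steps (searching `PATH` for an interpreter, then verifying the version with `-V`) can be sketched in Python. The function below is hypothetical, purely to show where the time goes:

```python
import shutil
import subprocess

def find_python3():
    """Mimic the autodetecting wrapper: search PATH, then verify with -V."""
    for name in ("python3", "python"):
        path = shutil.which(name)  # directory-by-directory PATH search
        if path is None:
            continue
        # `-V` prints the version banner without running any user code, so
        # the check is cheap, but it still costs one extra exec per run.
        out = subprocess.run([path, "-V"], capture_output=True, text=True)
        version = (out.stdout or out.stderr).strip()  # Python 2 printed to stderr
        if version.startswith("Python 3"):
            return path, version
    return None, None
```

An explicit toolchain skips both steps entirely: the interpreter path is fixed at analysis time, so the stub can exec it directly.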

Originally I also had a host_python_toolchain in this proposal, which would autodetect the Python binary just once, at WORKSPACE evaluation time, but only for the host platform. I removed it at katre@'s suggestion since it didn't seem to add much value.

In any case, users should be defining explicit toolchains for their platforms, which will avoid any inefficiency implied by this default fallback.
