
Add 'Design for a Python Toolchain' proposal #161

Merged: 3 commits from `pytoolchain` into `master` on Feb 12, 2019
Conversation

@brandjon (Member) commented:

This PR adds a design doc on making the native Python rules use the toolchain framework to find their runtimes. Follow bazelbuild/bazel#7375 for discussion and design review.

@katre Please do a quick validation-review that the doc is publishable for review.

Reviewer roles:

  • @katre for Bazel team and platform/toolchain domain expertise
  • @mrovner for Python rules expertise
  • @nlopezgi for remote execution rule logic expertise

@brandjon brandjon requested a review from katre February 12, 2019 16:39
@brandjon brandjon merged commit 7e167d8 into master Feb 12, 2019
@brandjon brandjon deleted the pytoolchain branch February 12, 2019 17:42
@nlopezgi left a comment:
Thanks a lot for this work Jon; from the perspective of remote execution this all looks good to me. One comment, though, from the perspective of rule owners (and it might be something to do in a later PR): for rules_docker we have a section on how to use the toolchain rules properly from rules that extend rules_docker (https://github.com/bazelbuild/rules_docker/tree/master/toolchains/docker#overview).
In this case I think you need something similar, since there are rules that execute the native py_binary (e.g., rules_docker does so here: https://github.com/bazelbuild/rules_docker/blob/master/python/image.bzl#L84), and there might be something special that these types of users need to do to work with the new py toolchains properly. I'd be happy to work with you to try out the changes (whenever they are ready to try out) to see if we can use them effectively in rules_docker.


`PyRuntimeInfo` is a newly-exposed provider returned by the native [`py_runtime`](https://docs.bazel.build/versions/master/be/python.html#py_runtime) rule. It can be loaded from `@bazel_tools//tools/python:toolchain.bzl`. It has the following fields, each of which corresponds to an attribute on `py_runtime`. (The last one, `python_version`, is newly added in this doc.)

* `interpreter_path`: An absolute filesystem path to a Python interpreter (or a script that launches a Python interpreter, forwarding any command-line arguments) available on the target platform. Must be `None` if `interpreter` is non-`None`.


Review comment: s/target platform/execution platform

@brandjon (Member Author) replied:

For Python it's actually the target platform, because it's an interpreter executed at runtime as opposed to a compiler executed at build time. Bazel's "building" of a py_binary is really just laying out the runfiles and generating the stub script that'll be the entry point to its execution later.

(In the case where you're running a tool written in Python on the execution platform of another target, that second target's execution platform is the Python binary's "target" platform, so the description is still accurate.)
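Under the proposed design, a Starlark rule would obtain the runtime through toolchain resolution roughly like this. The toolchain type label and the `PyRuntimeInfo` field names (`interpreter_path`, `interpreter`) follow the doc; the rule itself and the `py3_runtime` accessor are illustrative, not a final API:

```python
# Sketch: a rule that looks up the Python runtime via toolchain resolution.
# The rule name and implementation details are hypothetical.
def _my_py_tool_impl(ctx):
    toolchain = ctx.toolchains["@bazel_tools//tools/python:toolchain_type"]
    runtime = toolchain.py3_runtime  # a PyRuntimeInfo, or None if not configured
    if runtime == None:
        fail("No Python 3 runtime registered for the target platform")
    if runtime.interpreter_path != None:
        # Platform runtime: an absolute path on the target platform.
        interp = runtime.interpreter_path
    else:
        # In-build runtime: a checked-in or generated interpreter artifact.
        interp = runtime.interpreter.path
    # ... use `interp` when generating the stub script ...
    return []

my_py_tool = rule(
    implementation = _my_py_tool_impl,
    toolchains = ["@bazel_tools//tools/python:toolchain_type"],
)
```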


and thanks to toolchain resolution, the resulting executable will automatically know to use the interpreter located at `/usr/weirdpath/python3`.

If we had not defined a custom toolchain, then we'd be stuck with `autodetecting_python_toolchain`, which would fail at execution time if `/usr/weirdpath` were not on `PATH`. (It would also be slightly slower since it requires an extra invocation of the interpreter at execution time to confirm its version.)
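Concretely, the custom toolchain for the `/usr/weirdpath/python3` example could be declared along these lines. This is a sketch based on the definitions in this design doc; the target names are made up, and `py_runtime_pair` is the wrapper rule the doc proposes in `@bazel_tools//tools/python:toolchain.bzl`:

```python
# BUILD file (sketch; target names are hypothetical)
load("@bazel_tools//tools/python:toolchain.bzl", "py_runtime_pair")

py_runtime(
    name = "weirdpath_py3_runtime",
    interpreter_path = "/usr/weirdpath/python3",
    python_version = "PY3",
)

py_runtime_pair(
    name = "weirdpath_runtimes",
    py3_runtime = ":weirdpath_py3_runtime",
)

toolchain(
    name = "weirdpath_toolchain",
    toolchain = ":weirdpath_runtimes",
    toolchain_type = "@bazel_tools//tools/python:toolchain_type",
)
```

The toolchain would then be made visible to resolution with `register_toolchains("//path/to:weirdpath_toolchain")` in the WORKSPACE file (or via the `--extra_toolchains` flag).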


Regarding "slightly slower": is this a per-target penalty, or a one-time penalty paid when the toolchain is configured?

@brandjon (Member Author) replied:

It's a per-target penalty, and it's paid at execution time, not build time. So each time you run a py_binary artifact (the stub script) built using this autodetecting toolchain, it will pay the cost of exec-ing to the extra wrapper script (as opposed to directly to the interpreter), searching directories on PATH for the system interpreter, and verifying its version by invoking it with the -V flag.

Fortunately Python does in fact have a -V flag, so we don't need to actually start up the true interpreter just to print sys.version. I didn't realize that when I wrote this parenthetical.
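As a rough illustration of the per-run work described above: the real autodetecting wrapper is a shell stub, but its two costly steps (searching `PATH` for an interpreter, then verifying the version with `-V`) can be sketched in Python. The function below is hypothetical, purely to show where the time goes:

```python
import shutil
import subprocess

def find_python3():
    """Mimic the autodetecting wrapper: search PATH, then verify with -V."""
    for name in ("python3", "python"):
        path = shutil.which(name)  # directory-by-directory PATH search
        if path is None:
            continue
        # `-V` prints the version banner without running any user code, so
        # the check is cheap, but it still costs one extra exec per run.
        out = subprocess.run([path, "-V"], capture_output=True, text=True)
        version = (out.stdout or out.stderr).strip()  # Python 2 printed to stderr
        if version.startswith("Python 3"):
            return path, version
    return None, None
```

An explicit toolchain skips both steps entirely: the interpreter path is fixed at analysis time, so the stub can exec it directly.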

Originally I also had a host_python_toolchain in this proposal, which would autodetect the Python binary just once, at WORKSPACE evaluation time, but only for the host platform. I removed it at katre@'s suggestion since it didn't seem to add much value.

In any case, users should be defining explicit toolchains for their platforms, which will avoid any inefficiency implied by this default fallback.
