Skip to content
This repository has been archived by the owner on Oct 2, 2023. It is now read-only.

Fix builds with python3 as default python #490

Merged
merged 3 commits into from
Aug 16, 2018

Conversation

nlopezgi
Copy link
Contributor

@nlopezgi nlopezgi commented Aug 16, 2018

Create py_runtime rule to use with envs that have python pointing to python2 version

To run a build in an environment that has which python point to a version python3 you need to add the following flag:
--python_top=//python:py_runtime_2

You must also make sure to have a symlink to a valid python 2 binary in "/usr/bin/python2"

Solves:
#454
Replaces:
#481

Note. PYTHON2 IS A DEPENDENCY OF THESE RULES (due to deps of container regsitry). While theoretically it should be possible to refactor container registry to remove this dependency, that is a big refactoring we do not need atm. This PR solves the issue that Bazel by default picks the python binary found with "which python". If which python points to a python3 version, then you could not build with these rules. With this PR, you need to have a py2 version installed, but can have "which python" point to whatever you want.
Note, you can add the flag --python_top=//python:py_runtime_2 to a bazelrc file in your project to avoid having to write it every time (if the expectation in your project is that most host environments will have which python point to python3)

@xingao267
Copy link
Member

Can you add some comment above the py_runtime rule about how to use it instead of just in the PR comment.

@xingao267 xingao267 changed the title Fix builds with pytho3 as default python Fix builds with python3 as default python Aug 16, 2018
@nlopezgi nlopezgi merged commit c6113a5 into bazelbuild:master Aug 16, 2018
Copy link
Contributor

@siddharthab siddharthab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried solutions like this before in my monorepo to support both python2 and python3 code to co-exist, but nothing quite worked out. The best solution I could come up with was to make this proposed change in bazel.

@@ -14,3 +14,16 @@
package(default_visibility = ["//visibility:public"])

licenses(["notice"]) # Apache 2.0

# Python2 is a dependency of rules_docker due to its dependency on
# containerregistry. To run builds on environments in which 'which python'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or if the user has set python_path to be something else in their bazelrc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how this would work if the user has set python_path to something else as i think this flag would override it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant just to change the documentation. This runtime will override the python_path flag, which by default is set to "python", but if users have set it to something else, then that will also be overridden.

# containerregistry. To run builds on environments in which 'which python'
# points to python3, you can use the py_runtime target below.
# To use this target, simply add the following flag to your build commands:
# --python_top=//python:py_runtime_2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if you are building a python3 binary and packaging in a docker image in the same bazel command?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, Bazel does not quite support running builds with python2 and python3 at the same time (as far as I know). This is one of the outstanding problems for python support with Bazel that has been around for quite some time. I recommend you open an issue to ask about support for this use case (these are some of the old issues I saw related to this topic: https://github.com/bazelbuild/bazel/issues?utf8=%E2%9C%93&q=is%3Aissue+is%3Aopen+python2+python3)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which actually is the main problem. For repos with python3 code, they have already set their python_path flag to point to python3 which makes building python2 code impossible. If it's just a question of "python" on people's machine pointing to python3, but all their monorepo code is python2, then maybe they are already setting the python_path or python_top.


# Python2 is a dependency of rules_docker due to its dependency on
# containerregistry. To run builds on environments in which 'which python'
# points to python3, you can use the py_runtime target below.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also verify that this works with the build_python_zip bazel flag.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too familiar with the build_python_zip flag. I ran the tests in the repo using these flags:
--python_top=//python:py_runtime_2 --build_python_zip
and the tests worked as expected (some of our tests seem to not be compatible with build_python_zip flag, but for unrelated reasons). Please let me know if you think we need to perform other checks to verify this works as you expect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for testing with the flag. I think this is sufficient.

@nlopezgi
Copy link
Contributor Author

Thanks for your comments. Unfortunately, as I think you already know (from the pointed thread), there are deep issues in Bazel when it comes to python support, and coexisting py2 / py3 deps in builds are a big problem. However, I think this should be a good time to restart the conversation with the Bazel team about python. Particularly, iiuc @brandjon is interested in hearing issues that Bazel customers are having wrt python. Could you open up an issue in bazel github, and lets take the discussion there.

@siddharthab
Copy link
Contributor

So I think the correct solution to this problem will be to have a wrapper script that searches for python2 on the user's machine and then invokes the actual program using that interpreter. It will have corner cases but should work in a majority of the cases.

@siddharthab
Copy link
Contributor

A more proper intermediate solution is my proposed change to bazel as in https://groups.google.com/d/msg/bazel-sig-python/zc3JaVrjgus/NXn-h8ixCAAJ.

@nlopezgi
Copy link
Contributor Author

nlopezgi commented Aug 19, 2018

I agree that the correct solution involves functionality to detect python (both 2 and 3) in a users host env and to enable python targets to properly use them. However, using a wrapper script breaks hermeticity and is not the recommended Bazel approach for this type of problem. Bazel provides a native solution for this kind of problems: skylark repository rules (https://docs.bazel.build/versions/master/skylark/repository_rules.html).
This is the solution that currently exists for the c compiler: https://github.com/bazelbuild/bazel/blob/master/tools/cpp/unix_cc_configure.bzl.

The problem, as I stated above, is that python rules in Bazel just don't work; the solution is to make them use toolchain rules properly (https://docs.bazel.build/versions/master/toolchains.html). The auto configuration (i.e., detection of host python(s)) would have to be done in a repository rule, which would work similarly to the C version. The toolchain rules would need to replace completely the py_runtime rules, and offer attributes that actually make sense to python users; all of the python rules would need to be significantly refactored to use the new toolchain rules; every single user of the current python rules would have to change their code. As you can see, this is not a small change (but its one I think we need to happen).

As I mentioned above, this would be a good time to start a new thread with Bazel folks to try to have the conversation with them about the right solution, as I do think this is on someone's radar (I'll be happy to comment what our issues with rules_docker wrt to python are there). I don't think having the conversation here in this repo will get you much attention, and I don't think we can do much in this repo at the moment other than use the features available (i.e., the current broken python rules), and keep our eyes out for upcoming design documents about the future of python to make sure our use cases will be properly covered.

@siddharthab
Copy link
Contributor

I don't think runtime wrapper scripts break hermeticity, but anyway, let's carry the discussion on a bazel forum.

For this change, I prefer using the python_path flag more than the python_top flag because python_top requires absolute paths which will not be the same on Mac and Linux. So in this change, we should just document in the top-level README that users should use python_path to specify a python2 interpreter if their default is something else.

@nlopezgi
Copy link
Contributor Author

Thanks for the advice about python_path, I'll add it to documentation and consider removing the python_top and just advice folk to use python_path.

Using a wrapper script (iiuc what you mean) that checks where python is installed as part of a build action, is the textbook definition of what you should not do with Bazel as it breaks hermeticity assumptions the tool makes. For example, imagine this scenario:

  • I build target A and the wrapper script detects py2.x in location y, and uses that to build output artifact o.A
  • then I decide to change the version of python I have (to update to py3.x)
  • I then build target B, which depends on target A, and uses as input o.A. Bazel thinks it does not need to rebuild o.A, as, as far as it knows, none of its inputs have changed.
  • Target B fails to build, as o.A is not compatible, user gets a build error that does not make sense.
  • To fix, user is forced to do bazel clean, which kills productivity

Any scenario that ends with 'I get a build error, I need to do bazel clean and rebuild from scratch' is showing an underlying issue with your build (but maybe I'm misunderstanding where/when you expect this wrapper script to run).

In any case, as you say, this is not the place to have this conversation. Happy to talk more about how Bazel expects build actions to be hermetic, and how checking the state of environment during a build action breaks this assumption.

@siddharthab
Copy link
Contributor

I was suggesting using a wrapper script as the entrypoint to your main program, your build action stays completely hermetic.

Basically, a script like the below is what should be the entry point for the py_binary target, and the actual .py script should just be a runtime dependency. I have not tested this, so I do not know if this will work with subpar or not.

#!/usr/bin/env python

import os
import sys

# Find a file in a given search path.
def SearchPath():
  search_path = os.getenv('PATH', os.defpath).split(os.pathsep)
  for directory in search_path:
    if directory == '': continue
    path = os.path.join(directory, "python2")
    if os.path.isfile(path) and os.access(path, os.X_OK):
      return path
  return None

def Main():
  if sys.version_info[0] < 3:
    python_binary = sys.executable
  else:
    python_binary = SearchPath()

  runfiles_dir = os.getenv("RUNFILES_DIR")
  args = sys.argv
  args[0] = os.path.join(runfiles_dir, "containerregistry/puller.py", bin_name)

  os.unsetenv("PYTHONPATH")
  os.execv(python_binary, [python_binary] + args)

if __name__ == '__main__':
  Main()

@nlopezgi
Copy link
Contributor Author

iiuc, with your wrapper, any targets that depend on a py_binary (e.g., a genrule that executes the py_binary executable output) would become non-hermetic. To specify more my scenario above:

  • Imagine a genrule (would be target A) that executes the output of py_binary and produces as output some other file, o.A
  • Another target (target B) depends on the output of the genrule o.A, and the output of some other genrule o.C.
  • User builds target A with python env vars set to python 2, producing o.A(py2)
  • User then changes python env vars to python 3
  • User builds target C (say its similar to target A, a genrule that executes the py_binary executable). Target C produces o.C(py3)
  • User then tries to build Target B which depends both on o.A and o.C
  • Target B cannot be built successfully because o.A(py2) and o.C(py3) are not compatible
  • User runs bazel clean and then builds Target B again, this time it succeeds as o.A(py3) and o.C(py3) are now produced.

As I said, any attempt to access environment variables / host environment info from build actions or from the outputs of build actions results in non-hermetic behavior and is not recommended in Bazel, that is what repository rules were designed and built for.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants