Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Give imported packages lower precedence than standard library. #6532

Closed
wants to merge 3 commits into from

Conversation

katzdm
Copy link

@katzdm katzdm commented Oct 26, 2018

The current Bazel Python stub template prepends imported packages onto
PYTHONPATH. This effectively gives them higher import-precedence
than any directories:

  • included in PYTHONPATH by the user
  • included in PYTHONPATH or sys.path by the interpreter
    The later category includes the Python standard library.

This can cause real and nontrivial problems for projects. If a project
uses uses pip_import to depend on flake8, then it will also
transitively import enum34, a library which provides compatibility
with the enum library (introduced in Python 3.4) for pre-3.4 versions
of Python, but expects never to be imported by Python>=3.4. The scheme
currently used by Bazel will find the enum34 module prior to finding
the Standard Library enum module, even when run by Python3.7.

The solution proposed here is to build and load a usercustomize module
at program start, which will be loaded during program startup. This
module appends the directory of each imported module to the sys.path
variable, effectively making the packages available, but giving them
lower precedence than built-in directories.

cc: @brandjon

@katzdm katzdm changed the title Give imported packaged lower precedence than standard library. Give imported packages lower precedence than standard library. Oct 26, 2018
@Globegitter
Copy link

Globegitter commented Oct 28, 2018

This sounds great and seems it could also fix the issue we are running into with google/containerregistry#109 - where https://github.com/agronholm/pythonfutures is used for python 2 futures compatibility but it does not work with python 3, so there it should import the syslib but it is giving preference to the dependency.

@brandjon if possible it would be great to get this in for the 0.20 release.

@aiuto aiuto added untriaged team-Rules-Python Native rules for Python labels Oct 29, 2018
@aiuto aiuto removed their assignment Oct 29, 2018
@katzdm
Copy link
Author

katzdm commented Nov 1, 2018

Friendly ping - Jon, any thoughts as to whether a change like this could be workable?

@tmc
Copy link
Contributor

tmc commented Dec 4, 2018

ping @brandjon - AFAICT this is part of improving python ecosystem support in bazel.

@katzdm
Copy link
Author

katzdm commented Dec 12, 2018

Hi @brandjon and @aiuto - Can I ask for an update here? This PR was opened almost two months ago, and no feedback has been given. I understand if the change is more subtle/complex than it appears for google3-reasons, but I'd appreciate some indication of whether or not this (or something like it) is a viable direction.

Thanks in advance!

@brandjon
Copy link
Member

Hi Daniel, apologies for the delay. I've fallen behind on triaging some Python PRs as a result of some feature work on the 2/3 mode issue. I'm on a release manager rotation this week but aim to work through this backlog in the coming days.

A quick comment about the substance of the change: Is it ever desirable in practice to continue to allow packages to override standard library modules? I.e., is that behavior consistent with any common workflow or default configuration in normal Python programming? (For example, ordinarily when there is a file whose name clashes with a standard library module, other files in the same directory will import it instead of the standard library. Though that particular example shouldn't be a concern for Bazel, since we use module names qualified by the package path.) If the answer is yes then we should provide some way to still get the old behavior, otherwise this is basically a bug fix and doesn't need to preserve compatibility.

@katzdm
Copy link
Author

katzdm commented Dec 12, 2018

Hey Brandon - Thanks for getting back to me!

I'm a relative newcomer to the Python ecosystem myself, so I can't speak to whether there are idiomatic examples of overriding the standard library. I would imagine that this ought to be discouraged.

I believe I saw something about Bazel, in the future, allowing alternate template-stubs for Python "binaries". Perhaps this is the answer to projects that wish to use the legacy import-ordering? As you mentioned, it shouldn't be too much of an issue for Bazel, since it generally requires fully qualified package paths.

@brandjon
Copy link
Member

True that having alternate stubs would help. I mainly ask because that feature's not available yet, and we're trying to be better about avoiding breaking changes unless they're gated behind an --incompatible flag that people can opt out of while they migrate. But I'm not sure this change requires that.

@katzdm
Copy link
Author

katzdm commented Dec 12, 2018

What do you think of the following:

  • Announce the intended change in #general and #python of BazelBuild Slack
  • Send an email to the "Bazel/Python Special Interest Group" mailing list
  • Reach out to some internal Google teams that might have more knowledge of whether this would break common idioms (Abseil Python might have some good perspective?)
  • If we don't hear any concerns by end-of-week, then we move forward on Monday

Thoughts?

@katzdm
Copy link
Author

katzdm commented Jan 3, 2019

Hey Jon - Happy new year! 🎉 With the holidays behind us, I was hoping we could revisit and take another look at this change - It's been open for a few months now, with no concerns heard surrounding the "breaking"ness of the change. Is there anything I can do on my end to help move this forward, or help inform whether the change can be merged safely?

Thanks in advance!

bazel-io pushed a commit that referenced this pull request Jan 4, 2019
PR #6532 proposes changing the semantics, but that'll require a test of what we're changing. If we decide to go ahead with that, the assertion in this new test case just needs to be flipped.

RELNOTES: None
PiperOrigin-RevId: 227876534
Copy link
Member

@brandjon brandjon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "should we do this" question aside, we also should have a way to test this change. I wrote an integration test (034334d) that confirms the current semantics. If we make this change, the test assertion just needs to be flipped in this PR.

@brandjon
Copy link
Member

brandjon commented Jan 4, 2019

Reviewed (see above) -- the PR sounds reasonable and it's just a matter of confirming with the community. I'm hoping to avoid the overhead of an incompatible change flag for this.

@brandjon
Copy link
Member

brandjon commented Jan 4, 2019

Also -- what's the rationale for emitting a usercustomize file as opposed to just sticking them directly on the python_path in the stub script? I'm not opposed, just curious.

@katzdm
Copy link
Author

katzdm commented Jan 7, 2019

Also -- what's the rationale for emitting a usercustomize file as opposed to just sticking them directly on the python_path in the stub script? I'm not opposed, just curious.

I think my rationale was that since Python provides a well-defined mechanism for configuring the import search path, it would be better to use the provided abstraction than to modify the path directly. The runtime logic for building the path is somewhat nontrivial. Sticking to provided abstractions also (hopefully) reduces likelihood that we'll come into conflict with any behavioral changes that Python might introduce to import-path-building in the future (though it's worth mentioning that I don't know of any such changes coming down the pipeline).

@brandjon
Copy link
Member

brandjon commented Jan 7, 2019

While there are indeed many moving parts for determining how an import is resolved, it's not clear to me why usercustomize is a better hook than PYTHONPATH itself. I think of them as just two different alternative mechanisms without one necessarily being more preferred than the other. Actually, I would think of usercustomize as more appropriate for a persistent configuration change that applies to all builds done by a given user, a la bashrc.

But I'm fine leaving this as-is. It's not hard to change it later if needed.

(I imagine it might conflict with real usercustomize modules installed on the system, but consulting those would be non-hermetic anyway.)

Mailing list results look pretty positive on just making this change. Will give it at least to later today before merging.

@katzdm
Copy link
Author

katzdm commented Jan 7, 2019

While there are indeed many moving parts for determining how an import is resolved, it's not clear to me why usercustomize is a better hook than PYTHONPATH itself. I think of them as just two different alternative mechanisms without one necessarily being more preferred than the other. Actually, I would think of usercustomize as more appropriate for a persistent configuration change that applies to all builds done by a given user, a la bashrc.

But I'm fine leaving this as-is. It's not hard to change it later if needed.

(I imagine it might conflict with real usercustomize modules installed on the system, but consulting those would be non-hermetic anyway.)

Mailing list results look pretty positive on just making this change. Will give it at least to later today before merging.

I believe it may also have had to do with import-precedence; IIRC, all of PYTHONPATH is prepended to the import-path with higher precedence than standard libraries, whereas usercustomize is appended with lower precedence.

The current Bazel Python stub template prepends imported packages onto
`PYTHONPATH`. This effectively gives them higher `import`-precedence
than any directories:
- included in `PYTHONPATH` by the user
- included in `PYTHONPATH` or `sys.path` by the interpreter
The later category includes the Python standard library.

This can cause real and nontrivial problems for projects. If a project
uses uses `pip_import` to depend on `flake8`, then it will also
transitively import `enum34`, a library which provides compatibility
with the `enum` library (introduced in Python 3.4) for pre-3.4 versions
of Python, but expects never to be imported by Python>=3.4. The scheme
currently used by Bazel will find the `enum34` module prior to finding
the Standard Library `enum` module, even when run by Python3.7.

The solution proposed here is to build and load a `usercustomize` module
at program start, which will be loaded during program startup. This
module appends the directory of each imported module to the `sys.path`
variable, effectively making the packages available, but giving them
lower precedence than built-in directories.
@brandjon
Copy link
Member

brandjon commented Jan 7, 2019

Ah, I assumed the standard libraries were on the PYTHONPATH, but that makes sense.

I have a commit to update the failing test in accordance with this behavior change -- can you give access to the PR branch?

@katzdm
Copy link
Author

katzdm commented Jan 7, 2019

Ah, I assumed the standard libraries were on the PYTHONPATH, but that makes sense.

I have a commit to update the failing test in accordance with this behavior change -- can you give access to the PR branch?

I believe the "Allow edits from maintainers" option should already be enabled.

@brandjon
Copy link
Member

brandjon commented Jan 7, 2019

My mistake, pushed.

@brandjon
Copy link
Member

brandjon commented Jan 7, 2019

The failures look legit. Example failure for //src/test/shell/bazel:bazel_windows_example_test, function test_native_python:

Traceback (most recent call last):
  File "\\?\D:\temp\Bazel.runfiles_zl35ntyr\runfiles\main\examples\py_native\bin.py", line 3, in <module>
    from examples.py_native.lib import GetNumber
ModuleNotFoundError: No module named 'examples'

Test source code

It's failing to find the module entirely, not shadowing the module with something else of the same name, so it can't just be the order of PYTHONPATH. Maybe usercustomize works differently under windows? Maybe there's already a usercustomize.py available in the windows environment, shadowing the one you're generating here? I wonder if site.addsitedir() would work instead?

@Profpatsch
Copy link
Contributor

Profpatsch commented Jan 8, 2019

I think the issue & reproduction I opened yesterday is relevant to this pull request: #7051

Do you think merging this PR fixes the problem?

@katzdm
Copy link
Author

katzdm commented Jan 8, 2019

Yep; failure looks legit to me as well - I don't have access to a Windows box, so I'm going to need to iterate a bit on this, using your CI to help debug.

@brandjon
Copy link
Member

brandjon commented Jan 8, 2019

@Profpatsch: No, this PR is about the case where a library declared within the Bazel build collides with a library setup by the system/environment, and resolves that case in favor of the system's library. The issue you filed is about two libraries clashing that are both declared within the Bazel build. I was actually just preparing a discussion about that problem, which is also reported in #6886 -- I'll post it on the bazel-sig-python list.

@rickeylev
Copy link
Contributor

Just to confirm: It sounds like this fixes issue #5899?

@brandjon
Copy link
Member

brandjon commented Jan 8, 2019

Yes, thanks for the crosslink.

@katzdm katzdm force-pushed the python-path branch 2 times, most recently from 780365b to 9ecec27 Compare January 8, 2019 18:34
@ali5h
Copy link

ali5h commented Feb 27, 2019

usercustomise works only when ENABLE_USER_SITE=True. That maybe the reason it fails on windows. I think appending the paths to sys.path might more resilient to different compilations of python

@brandjon
Copy link
Member

brandjon commented Apr 5, 2019

See #7959 for other sys.path related problems. For this particular issue I think sitecustomize.py might be the way to go, but that may be redundant if we end up replacing the stub script with a wrapper module that runs in the same interpreter process as user code.

@elistevens
Copy link

I realize that this is a very old branch, but is my reading correct that this change will make it harder for builds to isolate themselves from whatever random packages might be installed on the system?

If so, this seems like a step backwards to me. I'd much rather bazel python targets declare all their dependencies and operate with the equivalent of python -S.

Imagine the situation where the user is on a system that has package foo==1.0 installed on the system, but they need foo==2.5.1 for their build to actually work. If the system foo comes first, the user is left without a way to get a working build short of modifying the system install (which could very well break something else that needed the 1.0 API).

@aiuto
Copy link
Contributor

aiuto commented May 11, 2020

Ping to everyone.
@brandjon - is this still a good idea?
@katzdm - are you going to fix the tests?

@katzdm
Copy link
Author

katzdm commented May 11, 2020

I'm not likely to have time or energy to revisit this PR in the near future - Would recommend closing for the time being, and restarting discussion of whether there's a better solution if it's still an issue for other users.

@aiuto
Copy link
Contributor

aiuto commented May 11, 2020

Thanks for that update.

@aiuto aiuto closed this May 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes team-Rules-Python Native rules for Python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants