Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Require Python imports to be qualified with repo name #7067

Closed
brandjon opened this issue Jan 9, 2019 · 5 comments
Closed

Require Python imports to be qualified with repo name #7067

brandjon opened this issue Jan 9, 2019 · 5 comments
Labels
P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) stale Issues or PRs that are stale (no activity for 30 days) team-Rules-Python Native rules for Python type: process

Comments

@brandjon
Copy link
Member

brandjon commented Jan 9, 2019

For context see #6886, #7051, and discussion thread.

For all Python modules that are defined within the build (not standard library modules, not extra libraries installed non-hermetically on the system), we should require that they be imported using their fully qualified name, which includes the repo's canonical name. For instance, we should not allow importing Python modules defined in bazel packages //third_party/... or @some_repo//third_party/... via the import statement import third_party.[...], precisely because that's ambiguous and collides between same-named packages in different repos.

For cases where the package really should be top-level, like a repo that exposes an existing library like numpy to the build, the py_library can still use the imports attribute to put it on PYTHONPATH.

We need a design doc to cover this change and also how to handle importing repos from runfiles in the face of repo renaming.

@brandjon
Copy link
Member Author

Note that a flag for disabling exactly this behavior has existed for a long time, and #2636 tracks flipping it.

The problem is that we don't have a complete story for how repo-qualified names should work in the face of repo renaming. If your main repo is named @foo and you import it into other workspaces using repo rules with name = "foo", that's fine, everyone can just write import foo... in their Python programs. But if someone uses a repo rule with name = "bar", this means the runfiles symlink for the repository will be called "bar", and "foo" will be unknown on the PYTHONPATH.

A similar situation can exist for transitive dependencies due to repository name remapping (@dkelmer), but we don't currently rename symlinks so it isn't (yet) an issue. This'll probably change in the future because I think the symlinks need to be renamed to resolve diamond dependencies.

One solution might be to allow repos to declare their canonical name for the purpose of Python imports, independently of their workspace or repo name, and create some symlink structure in runfiles to make the canonical name available.

@brandjon
Copy link
Member Author

I think it's very possible in practice that people will want the canonical top-level name for their repo's Python libraries to be different from the repo's name in its WORKSPACE file. For instance, rules_python may want to refer to its own Python modules as rules_python.experimental.examples... rather than as io_bazel_rules_python.experimental.examples....

Perhaps a more general solution would be to put neither the runfiles root nor any of the repo roots on PYTHONPATH automatically, but force users to use a mechanism like py_library.imports, possibly extended with the ability to supply arbitrary names for the top-level package.

@c-nuro
Copy link

c-nuro commented Jul 26, 2019

Would it be possible to use import hook, or generating some advanced __init__.py which can be more multi-package aware(e.g. setting __path__) and automatically merge modules?

@brandjon
Copy link
Member Author

Yes, that's my current thinking. Take for instance #7091, where Python will unconditionally prepend the directory containing the main script (after chasing symlinks!), causing unwanted visibility of source files that happen to be in the same dir as the main script. I think the best way to control that would involve injecting __init__.py logic into the build, whereas currently all our manipulation of the package path occurs in the stub script, before the user Python process is launched.

If we're adding __init__.py logic to manipulate the path anyway, we may as well also customize how module loading is done in other ways, to address this kind of issue. Instead of just adding things from the runfiles dir to PYTHONPATH, we can allow py_library and py_binary to have more control over what exact name an import is visible as, whether it overrides a standard library module, etc.

@lberki lberki added P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) and removed P2 We'll consider working on this in future. (Assignee optional) labels Nov 18, 2020
@sgowroji sgowroji added the stale Issues or PRs that are stale (no activity for 30 days) label Feb 16, 2023
@sgowroji
Copy link
Member

Hi there! We're doing a clean up of old issues and will be closing this one. Please reopen if you’d like to discuss anything further. We’ll respond as soon as we have the bandwidth/resources to do so.

@sgowroji sgowroji closed this as not planned Won't fix, can't repro, duplicate, stale Feb 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) stale Issues or PRs that are stale (no activity for 30 days) team-Rules-Python Native rules for Python type: process
Projects
None yet
Development

No branches or pull requests

4 participants