Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Use .pth file to import distutils from setuptools #2260

Closed
wants to merge 9 commits into from

Conversation

pganssle
Copy link
Member

Summary of changes

This is a WIP proof of concept for #2259.

It looks to be a bit tricky to pull it off — at the moment, this breaks everything because the circular import can't resolve distutils.core. The two broad-strokes options I see for resolving this:

  1. Move all references to distutils within setuptools to point to ._distutils.
  2. Move setuptools._distutils into _distutils_importer/distutils-shim-package.

There may be other options, though.

Pull Request Checklist

  • Changes have tests
  • News fragment added in changelog.d. See documentation for details

@pganssle
Copy link
Member Author

I've updated this to use the module import-time hook described in this comment.

Theoretically we could replace the setuptools.distutils_patch import entirely and do all the work in the loader, but I think it's best to make it so that import setuptools is guaranteed to do the distutils patch, even in the absence of the .pth file.

I could easily imagine a situation where a redistributor decides to forego the .pth file in favor of another solution like a sitecustomize.py or something.

@pganssle
Copy link
Member Author

I've restored the warnings now, since in the new sys.meta_path approach we aren't unconditionally hitting those code paths, but I think they still need to be adjusted — the only way an end user would see those warnings is if our hack has failed in some way.

Also I'm not sure how best to test this stuff. Any ideas @jaraco? Given that you brought up the zipimport case, do you have any examples using zipimport with setuptools? I tried to come up with one and didn't really succeed.

@jaraco
Copy link
Member

jaraco commented Jul 14, 2020

Also I'm not sure how best to test this stuff. Any ideas @jaraco?

I've mainly tested it by ensuring that import distutils imports the local module in various scenarios.

Given that you brought up the zipimport case, do you have any examples using zipimport with setuptools?

I believe you can simply build a wheel and then set PYTHONPATH=setuptools.whl. Unfortunately, that approach doesn't honor the .pth file, even though the .pth file appears at the top of the wheel.

I've also observed that the setup.py develop case doesn't seem to get the .pth file in a site dir, so import distutils gets the stdlib version. This behavior even applies to the setuptools test suite.

I'd forgotten that .pth files only have their effect if they're found in a "site" dir, so that suggestion may have been a bad one.

Hmm. I do notice that easy-install.pth gets installed to site-packages even for an editable install. Perhaps setuptools will need to re-use that technique for installing the distutils_precedence.pth to a site directory.

That still won't address the zipimporter issue, but now that I think about it, it may be users in those environments need to take explicit action to enact startup behavior as future-fstrings advises its users.

distutils_precedence.pth Outdated Show resolved Hide resolved
Copy link
Member

@jaraco jaraco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, I'm inclined to move forward with this approach and just not support the early importer hack with setuptools installed as an editable install.

@jaraco
Copy link
Member

jaraco commented Jul 20, 2020

In e371422, I've pushed my suggestions, and I've confirmed the behavior works as intended:

$ tox --notest
# replace editable install with regular install
$ .tox/python/bin/pip install .
$ env SETUPTOOLS_USE_DISTUTILS=local .tox/python/bin/python -c "import distutils; print(distutils.__file__)
/Users/jaraco/code/public/pypa/setuptools/setuptools/_distutils/__init__.py

Interestingly, when using pip-run to test, I encounter an error:

~ $ env SETUPTOOLS_USE_DISTUTILS=local pip-run -q ~/p/pypa/setuptools -- -c "import distutils; print(distutils.__file__)"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 914, in _find_spec
  File "/var/folders/c6/v7hnmq453xb6p2dbz1gqc6rr0000gn/T/pip-run-8u3pbhxa/_distutils_importer/__init__.py", line 25, in find_spec
    return self.get_distutils_spec()
  File "/var/folders/c6/v7hnmq453xb6p2dbz1gqc6rr0000gn/T/pip-run-8u3pbhxa/_distutils_importer/__init__.py", line 30, in get_distutils_spec
    class DistutilsLoader(importlib.util.abc.Loader):
AttributeError: module 'importlib' has no attribute 'util'

That's fixed in 2986be4.

~ $ env SETUPTOOLS_USE_DISTUTILS=local pip-run -q ~/p/pypa/setuptools -- -c "import distutils; print(distutils.__file__)"
/var/folders/c6/v7hnmq453xb6p2dbz1gqc6rr0000gn/T/pip-run-bqsvvuge/setuptools/_distutils/__init__.py

@pganssle
Copy link
Member Author

Ok, I'll take a look in the morning.

I think it's fine to basically fall back to the "you must import setuptools first" behavior if the end user can remedy this themselves by installing setuptools a different way. The main thing I'm concerned about is if there is a reasonable / supported workflow where end users will start seeing breakages due to import order in packages they don't control and can't easily fix.

I think that with PEP 517/518 and the fact that you can always fix this by installing setuotools in a normal way, we're probably fine.

Not sure how well tested any of this is, though...

@pganssle
Copy link
Member Author

@jaraco Looks like the changes you made make it so that you either get the distutils patch via the .pth file or you don't get it at all.

@pganssle pganssle closed this Jul 20, 2020
@pganssle pganssle reopened this Jul 20, 2020
@pganssle
Copy link
Member Author

@jaraco OK, so I like the idea of moving the distutils_patch module into _distutils_importer, but two things:

  1. Even though it involves duplication of the logic, I think we should leave the feature flag in the .pth file. The logic is simple enough and waiting until after the module is imported will add more overhead to every startup in an environment where setuptools is installed, even if they never use it. Even assuming that's negligible, weird environments abound and this is unconditionally running a bunch of our code on every start-up in a way that can't be turned off. I think it would be prudent to start off slower.
  2. I'm thinking it might be preferable to keep _distutils_importer to a single file rather than a package of 3 files. I don't think we get a lot out of having it in 3 separate files where importing the files themselves has different actions, and I think that it will be easier for people who are going to need to modify this as part of their own environment-specific deployments if it were a single file easy to vendor and patch (or to copy directly into a sitecustomize.py or something, for example).

What do you think?

@jaraco
Copy link
Member

jaraco commented Jul 26, 2020

Looks like the changes you made make it so that you either get the distutils patch via the .pth file or you don't get it at all.

That wasn't my intention. I would have expected this line to cause the hack to be invoked unconditionally. And I confirm it works as intended:

setuptools distutils_import_hack $ git-id
1e53a2c1
setuptools distutils_import_hack $ env SETUPTOOLS_USE_DISTUTILS=local .tox/python/bin/python -c "import distutils; print(distutils.__file__)"
/Users/jaraco/code/public/pypa/setuptools/setuptools/_distutils/__init__.py

Can you share where the behavior misses your expectation?

1. we should leave the feature flag in the `.pth` file.

That's fair. I'll do that.

2. keep `_distutils_importer` to a single file rather than a package of 3 files

My main reason for doing this with _distutils_importer.override is to isolate the behavior-on-import and to minimize the impact on setuptools.__init__. If the behavior isn't implied by the import, then the importer also needs to call some function, and you can't do that in setuptools.__init__ without violating linter rules over many subsequent lines, so you either need to noqa all of those lines or disable that linter for the file. As it is, only one line in setuptools.__init__ is affected to engage the behavior. If there's a way to keep that hook as one-line, I'm open to any organization of the new package.

I don't know any way to do that without also implicitly enacting that behavior when triggered from the .pth file, which is what I was trying to avoid. In other words, these are the requirements:

  • .pth file only installs the hook.
  • more aggressive override is only invoked on import setuptools.
  • impact on setuptools.__init__ is one line of execution.

I do think it makes sense to consolidate all of the behavior into _distutils_importer.__init__ and minimize the behavior in the behavior-on-import modules.

I'll work on another draft, but if you have ideas on how to eliminate .override, please share.

@pganssle
Copy link
Member Author

That wasn't my intention. I would have expected this line to cause the hack to be invoked unconditionally. And I confirm it works as intended:

...

Can you share where the behavior misses your expectation?

Sorry, I did figure out that it worked as intended after I left that comment, then I guess I forgot that I had actually hit "send".

If the behavior isn't implied by the import, then the importer also needs to call some function, and you can't do that in setuptools.__init__ without violating linter rules over many subsequent lines, so you either need to noqa all of those lines or disable that linter for the file. As it is, only one line in setuptools.__init__ is affected to engage the behavior. If there's a way to keep that hook as one-line, I'm open to any organization of the new package.

Can we do a multiline disable? With pylint you can do # pylint: disable=rule and # pylint: enable=rule on another line. I don't remember us having a linter, but it seems like someone set up flake8 at some point? Disabling the "import not at top" error on the whole file also doesn't bother me too much: I'm not too worried about "import not at top" sneaking into setuptools.__init__, and I feel like having stuff happen implicitly as part of an import is worse than explicitly calling the function (putting aside the fact that the thing we're doing is making import setuptools implicitly do this thing 😛).

We could also presumably switch to pylint or something that does allow block-disabling of rules.

At the end of the day, I don't like the idea of contorting our code just to satisfy linters, I find linters useful until they get in my way, at which point I ruthlessly disable them. I think there are probably real benefits to keeping this hack in a single file, and only minor benefits to keeping the linter rule in place on that file.

@jaraco
Copy link
Member

jaraco commented Jul 26, 2020

Can we do a multiline disable?

Yes. I'm not opposed to any number of lines that appear in one contiguous block, so a block-based enable/disable would be fine. What I really want to avoid is more touch points at different points in the code.

it seems like someone set up flake8 at some point?

It may have been me. I use pytest-flake8 across a lot of my projects and consider it a best practice.

Disabling the "import not at top" error on the whole file also doesn't bother me too much: I'm not too worried about "import not at top" sneaking into setuptools.__init__,

I'd be okay with disabling it for the file, but I'm pretty sure that can't happen without a discontinuous line or a line in another file, adding another touch point (and cross-file dependency). I does not appear as if flake8 has per-file ignores that can be declared inline.

We could also presumably switch to pylint or something that does allow block-disabling of rules.

Switching to pylint would require a divergence from the practices I follow in other projects or would require transitioning those projects to pylint. Either of those approaches greatly expands the scope of this effort and creates a dependency I'd like to avoid. I'd be willing to consider such a transition separately, and if it can be accepted, utilize it to remove the extraneous import.

In df0adbd, I've reduced the override to just its bare minimum. Hopefully that minimizes the issue and makes a future transition straightforward.

At the end of the day, I don't like the idea of contorting our code just to satisfy linters, I find linters useful until they get in my way, at which point I ruthlessly disable them. I think there are probably real benefits to keeping this hack in a single file, and only minor benefits to keeping the linter rule in place on that file.

In my experience, having linters is of minimal value for individual projects, but serves its main value in settling what would otherwise be bikeshedding over subjective preferences. Linters have cost me much more time than they've ever saved, but at least they allow me to offload those decisions. I do my best to honor the preferences of the linters, even when they violate my sensibilities.

At this point, the hack 99% in a single file with only one small shim for override. I think that's good enough to proceed and we can iterate more if appropriate.

@pganssle
Copy link
Member Author

At this point, the hack 99% in a single file with only one small shim for override. I think that's good enough to proceed and we can iterate more if appropriate.

Yeah, that's a fair compromise. I'll start another issue for the pytest-flake8 thing, because when working on this (or something else recently) it did cause me some troubles.

One last thing, possibly bikeshedding, but should we rename this from _distutils_hack to _setuptools_distutils_hack or something like that? I don't think too many people will see this, but it occurs to me that people are generally not used to PyPA packages installing multiple packages, and they may not realize where _distutils_hack is coming from. We should also (either in addition or instead) give more context about why this module exists in the docstring.

Happy to do those changes myself if you want.

@pganssle pganssle marked this pull request as ready for review July 26, 2020 16:52
@pganssle
Copy link
Member Author

Also this seems like a lot of commits, we should probably rewrite the history a bit. If you don't want to, I'm happy to merge all the related changes into 2-3 commits.

Copy link
Member

@jaraco jaraco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM. Make whatever edits you wish to the docstring, module name, and commit history, add a changelog entry, and let's get this merged. Thanks!

@pganssle
Copy link
Member Author

Hm, I found a major problem here. Apparently we've been testing setuptools against the version in the local directory using an editable install.

Since editable installs don't install the .pth file, that makes it impossible for us to actually test this behavior. During my rebase I accidentally made a change that should have broken the tests, but didn't, which means this behavior is untested, something I'm very uncomfortable with.

I think before we merge this we should refactor the tests to make it possible to test when not using an editable installation, and ensuring that we test as installed.

@pganssle
Copy link
Member Author

To clarify, I think we should probably do that in a separate PR, since this one is already a bit complicated and I've got a half-refactored history sitting around semi-blocked on this.

@jaraco
Copy link
Member

jaraco commented Jul 26, 2020

I did one-off test the behavior using pip-run in #2260 (comment) and repeated that test with the latest code. In my opinion, that's sufficient, though I wouldn't object to there being a test for the installed package.

This puts non-distutils imports first and removes one unused import.
@pganssle
Copy link
Member Author

I did one-off test the behavior using pip-run in #2260 (comment) and repeated that test with the latest code. In my opinion, that's sufficient, though I wouldn't object to there being a test for the installed package.

I don't think that is sufficient. Redistributors also need to be able to run the tests, and we need to be confident in our ability to accept PRs from contributors who are going to make changes to this code. We also need confidence that it works on all supported platforms. Relying on one-off testing will dramatically slow our velocity on probably one of the most trickiest features we've released in a long time.

It's unfortunate that our tech debt is coming due when we'd like to get these changes out, but when you're in a hole it's probably best to stop digging.

@jaraco
Copy link
Member

jaraco commented Aug 2, 2020

I don't think that is sufficient. Redistributors also need to be able to run the tests, and we need to be confident in our ability to accept PRs from contributors who are going to make changes to this code. We also need confidence that it works on all supported platforms. Relying on one-off testing will dramatically slow our velocity on probably one of the most trickiest features we've released in a long time.

I started writing tests for this functionality, but when I did, I began running into bugs in pytest-virtualenv and ran out of steam before I found a solution. More importantly, the behavior that's most essential is that setuptools runs as intended in a system site-packages environment, functionality that's never been part of the test suite.

I'm open to more tests, but I'm not excited about the prospect, given that once this functionality is working as intended and the bugs have been worked through, I don't expect this code to change much and expect it to be deprecated as soon as possible. In other words, it's not inherent to the design, but an artifact of a transition effort. That's why I'm okay with releasing it as best-effort and incidentally tested.

That said, I'd welcome tests for it - I'm just not sure it's worth the investment or the delay in making the functionality available.

By the way, what is the status of the rewrite? It looks like tests are failing everywhere. Do you have plans to investigate the emergent issues?

@pganssle
Copy link
Member Author

pganssle commented Aug 3, 2020

I started writing tests for this functionality, but when I did, I began running into bugs in pytest-virtualenv and ran out of steam before I found a solution. More importantly, the behavior that's most essential is that setuptools runs as intended in a system site-packages environment, functionality that's never been part of the test suite.

Yes, that's the "technical debt" that I was referring to. We should have never been testing it in any other way, and now we have a major, high-importance and high-profile feature that was rolled out without tests which breaks many things and where it's very difficult to iterate on solutions because we can't even run the test suite in a way where the change takes effect.

The problem here is not that I'm requesting an unreasonable standard, the problem is that we never had this standard in the first place, and that we rolled out the distutils adoption without it in place. If this is causing major problems, we should roll back the distutils adoption until we can get it tested properly (including actually running the distutils test suite!).

I'm open to more tests, but I'm not excited about the prospect, given that once this functionality is working as intended and the bugs have been worked through, I don't expect this code to change much and expect it to be deprecated as soon as possible. In other words, it's not inherent to the design, but an artifact of a transition effort. That's why I'm okay with releasing it as best-effort and incidentally tested.

The biggest issues here are that we need tests to be able to make changes with confidence. The issue with sys.meta_path was an artifact of the history rewriting that got completely overlooked because we cant use the test suite to test this! When inevitably there are a half dozen bug reports for the next release of this, we'll need to be able to make changes in such a way that none of the existing functionality breaks on any of our supported platforms. It's true that almost all of our tests (and indeed almost all the tests in all projects) tend to stop breaking once the changes stabilize, but we keep them around anyway to prevent regressions and to give ourselves confidence that nothing fundamental about the way the program works is changing without our knowing about it.

That said, I'd welcome tests for it - I'm just not sure it's worth the investment or the delay in making the functionality available.

Again, I strongly disagree. To be honest, I think almost all consumers of this project don't care whether or not they use the distutils from the standard library or from setuptools. From a user-centered point of view, this is the kind of change that we should be most cautious about, because any breakage we introduce is a cost that we're asking our users to bear in exchange for making it easier to maintain distutils and setuptools and to give users a more consistent experience. Acting quickly without testing just to get it done is the opposite of what we should be doing because it simultaneously makes it harder for us to maintain the project and makes it more fragile for end users. No one wins.

By the way, what is the status of the rewrite? It looks like tests are failing everywhere. Do you have plans to investigate the emergent issues?

I think this is blocked on us getting the test suite working, which I don't have much time for right now. If the current situation is untenable my opinion is that we should roll back the distutils adoption and try to fix it when we have our ducks in a row.

@jaraco
Copy link
Member

jaraco commented Aug 6, 2020

I can't help but feel that if we back out the distutils patch, we're allowing perfect to be the enemy of good.

There are these mitigating factors I see:

  • The bulk of the behavior has tests (distutils has working tests in CPython and pypa/distutils) and setuptools has its own tests.
  • Although Setuptools previously integrated (heavily) with distutils, it has not had integration tests.
  • Integration of distutils reduces the complexity by reducing the number of distutils versions that each Setuptools release must maintain compatibility with.
  • This change is ~120 lines of code that's incidental, transient, and not part of the essential design or user experience.
  • The risk from this change is minimized through manual tests and through rapid releases whereby unexpected degradation can be readily reverted or repaired.

I agree this decision isn't easy, but I just can't see backing out the change or leaving it degraded. We've already backed out the change to be opt-in and validated that the opt-in behavior is working as intended. If we don't have the resources to produce the desired outcome, we should eek by with what we can achieve. I'm all for rigor, and we can ratchet up the rigor in subsequent changes (presumably before attempting major refactoring). There are lots of steps ahead of us, including at least reconciling the monkey patches, identifying and presenting a minimal distutils interface, supplanting and removing the stdlib implementation, and eventually deprecating the distutils interface altogether.

To that end and with some regret, I'll proceed with the draft where the tests were passing (plus the meta_path patch cherry-picked). I've pushed that to the distutils-import-hack branch and will submit that as a different PR. I realize it doesn't represent a clean history of how we wish we'd approached it, but it does appear to achieve the goal and address the issue.

@pganssle
Copy link
Member Author

pganssle commented Aug 6, 2020

I strongly disagree with this decision. Until this project is actually well-tested I don't think we can consider rolling out a change of this magnitude. I appreciate the desire to make forward progress but moving distutils from a project with excellent testing infrastructure (CPython) to a completely untested project is a major regression. Particularly when we are already getting a reputation for churn.

Switching to the old pre-rebase version is even worse, because the history isn't even clean. Very unfortunate. I will probably personally recommend to everyone that they pin setuptools to a pre-distutuls-adoption version indefinitely at this point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants