Provide mapping from "Python packages" to "distribution packages" #131
Thanks for the report @8day. Indeed, when I started implementing this package, my instinct was the same as yours: it should be straightforward for a (Python) package to resolve itself to its distribution, or conversely for a distribution to resolve all the packages it exposes. But soon after delving into the implementation, I abandoned the idea of attempting to solve the problem of resolving distributions to their packages and vice versa, and instead focused on enabling the basic use-case of resolving metadata for a (usually installed) distribution. Therefore, a distribution like `PyYAML` can only be resolved by its distribution name, not by the `yaml` package it provides. Note also that distributions that only expose a module are affected in the same way.

In an early design, I had proposed that Python packages that wished to declare their distribution package could do so with something like:

```python
__distribution__ = 'PyYAML'
```

Or, for something like `pkg_resources`:

```python
__distribution__ = 'setuptools'
```

But as I mentioned above, I decided the value this would add was small compared to the confusion. I do agree it would be nice to have a reliable protocol to determine the packages/modules for a distribution and the relevant distributions for a module/package. Note that with namespace packages, it's a many-to-many relationship: a single package may be supplied by several distributions, and a single distribution may supply several packages.

This effort isn't something this project has plans to tackle, but if you have interest in driving the design, consensus, and implementation, I'd be willing to advise on the process. I'd start by providing more background on what use-cases are unmet by the current implementation. Can you elaborate on what use-cases you encountered that inspired you to write the report?
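For concreteness, here is a minimal sketch of the asymmetry described above, assuming PyYAML is installed in the current environment (the names are the ones used in this thread):

```python
import importlib_metadata

# Resolving by *distribution* name works:
dist = importlib_metadata.distribution('PyYAML')
print(dist.metadata['Name'])  # PyYAML

# Resolving by *import* name does not; there is no mapping from the
# 'yaml' package back to the PyYAML distribution:
try:
    importlib_metadata.distribution('yaml')
except importlib_metadata.PackageNotFoundError:
    print("no distribution named 'yaml'")
```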
I think I faced this problem while designing a dependency checker. Users were forced to specify the entire Distribution Package as a dependency when it was an Import Package that was required, which was confusing. I.e., the difference between the "import name"/"name in code" and the "install name" could require analysis of the installed files. Note that this mattered only because it had to be a fully automated task, not something with human oversight.

Can't get this out of my head: it seems like all of this is about "Import Package metadata".
I think we are trying to solve a problem that has already been solved once. I've re-read the PEPs for metadata, and finally saw why I thought about `Provides-Dist` (from PEP 345, Provides-Dist):

> Each entry contains a string naming a Distutils project which is contained within this distribution. This field *must* include the project identified in the `Name` field, followed by the version: `Name (Version)`.
>
> A distribution may provide additional names, e.g. to indicate that multiple projects have been bundled together.

As can be seen, `Provides-Dist` already lets a distribution declare the additional names it provides, which looks like exactly the kind of mapping requested here.

What must be done is for package managers to support all of this. Thus, probably there's even no need for a new protocol/PEP: all of this just has to be implemented by the existing tooling. I think the metadata format requires proper analysis to clear up lots of shady moments and set it back on its original path.
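For what it's worth, a quick way to see how rarely `Provides-Dist` appears in practice is to scan the installed metadata (a sketch; the loop is mine, not from this thread):

```python
import importlib_metadata

# dist.metadata behaves like an email.message.Message, so multi-use
# fields such as Provides-Dist are read with get_all().
for dist in importlib_metadata.distributions():
    provides = dist.metadata.get_all('Provides-Dist')
    if provides:
        print(dist.metadata['Name'], '->', provides)
# In a typical environment this prints nothing at all.
```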
I think you may be mistaken on this point. The way I interpret the text above, each entry names not an "Import Package" but a "Distribution Package" (aka a distutils project): the field is defined in terms of projects, not importable names. So the values of `Provides-Dist` refer to distribution names, not import names.
I suspect that field was never broadly implemented by the tooling. What's more important, however, is that it is scarcely present in published metadata. Most importantly, there's nothing about the field that maps a distribution to the import names it provides.

Thinking about your use-case, providing for the user a way to download the relevant distribution packages based on the Python packages they wish to import may prove to be a lot more challenging than just exposing metadata in the package. You'll also need support in the index to expose that metadata in a searchable way. I guess if you're only looking to validate dependencies in a local environment, it might be possible to do without affecting the index. Still, I'm not sure you can achieve what you need. Consider, for example, the cases above.
You may be on to something with `top_level.txt`.

There are some gaps there, in that the items in `top_level.txt` don't necessarily correspond to importable packages. Still, it does seem as if most of what you need is exposed through that (unofficial) metadata. Note, I'm pretty sure that namespace packages end up in `top_level.txt` as well. Given that Python has deprecated the declaration of namespace packages, you may have difficulty disentangling those names.

Does that help?
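For illustration, that (unofficial) file can be read through the existing API; a sketch, with output that will vary by environment:

```python
import importlib_metadata

# top_level.txt is the setuptools-generated, undocumented listing of
# top-level names installed by a distribution.
dist = importlib_metadata.distribution('setuptools')
top_level = dist.read_text('top_level.txt')
print(top_level.split() if top_level else [])
# e.g. ['_distutils_hack', 'pkg_resources', 'setuptools']
```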
Considering that I was mistaken about `Provides-Dist`, that approach is off the table. Speaking about `top_level.txt`, it seems like the most practical option available today.
That's exactly what that "dependency checker" was supposed to do: just check whether a package was already installed. This was part of a feature where a build backend's extension was used only when a PEP 508 dependency was satisfied. E.g., you may want to run a different set of commands depending on the platform.
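A minimal sketch of such a check, assuming the third-party `packaging` library for PEP 508 parsing (the helper name is illustrative, and markers/extras are ignored here):

```python
from packaging.requirements import Requirement
import importlib_metadata

def is_satisfied(req_string):
    """Return True if a PEP 508 requirement is met by the environment."""
    req = Requirement(req_string)
    try:
        version = importlib_metadata.version(req.name)
    except importlib_metadata.PackageNotFoundError:
        return False
    # An empty specifier set accepts any installed version.
    return req.specifier.contains(version, prereleases=True)

print(is_satisfied('PyYAML>=5.0'))
```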
In my case? Yes. In general? Kind of. In your first reply you proved that with namespace packages in the picture, this is not the kind of problem that has a clean, general solution. I guess you can close this issue.
Sounds good. Thanks for the hard consideration. I did find that it's possible to get all of the distribution names for a given top-level package:
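A minimal sketch of such a lookup, built on `top_level.txt` (essentially the approach that later shipped in importlib_metadata as `packages_distributions()`; the helper name here is mine):

```python
import collections
import importlib_metadata

def package_to_distributions():
    """Map each top-level import name to the distributions providing it."""
    mapping = collections.defaultdict(list)
    for dist in importlib_metadata.distributions():
        # top_level.txt may be absent (e.g. for non-setuptools builds).
        for pkg in (dist.read_text('top_level.txt') or '').split():
            mapping[pkg].append(dist.metadata['Name'])
    return dict(mapping)

print(package_to_distributions().get('yaml'))  # e.g. ['PyYAML']
```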
That also revealed an issue with...
To answer the questions we have, we need data on libraries installed in the environment, not packages that are imported. importlib_metadata gives us access to the RECORD file (https://www.python.org/dev/peps/pep-0376/#record) for every package, and we build a reverse mapping of package name -> distribution once. Distribution (I am calling them libraries) names are then used. Since distributions are 'installed' specifically, they already ignore modules in the standard library and any local user-written modules. The dependency on the stdlib_list library can be removed.

All metric names have been changed to talk about libraries, not packages. The word 'package' is so overloaded, and nobody knows anything about 'distributions'. https://packaging.python.org/glossary/ is somewhat helpful. I will now try to use just "module" (something that can be imported, since our source is sys.modules) and "library" (what is installed with pip or conda, aka a distribution).

Despite what python/importlib_metadata#131 says, the `packages_distributions` function in importlib_metadata relies on the undocumented `top_level.txt` file, and does not work with anything not built with setuptools. So we go through all the RECORDs ourselves.

Added some unit tests, and refactored some functions to make them easier to test. Import-time side effects are definitely harder to test, so I now require an explicit setup function call. This makes testing much easier, and is also more intuitive.

Bump version number
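A hedged sketch of the RECORD-based approach that commit describes: derive top-level names from each distribution's installed files instead of `top_level.txt` (the helper name is mine; real code would also filter scripts and data files):

```python
import collections
import importlib_metadata

def modules_to_libraries():
    """Map top-level module names to the 'libraries' (distributions) installing them."""
    mapping = collections.defaultdict(set)
    for dist in importlib_metadata.distributions():
        # dist.files is derived from RECORD and may be None.
        for path in dist.files or []:
            top = path.parts[0]
            if top.endswith(('.dist-info', '.egg-info')):
                continue  # metadata directories, not importable code
            if top.endswith('.py'):
                top = top[:-3]  # single-module distribution, e.g. six.py
            mapping[top].add(dist.metadata['Name'])
    return dict(mapping)

print(modules_to_libraries().get('yaml'))  # e.g. {'PyYAML'}
```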
In GitLab by @8day on Oct 16, 2020, 11:12
Apart from `Distribution.from_name()` there must exist `Distribution.from_package_name()`. E.g., `pkg_resources` from `setuptools` can't be found with `importlib_metadata.Distribution.from_name('pkg_resources')`. Another example is `PyYAML` (dist name), which contains `yaml` (package name).

Most likely, for this to work dists will need `Provides-Dist: {dist}:{pkg}` to be defined in their metadata. Considering that it seems nobody uses a proper solution, ATM this can be implemented as a hack: check if `*.dist-info/top_level.txt` exists and `*.dist-info/INSTALLER` contains `pip`, then use the contents of `*.dist-info/top_level.txt` to list the "import packages" contained by a "distribution package".

This will require a `Package` class to be added, which will complicate things quite a bit: e.g., `Package.files()` will have to return the files stored within the package, not the entire distribution. Also, this will require an adjustment of terminology, seeing as ATM "package" is more or less the same as "distribution package".

Note that this will require reading the metadata of all found dists, which will be extremely inefficient.

All of this can be avoided by a switch to another, mono-package dist format and preferably metadata format, but that's a topic for another discussion...

Edit:
I could swear I've read about notation like `Provides-Dist: {dist}:{pkg}` in one of the PEPs, but can't find any sources...
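A minimal sketch of the hack described above, using `Distribution.read_text()` and assuming a pip-installed distribution (the helper name is hypothetical):

```python
import importlib_metadata

def import_packages(dist_name):
    """List 'import packages' for a distribution via the top_level.txt hack."""
    dist = importlib_metadata.distribution(dist_name)
    installer = (dist.read_text('INSTALLER') or '').strip()
    top_level = dist.read_text('top_level.txt')
    # Only trust top_level.txt when pip produced the install, per the
    # heuristic proposed above.
    if installer == 'pip' and top_level:
        return top_level.split()
    return []

print(import_packages('PyYAML'))  # e.g. ['_yaml', 'yaml']
```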