Add a system dependency for numpy #12891

Open · wants to merge 2 commits into base: master
Conversation

@rgommers (Contributor)

This is a follow-up to gh-12799, and addresses gh-9598.

Opening as draft for now, because the key issue (how to get at the target Python installation) required a new keyword for `dependency()`, which I suspect is going to lead to a bit of discussion. To get at the right numpy package, the installation found by `import('python').find_installation(...)` must be used. There may even be several such installation objects floating around in a user's meson.build files. It doesn't seem possible to get at those without the user passing in the target interpreter. I went with this API:

```meson
py = import('python').find_installation(pure: false)
dependency('numpy', interpreter: py)
```

The alternatives I considered to achieve this are:

- `dependency('numpy', interpreter: py.path())` # not as nice, but does keep the keyword value a string
- `dependency('numpy', dependencies: py_dep)`
- reusing an existing keyword like `modules` or `components` (but nothing really seems to fit here)

Of course, the final alternative is deciding that none of these solutions are acceptable, and letting users wait until they no longer need NumPy 1.x (1.5 to 3 years from now for widely used packages). Until then they'd have to use this very ugly code block to support 1.x:

https://github.com/PyWavelets/pywt/blob/e69b126c096868b0ea7650d38ed11cd95a9dc182/pywt/_extensions/meson.build#L15-L39

This new keyword is in principle not only applicable to numpy, but to system dependencies for other Python packages with C APIs as well. There aren't too many of those though, so I'm not sure how compelling an argument that is.

This adds an `interpreter` keyword to `dependency()`, which is
needed because there doesn't seem to be another way to get at
the result of `import('python').find_installation(...)`. We need
to access the `numpy` module installed into that Python installation.
It's also possible that a user has multiple Python installations
detected in their meson.build files; we can't know for which one
they're looking for `numpy` (this would apply to any other Python
package dependency).

The system dependency is then used like this:
```meson
py = import('python').find_installation(pure: false)
dependency('numpy', interpreter: py)
```

The alternatives I considered to achieve this are:

- `dependency('numpy', interpreter: py.path())`  # not as nice, but keeps keyword value as a string
- `dependency('numpy', dependencies: py_dep)`

[skip ci]
@jpakkane (Member)

Would it be possible (and feasible) to make just this work instead:

```meson
dep = dependency('numpy')
```

and have it do all the needed lookups behind the scenes? Or possibly something like:

```meson
dep = dependency('numpy', method: 'super_magic')
```

If the boilerplate is always the same, then this would save users from having to type it out every time.

@rgommers (Contributor, Author)

Would it be possible (and feasible) to make just this work instead:

It'd be very desirable, however I couldn't find a way. To illustrate the problem in another way: imagine a Linux distro with a default Python version of 3.10 (executables `python`, `python3`, `python3.10`) that also ships Python 3.11 (binary `python3.11`). This is pretty common. The meson executable is installed with the default Python 3.10. If you now want to build a Python package that targets Python 3.11 and uses numpy, you'll probably run this (note that meson-python always passes `--native-file`):

```sh
meson setup build --native-file native-file.ini
```

with native-file.ini containing:

```ini
[binaries]
python = '/path/to/python3.11'
```

In such a case, `dependency('numpy')` would need to detect the numpy installed for Python 3.11, not for 3.10. It'd be great if someone has an idea of how to achieve this. As far as I understand the Meson internals here, it would require maintaining internal state in the interpreter, so that from the `NumPySystemDependency` implementation it is possible to look up the result of a `find_installation` call that has happened already. If such internal state exists, I missed it.

@dnicolodi (Member)

How is this supposed to work with the numpy-config based dependency lookup which has already been merged? I expect distributions to ship the numpy-config executable on the PATH regardless of the Python interpreter for which NumPy is installed, or not to ship it at all. How does Meson know that the numpy-config that it found is for the right Python? Sorry, I haven't had a look at the actual implementation.

@rgommers (Contributor, Author)

That is a very good question @dnicolodi. I had only looked at it cursorily, since I copied the logic exactly from pybind11, where pybind11-config "just works" AFAIK. I think the answer is:

1. Distros ship multiple Python interpreters but not multiple versions of packages like pybind11 or numpy, so there is only one pybind11-config, and yes, it's on the PATH.
2. For venvs based on a non-default interpreter, detection relies on PATH precedence: /usr/bin/ will be the last entry on PATH and /path/to/user-created/venv/bin/ will be the first entry.
   - If the venv (or conda env, or whatever) does not have pybind11 installed, then the wrong package will get picked up. No idea how often that happens; if it does happen you may pick up headers for the wrong version of pybind11/numpy (which may still work in most cases, but of course it can go wrong).
   - It looks to me like this PATH precedence does rely on Meson being implemented in Python.

For Python tools the generic answer to the "multiple binaries with the same name" problem is to invoke them like `python -m pip` / `python -m pytest`. But for a pkgname-config tool that is not what Meson does; nor can it, because it cannot assume (in the generic `ConfigTool` class at least) that the config tool is written in Python. This could be added as new functionality to `ConfigTool`, but at that point it has exactly the same problem as the `SystemDependency` here: it needs to access the `find_installation()` result.

I just created a dummy project like this:

```meson
project('try-configtool', 'c')
dependency('pybind11')
```

and verified that ConfigTool does work like I describe above:

- If meson and pybind11 are installed in the same venv, pybind11 is found because /path/to/venv/bin/ is the first entry in PATH.
- If I use a meson binary outside of the venv, detection of pybind11 still works.
- If I remove the /path/to/venv/bin/ entry from PATH, detection fails.
- Conclusion: how this works now does not rely on Meson being implemented in Python, but does rely on venv (or conda/spack/nix/etc. env) activation - which is what adds the needed PATH entry - so one can only target the interpreter in an activated environment.

Relying on an activated env seems okay, so pybind11-config/numpy-config do work. To get the equivalent system dependency, we do need to know what interpreter to use.
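The PATH-precedence behaviour tested above is easy to reproduce outside Meson. This sketch (the tool name `demo-config` is made up, and a POSIX environment is assumed) shows that whichever directory comes first in PATH wins the config-tool lookup, which is exactly what env activation relies on:

```python
import os
import shutil
import tempfile


def first_on_path(name, entries):
    """Resolve a tool the way PATH lookup does: the first entry wins."""
    return shutil.which(name, path=os.pathsep.join(entries))


# Simulate a venv bin dir shadowing the distro's /usr/bin:
with tempfile.TemporaryDirectory() as venv_bin, \
        tempfile.TemporaryDirectory() as usr_bin:
    for d in (venv_bin, usr_bin):
        tool = os.path.join(d, "demo-config")
        with open(tool, "w") as f:
            f.write("#!/bin/sh\necho demo\n")
        os.chmod(tool, 0o755)
    # The "venv" copy is found because its directory is listed first.
    found = first_on_path("demo-config", [venv_bin, usr_bin])
```

Deactivating the env (dropping `venv_bin` from the list) makes the lookup fall through to the "distro" copy, mirroring the failure mode described above.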

There's another potential failure mode, which seems much more likely: pkg-config is tried first, so if /usr/share/pkgconfig/pybind11.pc / /usr/share/pkgconfig/numpy.pc exist, then venv activation doesn't do a thing: the global/distro install is always going to get found first. That is bad - I think we need to invert the method ordering here, trying config-tool first. pkg-config support is helpful for cross builds, but it's probably the last method that should be tried for native builds.

@eli-schwartz (Member)

Note that pybind11 is a C++ project. Not a python one. It doesn't matter which python version is used for the pybind11-config script, or indeed if there is one in the first place -- distros probably want to install pybind11 in global mode to /usr/include and /usr/share/pkgconfig.

Numpy does depend on which version of python it is associated with due to codegen, right?

@rgommers (Contributor, Author)

Numpy does depend on which version of python it is associated with due to codegen, right?

It doesn't. The couple of headers that are created by codegen can differ for the exact same numpy version due to the compiler toolchain and other environment features (things like `#mesondefine HAVE_XLOCALE_H`), though.

The main problem equally applies to pybind11 here though: picking up a completely different version. We probably haven't noticed yet because pybind11 hasn't changed much recently. As soon as the first version is released with support for numpy 2.0 though, I think we'll have problems whenever there's an older pybind11 version installed by a distro package manager. A lower bound in pyproject.toml like "pybind11>=2.12.0" is going to get ignored unless that same version spec is also added to dependency('pybind11').

@eli-schwartz (Member)

Sure, and that's fine. Please add the constraint to dependency(), it is apropos to do so.

There are several reasons it is relevant, including the fact that distros building with:

- `python -m build --no-isolation --skip-dependency-check --wheel`
- `gpep517 build-wheel`

aren't going to see dependency bounds from pyproject.toml. This can be fairly important because pyproject.toml build-requires frequently has dependencies which make sense in the context of pip install or cibuildwheel, but not for distro building.

Sometimes that is pinned dependencies, sometimes that is additional dependencies (stuff like, but not limited to, accidentally adding a dependency on https://pypi.org/project/ninja/).

@rgommers (Contributor, Author)

Please add the constraint to dependency(), it is apropos to do so.

That is fair, yes it's always a good idea. That said, it'll still be potentially confusing to ignore a pybind11 installed in the environment you're building in because it's shadowed by a global one. E.g., you're trying to test a patched version or a newer version. If Python packaging were saner and had proper pkg-config support, this wouldn't happen. But as it stands, I still think it'd be healthy to try the config tool first, and pkg-config second. WDYT?

@eli-schwartz (Member)

This is fundamentally a matter of layering build environments. My understanding was that you'd suggested automatically adding to the beginning of $PKG_CONFIG_PATH, the output of:

```python
import glob, os
paths = glob.glob(os.path.join(env_site_packages, '*/share/pkgconfig/'))
```
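A self-contained version of that idea, resolving the env's site-packages via `sysconfig` (my assumption for what `env_site_packages` would be; in Meson itself the path would come from the `find_installation()` result), might look like:

```python
import glob
import os
import sysconfig


def env_pkgconfig_dirs():
    """Collect */share/pkgconfig directories shipped inside the current
    env's site-packages, suitable for prepending to PKG_CONFIG_PATH so
    that in-env .pc files shadow global/distro ones."""
    site_packages = sysconfig.get_paths()["purelib"]
    return glob.glob(os.path.join(site_packages, "*", "share", "pkgconfig"))
```

Prepending these directories would make pkg-config lookups respect the layered build environment without reordering Meson's dependency methods.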

@rgommers (Contributor, Author)

This is fundamentally a matter of layering build environments.

agreed

My understanding was that you'd suggested automatically adding to the beginning of $PKG_CONFIG_PATH, the output of:

Either that, or trying pybind11-config first. I like your suggestion better, I think; it should work and hopefully generalize nicely.
