Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NumPy compatibility issue (further) #956

Closed
qiyunzhu opened this issue Apr 11, 2024 · 12 comments · Fixed by #962
Closed

NumPy compatibility issue (further) #956

qiyunzhu opened this issue Apr 11, 2024 · 12 comments · Fixed by #962

Comments

@qiyunzhu
Copy link
Contributor

Hi @wasade I was able to set up a development environment with NumPy 2.0, following the instruction provided in #955 . Now as I run python -m unittest, there are two errors. I think you may be more familiar with these. Would you like to address them?

======================================================================
FAIL: test_parse_biom_table (biom.tests.test_parse.ParseTests.test_parse_biom_table)
tests for parse_biom_table when we do not have h5py
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/me/biom-format/biom/tests/test_parse.py", line 341, in test_parse_biom_table
    self.assertEqual(t, t_json)
AssertionError: 417 x 9 <class 'biom.table.Table'> with 631 nonzero entries (16% dense) != 0 x 0 <class 'biom.table.Table'> with 0 nonzero entries (0% dense)

======================================================================
FAIL: test_parse_biom_table_with_hdf5 (biom.tests.test_parse.ParseTests.test_parse_biom_table_with_hdf5)
tests for parse_biom_table when we have h5py
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/me/biom-format/biom/tests/test_parse.py", line 378, in test_parse_biom_table_with_hdf5
    self.assertEqual(t, t_json)
AssertionError: 5 x 6 <class 'biom.table.Table'> with 15 nonzero entries (50% dense) != 0 x 0 <class 'biom.table.Table'> with 0 nonzero entries (0% dense)
@qiyunzhu
Copy link
Contributor Author

qiyunzhu commented May 1, 2024

@wasade It will be very nice if this can also be resolved prior to the new release. You are probably the most familiar with the issue. But if necessary, I can motivate a student to help you on it.

@wasade wasade mentioned this issue May 2, 2024
@qiyunzhu
Copy link
Contributor Author

@wasade I tried to pip install biom-format 2.1.16 along with numpy 2.0rc1 but it ran into issues. Procedures to reproduce:

> conda create -y -n biom-np2 -c conda-forge python
> conda activate biom-np2
> pip install numpy==2.0.0rc1
...
Successfully installed numpy-2.0.0rc1
> echo "import numpy; print(numpy.__version__)" | python
2.0.0rc1
> pip install biom-format
...
Successfully installed biom-format-2.1.16 click-8.1.7 h5py-3.11.0 pandas-2.2.2 python-dateutil-2.9.0.post0 pytz-2024.1 scipy-1.13.0 six-1.16.0 tzdata-2024.1
> echo "import biom; print(biom.__version__)" | python
...
File "biom/_filter.pyx", line 1, in init biom._filter
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

After some testing, I found that the following method works:

> conda create -y -n biom-np2 -c conda-forge python
> conda activate biom-np2
> pip cache purge
> pip install numpy==2.0.0rc1 cython
> pip install biom-format --no-build-isolation
> echo "import biom; print(biom.__version__)" | python
2.1.16

Some notes:

  • The most critical part is --no-build-isolation. Otherwise biom-format will be built using the NumPy 1.x logic.
  • It is important to install Cython prior to biom-format. Otherwise there will be an error ModuleNotFoundError: No module named 'Cython'.
  • pip cache purge helps to ensure reproducibility.

Comment:

  • The necessity of --no-build-isolation may not be favorable. This behavior is not seen in other dependencies of biom-format. Perhaps this can be worked on within the biom-format packaging mechanism?

@wasade
Copy link
Member

wasade commented May 13, 2024

The use of --no-build-isolation is consistent though with the guidance from @rgommers?

I would welcome a PR to resolve the Cython installation step

@rgommers
Copy link

It's because of the numpy requirement in https://github.com/biocore/biom-format/blob/master/pyproject.toml#L2. The advised requirement here is numpy>=2.0.0rc1,<3, see: numpy/numpy#24300 (comment). That will make pip install biom-format work as expected. It's then still possible to buid against numpy 1.x through --no-build-isolation, but the default is then such that you'll produce wheels that work with 1.x and 2.x.

@wasade
Copy link
Member

wasade commented May 13, 2024

Ah okay, thank you!

@qiyunzhu
Copy link
Contributor Author

I guess once NumPy 2.0 is officially released, the current package (biom-format) will automatically build against 2.0 when it is being installed, so there is nothing we need to do to this package. Is my understanding correct? @rgommers

@rgommers
Copy link

I guess once NumPy 2.0 is officially released, the current package (biom-format) will automatically build against 2.0 when it is being installed

Indeed, it will.

so there is nothing we need to do to this package.

If you've worked around the problem in scikit-bio and this package isn't used much outside of that, you could leave it as is indeed. There is an annoying failure modes though, which is that any user who is building a release before numpy 2.0 is out will get a wheel built that doesn't work with 2.0. And those are cached by pip, so they may see issues at runtime since pip will pick a wheel from its cache rather than rebuild. If you want to avoid that, adding numpy>=2.0.0rc1,<3 is still better.

@qiyunzhu
Copy link
Contributor Author

@rgommers Thanks for the clarification!

Is my understanding correct that if biom-format or scikit-bio is built against numpy>=2.0.0rc1, it will be compatible with both numpy 1.x and numpy 2.x. Therefore, even in an environment with numpy 1.x pre-installed, one can install scikit-bio via a simple pip install scikt-bio, and that doesn't trigger a mandatory upgrade of numpy 1.x into 2.x?

@rgommers
Copy link

Yes indeed, that is correct.

@jakirkham
Copy link

FWIW it looks like biom-format has been migrated for NumPy 2: conda-forge/biom-format-feedstock#30

scikit-bio depends on hdmedians, which is pending migration ( conda-forge/hdmedians-feedstock#20 ). Once that is in there should be a new bot PR for scikit-bio

@wasade
Copy link
Member

wasade commented May 31, 2024

Thanks!!

@qiyunzhu
Copy link
Contributor Author

@wasade @jakirkham We recently removed hdmedians from scikit-bio's dependencies. It won't be an issue in the next release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants