Build packages compatible with multiple NumPy versions #32

Merged · 2 commits · Jul 15, 2017

Conversation

@jjhelmus (Contributor) commented Jul 8, 2017

Build using an older version of NumPy to create packages that are compatible with multiple NumPy versions.

NumPy maintains backwards compatibility, so extensions built with an older
version of NumPy will work with newer versions. To take advantage of this
and build a single pandas package that works with multiple versions of
NumPy, the oldest version of NumPy provided for a particular Python
version (Python 3.5: 1.9; Python 3.6: 1.11), or in the case of Python 2.7
the minimum version required by pandas (1.7), is pinned as a build requirement.
The run requirement is then specified as any NumPy version greater than or
equal to this pinned version.
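
Concretely, the requirements section of the recipe ends up as sketched below (the actual change is in the diff reviewed later in this thread); numpy x.x tells conda-build to pin the exact NumPy version supplied by the build matrix:

requirements:
  build:
    - numpy x.x              # conda-build substitutes the matrix-supplied version here
  run:
    - numpy >=1.7   # [py27]
    - numpy >=1.9   # [py35]
    - numpy >=1.11  # [py36]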
@conda-forge-linter

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@jjhelmus (Contributor, Author) commented Jul 8, 2017

Stack Overflow has a nice writeup on NumPy's backwards compatibility. This same build-with-an-older-NumPy trick is being used to build wheels for Python projects that make use of the NumPy C API.
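
As a rough illustration of the same trick outside conda (the versions and flags here are examples, not any project's actual build commands): install the oldest NumPy you intend to support, then build the wheel against its headers:

$ pip install numpy==1.7.1                      # oldest NumPy the wheel should support
$ pip wheel --no-deps --no-binary :all: pandas  # build from source against those headers

The resulting wheel then works with NumPy 1.7.1 and anything newer, for the backwards-compatibility reason described above.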

@ocefpaf (Member) commented Jul 8, 2017

That is interesting. Should we adopt this policy everywhere?

@jjhelmus (Contributor, Author) commented Jul 8, 2017

That is interesting. Should we adopt this policy everywhere?

If it works, yes! More testing and a chat with some of the NumPy team at SciPy would probably help clarify a few things about the process.

@jjhelmus (Contributor, Author) commented Jul 8, 2017

I should point out that, besides cutting down on the number of packages that need to be built, this technique also means that a new NumPy release does not require new builds of the various downstream packages, which has been a pain point in the past.

@ocefpaf (Member) commented Jul 8, 2017

More testing

I guess that, with this policy, it would be nice if we could run the whole test suite to help catch any accidental breakage.
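
One way to do that, sketched below, is a recipe test section that runs the package's own suite instead of only import checks (pandas.test() requires pytest, and the exact invocation differs from package to package):

test:
  requires:
    - pytest
  imports:
    - pandas
  commands:
    - python -c "import pandas; pandas.test()"   # run the full test suite at build time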

@TomAugspurger (Contributor)

More testing and a chat with some of the NumPy team at SciPy would probably help clarify a few things about the process

I'll be at SciPy next week. Would you recommend waiting until then to merge this?

Another question that isn't relevant yet. If pandas decides to require a newer NumPy would we need to update the build matrix?

@ocefpaf (Member) commented Jul 8, 2017

Another question that isn't relevant yet. If pandas decides to require a newer NumPy would we need to update the build matrix?

Nope. Only the recipe. That is the beauty of this approach.
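
For example, with hypothetical version numbers: if pandas later required numpy >=1.9 on Python 2.7, only the py27 line in the recipe's run section would move, and the build matrix would be left alone:

run:
  - numpy >=1.9   # [py27]
  - numpy >=1.9   # [py35]
  - numpy >=1.11  # [py36]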

@jjhelmus (Contributor, Author)

I tested this numpy technique with bottleneck and numexpr. There were no issues importing or running the unit tests for these packages with all the combinations of Python and NumPy I tried. No warnings or other oddities.

With these tests and having talked with a number of folks at SciPy about the approach, I feel confident that this approach is safe for future packaging.
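
A sketch of the kind of check involved (not the exact commands or versions used): build against the pinned old NumPy, then install the package alongside a newer NumPy and run its tests:

$ conda create -y -n np-compat python=3.6 numpy=1.13 bottleneck   # newer NumPy than the build pin
$ source activate np-compat
$ python -c "import bottleneck; bottleneck.test()"                # package's bundled test runner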

@ocefpaf merged commit b4df254 into conda-forge:master on Jul 15, 2017
- numpy x.x
- numpy >=1.7 # [py27]
- numpy >=1.9 # [py35]
- numpy >=1.11 # [py36]
Member

Why have different NumPy versions for different Python versions? What would be wrong with requiring numpy >=1.7 for all Python versions?

Member

Probably because numpy 1.7 does not support py35 or py36.

Member

Sure, though admittedly there won't be a numpy 1.7 build for python 3.5 or 3.6 then.

Contributor (Author)

But if there ever was a numpy 1.7 for python 3.5 or 3.6, the package built using this recipe would NOT work with it, because numpy does not guarantee forward compatibility.

These lines properly describe the numpy requirement for the various Python versions. I don't think relying on the repository never having certain versions of a package is something we want to do.

Member

Looking at the release notes, I get the impression that not much has changed in the C API since NumPy 1.7, or at least that what has changed would not present an issue when building, though there may be issues at runtime for exceptionally old code.

@jakirkham (Member) commented Jul 17, 2017

One thing that does concern me a bit with this strategy: if we have BLAS-dependent packages that use NumPy/SciPy as an interface to BLAS, reaching back to such old versions will give us packages that are not compatible with how BLAS is used in conda-forge currently, and thus most likely the wrong BLAS. Given this, would it make more sense to restrict everything to, say, NumPy 1.10 or 1.11, when NumPy was added to conda-forge with BLAS support?

Member

+1 to restricting everything to 1.10 or 1.11. We've dropped numpy <=1.10 in the old approach anyway.

Contributor (Author)

Building against a "newish" numpy does not create a package which works with older numpy versions. It is necessary to specify the run requirement that numpy be greater than or equal to the build version. For example, installing the pandas package from defaults built against numpy 1.12 and then force-installing numpy 1.9 results in the following import-time error:

$ python -c "import pandas"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/jhelmus/anaconda/envs/test/lib/python2.7/site-packages/pandas/__init__.py", line 26, in <module>
    from pandas._libs import (hashtable as _hashtable,
  File "/Users/jhelmus/anaconda/envs/test/lib/python2.7/site-packages/pandas/_libs/__init__.py", line 3, in <module>
    from .tslib import iNaT, NaT, Timestamp, Timedelta, OutOfBoundsDatetime
  File "pandas/_libs/src/numpy.pxd", line 157, in init pandas._libs.tslib (pandas/_libs/tslib.c:117317)
ValueError: numpy.dtype has the wrong size, try recompiling. Expected 88, got 96

Although conda-forge does not provide numpy versions less than 1.10, they are available in defaults and I believe should be supported. I am willing to submit PRs to add conda-forge builds of these older numpy versions if that is desired.

As for the BLAS issue, any package which links directly to BLAS/LAPACK should be pinning its BLAS implementation and version. I suspect there are packages in the channel which do not do this correctly; these should be updated and the un-pinned packages removed. This BLAS pinning limits the supported versions of numpy available to the package via the solver. I do not think this limitation should be introduced by overloading the numpy package to include a BLAS version.
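
For reference, the kind of pinning meant here looks roughly like the sketch below (the blas metapackage spelling follows conda-forge's convention; the openblas version shown is an example, and the real pin tracks conda-forge policy):

requirements:
  build:
    - blas 1.1 openblas
    - openblas 0.2.19|0.2.19.*
  run:
    - blas 1.1 openblas
    - openblas 0.2.19|0.2.19.*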

Member

Building against a "newish" numpy does not create a package which works with older numpy versions. It is necessary to specify the run requirement that numpy be greater than or equal to the build version.

I understand. However, this is not what I was proposing either.

I am proposing we pick 1.10 or 1.11, build with that version, and make it the lower bound as well across all Pythons. It will be the easiest for core, for reviewers, and for maintainers, and the most consistent, for a variety of reasons already outlined. I don't really see any point in going to any more work than that, as I would expect nearly all conda-forge users are on NumPy 1.10 or greater at this point.
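
In recipe terms the proposal is roughly this (a sketch, not a tested recipe):

requirements:
  build:
    - numpy 1.11.*    # one pinned build version across all Pythons
  run:
    - numpy >=1.11    # the same version as the lower bound everywhere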

Although conda-forge does not provide numpy versions less than 1.10, they are available in defaults and I believe should be supported. I am willing to submit PRs to add conda-forge builds of these older numpy versions if that is desired.

That said, if you are willing to do the work to make this happen, no real opposition from me. It just seems like the average maintainer already has enough to keep in their head (e.g. building on Windows using Python as a proxy for VS, pinnings, etc.); it would be nice not to hand them a new NumPy strategy that is more work than what they already do. Though again, this is a concern IMHO, not a blocker.

This BLAS pinning limits the supported versions of numpy available to the package via the solver. I do not think this limitation should be introduced by overloading the numpy package to include a BLAS version.

I think the point is missed here. If someone pins BLAS as we tell them to and pins NumPy as we tell them to, and these conflict, the package won't build. I'm almost certain we will hear grumbles about this. Hence, not having old enough NumPy builds under the current BLAS scheme (and pinning) does present a blocker for the strategy outlined in this PR. How the knobs of NumPy version and BLAS version(s) are tuned will make this easier or harder to maintain in the long run. Adding Python in with these knobs complicates things. So I would urge that we all reach some agreement on where everyone feels comfortable tuning such knobs and how variable they can be.

At this point, we probably should roll this over into a webpage issue so that this doesn't get buried as it will be some sort of policy decision.

@ocefpaf (Member) commented Jul 17, 2017 via email
