-
Notifications
You must be signed in to change notification settings - Fork 667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
setup.py now cleanly handles setup-time cython dependency (closes #768) #799
Conversation
Customized setup-time dependency installation to work around setuptools' problematic behavior (closes #798) Modified lazy dependency scheme to be less hacky (we now subclass distutils.Distribution). Removed use of Cython's build_ext (redundant with cythonize): https://groups.google.com/forum/#!topic/cython-users/fBWLUSJWod0
Please give me a few days to check out the conda builds. |
@kain88-de, I just assigned you so that you have time to test with conda. Hand-off final review if necessary. |
super(MDADistribution, self).__init__(attrs) | ||
|
||
def run_commands(self): | ||
# The following seems fragile... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is it fragile?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rationale here is to commit to installing dependencies according to which class of setup command is invoked.
./setup.py install
is treated as aninstall
and dependencies are installed;./setup.py build
is treated as abuild
and an error is thrown if dependencies are needed (not installed automatically because there's no safe way to install them system-wide);pip
first internally runs./setup.py egg_info
just to get dependency information. Normally we wouldn't want this to trigger dependency installs becausepip
would be on top of it, butbiopython
needs it to be done. Actual cythonization is done later, whenpip
internally invokes./setup.py install
.
As you can see, the decisions are based on whether specific strings are present in the command (check ./setup.py --help-commands
). It'll fail in the future when someone comes up with a whatevs_instl
that should behave like an install
, or if pip
starts using a different command than egg_info
to list dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we care abut biopython? It isn't necessary for the build of the cython modules. I understand that we do special things to get the build-started. But for normal python dependencies we shouldn't do such things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we push this to build-time (which would be the cleanest, IMO), then even pip install MDAnalysis
will fail if you don't have biopython
and numpy
. What happens is the following:
pip
probes forMDAnalysis
dependencies usingegg_info
; thesetup.py
creates theegg-info
dir.pip
learns it must installnumpy
,biopython
and others;- Before installing anything
pip
must learn if the dependencies themselves have dependencies. Thus,./setup.py egg_info
will be run for every dependency, includingbiopython
; - Not being as awesome as ours, the
setup.py
ofbiopython
proceeds to bork, because it needsnumpy
even to list its dependencies.
We solve this by doing the numpy
install ourselves as soon as we suspect pip
is in use.
This, of course, is biopython
's fault, but if we can make life easier for our users, better.
It is still not foolproof, though: if there's a package mdthingy
that depends on mdanalysis
and biopython
our hack will only save the install if pip
probes us before biopython
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this because we are reinventing the wheel and circumventing existing solutions! I don't really like to add such a hack that is easy to break and not easy to understand for other people.
What is done first install our install_requires
or setup_requires
? Also is MDAnalysis first build and then everything is installed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think Max is right here from what I can tell. Can we not fix this
upstream?
On Fri, 25 Mar 2016 11:04 kain88-de, notifications@github.com wrote:
In package/setup.py
#799 (comment):
Setup-time cython dependency comes when cythonizing sources (cython might
not be available) or configuring numpy's include dir (numpy might
not be available either).
The actual cython/numpy imports and calls can be delayed until after
pip/setuptools have figured they must install them.
This is accomplished by using a distutils.core.Distribution subclass
that lazily fills-in cython and numpy information by hooking into
the run_commands method.
- def init(self, attrs=None):
# Attributes must exist if they're to be imported from attrs.
self.mda_use_cython = is_dev
self.mda_build_requires = None
super(MDADistribution, self).**init**(attrs)
- def run_commands(self):
# The following seems fragile...
I don't like this because we are reinventing the wheel and circumventing
existing solutions! I don't really like to add such a hack that is easy to
break and not easy to understand for other people.What is done first install our install_requires or setup_requires? Also
is MDAnalysis first build and then everything is installed?—
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub
https://github.com/MDAnalysis/mdanalysis/pull/799/files/f3f5a69644ab0ef51ddda8f0ba74d10505013199#r57436127
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@richardjgowers: I approached the biopython
people back then, proposing my workaround. They have extremely exquisite deployment targets and it'd have been too much work for me to fix it there. I refactored my workaround a bit now and might try looking at it again, but, honestly, don't hold your breath.
@kain88-de: I disagree that we're doing anything bad here. If you look at the setuptools
code you'll quickly realize everything feels like a stack of kludges and hacks on top of distutils
. By subclassing and putting those in our setup.py
at least the hacks are obvious. If, however, you feel that the code lacks commenting, let me know.
Besides, relying on setup_requires
is bad (there's no control whatsoever over its behavior), and that's what we already do before this PR! If you don't have numpy
and run ./setup.py --help
it will install numpy
in your source dir(!).
To address your question:
(assuming that we invoke ./setup.py install
and that numpy
is absent)
- Before this PR:
setup()
is invoked with all our options;- A
setuptools.Distribution
is instanced with those options; - The distribution's
__init__
checks forsetup_requires
, and any needed packages installed in the current dir (under.eggs
); MDAnalysis
gets built (I'd imaginesetuptools
would install dependencies underinstall_requires
before buildingMDAnalysis
, but apparently that's not the case);- Everything under
install_requires
that isn't available in the system gets system-wide installed. Note that having anumpy
egg in the current dir inhibits any further system-wide installation of it (seesetuptools
' issues 209 and 391).
- After this PR:
setup()
is invoked with all our options;- Our own subclass of
setuptools.Distribution
is instanced with those options; - The distribution's
__init__
checks forsetup_requires
, but none is present because we now have a better way (I'd really advocate against ever using it again); - Depending on which command is run, we go through
mda_build_requires
and install the required dependencies. This is only done if we detect that we're doing an install ofMDAnalysis
. Asking for--help
or similar won't trigger dependency installation. - Our subclassing replaces the dumb local install procedure with a system-wide install (we actually copy the flags passed to
setup.py
, so--user
installs should also work as expected). MDAnalysis
gets built according to the regularsetuptools
flow.
When going through pip
, it first probes MDAnalysis
for dependencies, and, recursively, its dependencies. It internally invokes ./setup.py egg_info
, and later ./setup.py install
. MDAnalysis
only gets built at the install phase, but numpy
will be installed upon initial probing. Again, this is to keep biopython
happy; we're already pretty much independent of setup-time dependency solving.
So I can still build conda packages. Good! But one other question. Why do we need this again? I don't think it is to hard to ask from a dev to install cython. |
The point of a package is that it installs all its requirements. It is not a big deal to ask a dev to install cython, it is not a big deal to ask him to install numpy either, nor that it is to ask him to install six... But one other question. Why do we need this again? I don't think it is to hard to ask from a dev to install cython. —You are receiving this because you were mentioned.Reply to this email directly or view it on GitHub |
Thanks for the testing @kain88-de! |
I agree with Jonathan and Manel: all of this should "just work". Python package installation is a pretty atrocious mess and its fugly but I really don't care what it looks like under the hood in setup.py as long as it works. There's a place for zen-like simplicity in code but setup.py in the current python packaging environment isn't one of them as far as I can tell... or at least someone has to show me alternatives for me to believe it. (If installing python packages was easy then companies like Enthought and Continuum couldn't make a business out of installing Python packages...) On 25 Mar, 2016, at 00:37, Jonathan Barnoud wrote:
Oliver Beckstein * orbeckst@gmx.net |
For the published package it does just work. I'm saying that we don't need this for an installation in developer mode. I'm OK with the just work attitude for the I would say that we can issue error messages when a package is missing for a build in developer mode. I'm just worried because |
I first assumed that this was only a cleaner way of the previous solution. I got confused by the remarks that stuff is fragile and that we suddenly need to take care of biopython in a special way. |
@kain88-de, those are legitimate concerns. I'd remark that the release version doesn't really "just work" that well as per the stupid setuptools behavior of installing As for maintainability, I can provide a lot more detail in comments in |
I'd appreciate that. |
@mnmelo yeah maybe if you just commented the hell out of it, I just fear for the day that it breaks and you're not around to save us again :D |
Comments and a permanent (wiki) copy of your excellent step-by-step description of what happens and what the pitfalls are. The discussion here already goes a long way towards making it more maintainable but we should have a more permanent record than code review comments (which are not even searchable or might perhaps disappear). Oliver Beckstein Am Mar 25, 2016 um 9:41 schrieb Richard Gowers notifications@github.com:
|
Ok, let me know if this is better (I also did some minor cleanup). In any case, don't merge right away. I want to test this with |
# setup-time dependencies locally. This involves essentially a rewrite | ||
# of fetch_build_egg. | ||
# Hopefully, since none of the overriden methods was pre-underscored, | ||
# we'll stay clear of problems while this setuptools API lasts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way that we can include tests that probe the setuptools API that we rely on?
Or is this a moot point because we wouldn't even get to run the tests without full setup.py functionality?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not moot, because we might arrive at this point even if setuptools
changes its Distribution
API.
I'm adding a couple hasattrs
to give us an early warning if setuptools changes the API of Distribution. I make the script stop with an error if it detects differences.
@mnmelo , at this stage, the more background information and rationale for the code that you can transfer from your head into comments the better! So I would also put the steps into the setup.py file. If you could a short paragraph somewhere to the developer guide on the wiki just pointing out that I think we should just use the comments in setup.py for detailed information because maintaining setup.py and a detailed wiki page is too burdensome. |
Yes Comments in the setup.py should be fine. You have recently noticed the burden of keeping code and the wiki in sync when you tried to run some test I think. |
# that lazily fills-in cython and numpy information by hooking into | ||
# the run_commands method. | ||
# Our overrides also address the setuptools behavior of installing | ||
# setup-time dependencies locally. This involves essentially a rewrite |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does this mean? I'm curious for my own understanding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you be more specific? I explain three concepts in this comment. Which should be clarified?
In any case, I already updated it a bit and it looks like this. Do comment if it's still not clear.
# Setup-time dependences come when cythonizing sources (cython might
# not be available) or configuring numpy's include dir (numpy might
# not be available either).
# The actual cython/numpy imports and calls can be delayed until after
# pip/setuptools have figured they must install them.
# This is accomplished by using a setuptools.dist.Distribution subclass
# that lazily fills-in cython and numpy information by hooking into
# the run_commands method. run_commands is the last Distribution method
# before the actual installation takes place (the last method before
# install that we can hook into by subclassing Distribution).
# Our overrides also address the setuptools behavior of installing
# setup-time dependencies locally. This involves essentially a rewrite
# of fetch_build_egg, which is the method that takes care of installing
# dependencies on-the-fly.
# Hopefully, since none of the overriden methods was pre-underscored,
# we'll stay clear of problems while this setuptools API lasts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I understand it better now. after reading through the whole PR.
The thing is I didn't understand why this changes how setup-time dependencies are installed. (I still don't understand why this is done). I assume that revers now to cython and numpy that this way won't be installed for the users and it is only used to compile our cython/c-modules. Wouldn't that mean cython is always installed for each build if it isn't available in the active environment?
'numpy>=1.5.0', | ||
], | ||
] + require_cython[1:], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the reason that we can't use the same string to specify the cython version we are using?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is indirect like this because when doing release installs we don't want to burden the user with cython (we're distributing the compiled cython stuff anyway).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What. How does this answer my questions why we have two strings specifying the cython version needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, misunderstood your doubt. As you can see, requires
and install_requires
use different version-requirement formats. We have to cater to that. I made our own mda_build_requires
follow the install_requires
format.
(And admittedly, put zero effort in checking whether we can use a common format for both, since things were already like this before I touched the code.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK so just another weird thing about setuptools. Then you can leave it as it is. Maybe add a comment where you define require_cython
for later people to understand this. Or maybe better use a dictionary with key as requires
and install_requires
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea.
@mnmelo if you fix the last comments and the merge conflicts I think this is ready to merge. |
@mnmelo do you still want to merge this? |
Honestly, I'm not sure. There's just too many pitfalls with trying to fix the silly setuptools/pip handling of numpy-like dependencies. I may give it a try in a while, but it's not on my priority list right now. |
Is it even worth then to keep all this special code around to handle all possible pitfalls. Wouldn't it be easier to just issue a warning that cython and numpy should be installed and exit the build? Numpy and cython are pretty standard to install today anyway IMHO. (it should also be that case on osx) The main reason I remember @orbeckst like the autohandling is to hand hold users. But with the conda packages we have now we can just tell them to use them. Then numpy and other build dependencies aren't a problem either. And we save us to handle a lot of code specialiced to setuptools internals that might change with any release. This would be really bad if I want to install this version of MDAnalysis in a few years when setuptools internals have changed. |
I would have be super happy if I could create ready to use virtualenv using a requirement file without caring about how MDAnalysis is special. But it is probably not worth the hassle. |
I guess you're right, @kain88-de. Are there any corner cases where conda wouldn't be available and a user must use pip? And if there are is it very bad to tell them to install numpy beforehand? |
I just want to add that in my opinion conda packages are really second class compared to pip installed packages because the conda packages do not contain the openmp code. Oliver Beckstein
|
That is true for most pip installations as well as I don't think a lot of people have openmp development headers installed. But I'm 90% sure we can create conda packages with openmp. |
maybe open-bsd but otherwise no. On some environments like a cluster conda might even be easier to use then pip. |
That would be great. Oliver Beckstein * orbeckst@gmx.net |
Overall I agree with @kain88-de 's assessment #799 (comment) : For the average user and workshops we can use conda packages (which we didn't have before), and someone who cares about OpenMP is probably also able to handle installing numpy on their own (cython is only needed for development, right?). I am not sure I understand @jbarnoud 's #799 (comment)
because I assume you could put numpy in the requirements file, couldn't you? It boils down to the question if we are loosing anything major (while gaining a lot of maintainability, without wanting to dismiss @mnmelo 's great feats in reverse-engineering setuptools' thinking). If there's no other voice coming up with essential reasons why we should be auto-installing numpy and cython (maybe @hainm has an opinion – I think he raised some of the original issues a while back) then we could go back to a plain exception if either is missing. Alternatively, we keep the status quo: We keep the numpy pre-installing code (which is working, isn't it?) and just say "if you're a developer you should be able to install cython" – as long as there's a decent error message it would be fine in my eyes. |
+1 for "decent error message" |
You can have numpy in the requirements, but MDAnalysis imports numpy in setup.py when it is time to declare its dependencies. As a result, pip fails because numpy is missing before it get a chance to install it. This mean that you need to first install numpy (and maybe cython) before you can install the requirements for your project. Nothing terrible, I just do not like to need a workaround. |
I think requirement files can be used to create virtual environments. There are fiies in conda I can use to create a conda-environment quickly. We could add this in the wiki with instructions to help people setup development environments quickly. With the overall discussion I will also close this PR. |
These changes to
setup.py
are hard to test comprehensively../setup.py install
andpip install .
both worked for me, for python2.7 and python3.4. I urge everyone to report asap any issues they might encounter with install.When solving dependency installation I made the decision (aided by @jbarnoud) that local builds (
./setup.py build
) will not install setup-timenumpy
/cython
dependencies, simply because we don't know where they should be put. If a user is doing that kind of build they'll have no problem installing these dependencies on their own (a nice error message is given in these cases).Customized setup-time dependency installation to work around setuptools'
problematic behavior (closes #798)
Modified lazy dependency scheme to be less hacky (we now subclass
distutils.Distribution).
Removed use of Cython's build_ext (redundant with cythonize):
https://groups.google.com/forum/#!topic/cython-users/fBWLUSJWod0