
setup.py now cleanly handles setup-time cython dependency (closes #768) #799

Closed
wants to merge 3 commits

Conversation

@mnmelo (Member) commented Mar 23, 2016

These changes to setup.py are hard to test comprehensively. ./setup.py install and pip install . both worked for me, for python2.7 and python3.4. Please report any issues you encounter with installation as soon as possible.

When solving dependency installation I made the decision (aided by @jbarnoud) that local builds (./setup.py build) will not install setup-time numpy/cython dependencies, simply because we don't know where they should be put. If a user is doing that kind of build they'll have no problem installing these dependencies on their own (a nice error message is given in these cases).


Customized setup-time dependency installation to work around setuptools'
problematic behavior (closes #798)

Modified lazy dependency scheme to be less hacky (we now subclass
distutils.Distribution).

Removed use of Cython's build_ext (redundant with cythonize):
https://groups.google.com/forum/#!topic/cython-users/fBWLUSJWod0

@kain88-de (Member)

Please give me a few days to check out the conda builds.

@orbeckst (Member)

@kain88-de, I just assigned you so that you have time to test with conda. Hand-off final review if necessary.

super(MDADistribution, self).__init__(attrs)

def run_commands(self):
# The following seems fragile...
Member

why is it fragile?

Member Author

The rationale here is to commit to installing dependencies according to which class of setup command is invoked.

  • ./setup.py install is treated as an install and dependencies are installed;
  • ./setup.py build is treated as a build and an error is thrown if dependencies are needed (not installed automatically because there's no safe way to install them system-wide);
  • pip first internally runs ./setup.py egg_info just to get dependency information. Normally we wouldn't want this to trigger dependency installs because pip would be on top of it, but biopython needs it to be done. Actual cythonization is done later, when pip internally invokes ./setup.py install.

As you can see, the decisions are based on whether specific strings are present in the command (check ./setup.py --help-commands). It'll fail in the future when someone comes up with a whatevs_instl that should behave like an install, or if pip starts using a different command than egg_info to list dependencies.
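The string-matching heuristic described here could be sketched as follows (the function name and exact rules are illustrative, not the PR's actual code):

```python
def classify_commands(commands):
    """Crudely classify setup.py commands by substring matching.

    Anything containing 'install' (or pip's 'egg_info' probe) triggers
    dependency installation; 'build' commands only get a check and error
    out if dependencies are missing; everything else does nothing.
    """
    if any("install" in cmd or "egg_info" in cmd for cmd in commands):
        return "install"   # install setup-time dependencies
    if any("build" in cmd for cmd in commands):
        return "build"     # raise a nice error if cython/numpy are absent
    return "other"         # e.g. --help: no dependency handling
```

Note how a hypothetical `whatevs_instl` command would fall through to "other" — exactly the fragility being discussed.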

Member

why do we care about biopython? It isn't necessary for building the cython modules. I understand that we do special things to get the build started. But for normal python dependencies we shouldn't do such things.

Member Author

If we push this to build-time (which would be the cleanest, IMO), then even pip install MDAnalysis will fail if you don't have biopython and numpy. What happens is the following:

  1. pip probes for MDAnalysis dependencies using egg_info; the setup.py creates the egg-info dir. pip learns it must install numpy, biopython and others;
  2. Before installing anything pip must learn if the dependencies themselves have dependencies. Thus, ./setup.py egg_info will be run for every dependency, including biopython;
  3. Not being as awesome as ours, the setup.py of biopython proceeds to bork, because it needs numpy even to list its dependencies.

We solve this by doing the numpy install ourselves as soon as we suspect pip is in use.
This, of course, is biopython's fault, but if we can make life easier for our users, better.

It is still not foolproof, though: if there's a package mdthingy that depends on mdanalysis and biopython our hack will only save the install if pip probes us before biopython.

Member

I don't like this because we are reinventing the wheel and circumventing existing solutions! I don't really like to add such a hack that is easy to break and not easy to understand for other people.

What is installed first: our install_requires or setup_requires? Also, is MDAnalysis first built and then everything installed?

Member

Yeah, I think Max is right here from what I can tell. Can we not fix this upstream?

Member Author

@richardjgowers: I approached the biopython people back then, proposing my workaround. They have very exacting deployment targets and it'd have been too much work for me to fix it there. I refactored my workaround a bit now and might try looking at it again, but, honestly, don't hold your breath.

@kain88-de: I disagree that we're doing anything bad here. If you look at the setuptools code you'll quickly realize everything feels like a stack of kludges and hacks on top of distutils. By subclassing and putting those in our setup.py at least the hacks are obvious. If, however, you feel that the code lacks commenting, let me know.

Besides, relying on setup_requires is bad (there's no control whatsoever over its behavior), and that's what we already do before this PR! If you don't have numpy and run ./setup.py --help it will install numpy in your source dir(!).

To address your question:
(assuming that we invoke ./setup.py install and that numpy is absent)

  • Before this PR:
    1. setup() is invoked with all our options;
    2. A setuptools.Distribution is instanced with those options;
    3. The distribution's __init__ checks for setup_requires, and any needed packages are installed in the current dir (under .eggs);
    4. MDAnalysis gets built (I'd imagine setuptools would install dependencies under install_requires before building MDAnalysis, but apparently that's not the case);
    5. Everything under install_requires that isn't available in the system gets system-wide installed. Note that having a numpy egg in the current dir inhibits any further system-wide installation of it (see setuptools' issues 209 and 391).
  • After this PR:
    1. setup() is invoked with all our options;
    2. Our own subclass of setuptools.Distribution is instanced with those options;
    3. The distribution's __init__ checks for setup_requires, but none is present because we now have a better way (I'd really advocate against ever using it again);
    4. Depending on which command is run, we go through mda_build_requires and install the required dependencies. This is only done if we detect that we're doing an install of MDAnalysis. Asking for --help or similar won't trigger dependency installation.
    5. Our subclassing replaces the dumb local install procedure with a system-wide install (we actually copy the flags passed to setup.py, so --user installs should also work as expected).
    6. MDAnalysis gets built according to the regular setuptools flow.

When going through pip, it first probes MDAnalysis for dependencies, and, recursively, its dependencies. It internally invokes ./setup.py egg_info, and later ./setup.py install. MDAnalysis only gets built at the install phase, but numpy will be installed upon initial probing. Again, this is to keep biopython happy; we're already pretty much independent of setup-time dependency solving.
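The subclassing flow described above could be sketched roughly like this (attribute and helper names are assumptions for illustration, not the PR's exact code):

```python
try:
    from setuptools.dist import Distribution
except ImportError:  # fall back to plain distutils
    from distutils.dist import Distribution


class MDADistribution(Distribution):
    """Defer setup-time dependency handling to run_commands, the last
    Distribution method that runs before the commands actually execute."""

    def __init__(self, attrs=None):
        # Attributes must exist before __init__ so they can be
        # picked up from `attrs`.
        self.mda_build_requires = []
        Distribution.__init__(self, attrs)

    def _wants_dependency_install(self):
        # Same string heuristic as discussed earlier in this thread.
        return any("install" in cmd or "egg_info" in cmd
                   for cmd in self.commands)

    def run_commands(self):
        if self._wants_dependency_install():
            # The real code would install these system-wide (or --user,
            # mirroring the flags passed to setup.py); this is a stub.
            for dep in self.mda_build_requires:
                print("ensuring setup-time dependency:", dep)
        Distribution.run_commands(self)
```

setup() would then be called with `distclass=MDADistribution` so that this subclass is instantiated instead of the stock one.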

@kain88-de (Member)

So I can still build conda packages. Good!

But one other question: why do we need this again? I don't think it is too hard to ask a dev to install cython.

@jbarnoud (Contributor)

The point of a package is that it installs all its requirements. It is not a big deal to ask a dev to install cython; it is not a big deal to ask them to install numpy either, nor to ask them to install six...

@mnmelo (Member, Author) commented Mar 25, 2016

Thanks for the testing @kain88-de!

@orbeckst (Member)

I agree with Jonathan and Manel: all of this should "just work". Python package installation is a pretty atrocious mess, and it's fugly, but I really don't care what it looks like under the hood in setup.py as long as it works. There's a place for zen-like simplicity in code, but setup.py in the current python packaging environment isn't one of those places, as far as I can tell... or at least someone has to show me alternatives for me to believe it.

(If installing python packages was easy then companies like Enthought and Continuum couldn't make a business out of installing Python packages...)


@kain88-de (Member)

all of this should "just work"

For the published package it does just work. I'm saying that we don't need this for an installation in developer mode. I'm OK with the "just work" attitude for the master branch and published packages.

I would say that we can issue error messages when a package is missing for a build in developer mode.
But needing additional libraries to build something as a developer isn't anything special.

I'm just worried because pip is complicated and badly documented, and the same is true for setuptools.
If a future version of either of them suddenly breaks one of our assumptions, we suddenly have to support our hack for multiple versions of pip and setuptools. I'm worried this will grow out of hand and nobody will know how to fix it in the end.

@kain88-de (Member)

I first assumed that this was only a cleaner version of the previous solution. I got confused by the remarks that stuff is fragile and that we suddenly need to take care of biopython in a special way.

@mnmelo (Member, Author) commented Mar 25, 2016

@kain88-de, those are legitimate concerns. I'd remark that the release version doesn't really "just work" that well as per the stupid setuptools behavior of installing setup_requires dependencies locally.

As for maintainability, I can provide a lot more detail in comments in setup.py. I wouldn't want to leave a solution out just because of complexity (and I'd argue that setuptools development is too sluggish for us to worry about API breakage; I'd rather be proactive and fix what they don't, if only for ourselves).

@mnmelo (Member, Author) commented Mar 25, 2016

Sorry, had missed your previous comment. Indeed, the cleaning of the code is a bit secondary to solving #768 and #798.

@kain88-de (Member)

As for maintainability, I can provide a lot more detail in comments

I'd appreciate that.

@richardjgowers (Member)

@mnmelo yeah maybe if you just commented the hell out of it, I just fear for the day that it breaks and you're not around to save us again :D

@orbeckst (Member)

Comments and a permanent (wiki) copy of your excellent step-by-step description of what happens and what the pitfalls are. The discussion here already goes a long way towards making it more maintainable but we should have a more permanent record than code review comments (which are not even searchable or might perhaps disappear).


@mnmelo (Member, Author) commented Mar 25, 2016

Ok, let me know if this is better (I also did some minor cleanup).
Maybe I should also add the step-by-step to setup.py?

In any case, don't merge right away. I want to test this with --user installs and need to set up a vagrant box (does anyone know if it can be done with virtualenv?).

# setup-time dependencies locally. This involves essentially a rewrite
# of fetch_build_egg.
# Hopefully, since none of the overriden methods was pre-underscored,
# we'll stay clear of problems while this setuptools API lasts.
Member

Is there a way that we can include tests that probe the setuptools API that we rely on?

Or is this a moot point because we wouldn't even get to run the tests without full setup.py functionality?

Member Author

It's not moot, because we might arrive at this point even if setuptools changes its Distribution API.

I'm adding a couple of hasattr checks to give us an early warning if setuptools changes the API of Distribution. I make the script stop with an error if it detects differences.
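The early-warning idea could look roughly like this (the helper name is hypothetical):

```python
def missing_api(cls, names):
    """Return the attribute names in `names` that `cls` no longer provides."""
    return [name for name in names if not hasattr(cls, name)]

# In setup.py this would guard the overridden methods, e.g. (illustrative):
#
#     from setuptools.dist import Distribution
#     gone = missing_api(Distribution, ("run_commands", "fetch_build_egg"))
#     if gone:
#         raise SystemExit("setuptools Distribution API changed; "
#                          "missing: " + ", ".join(gone))
```

This costs two attribute lookups and turns a mysterious future breakage into an explicit error message.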

@orbeckst (Member)

@mnmelo , at this stage, the more background information and rationale for the code that you can transfer from your head into comments the better! So I would also put the steps into the setup.py file.

If you could add a short paragraph somewhere in the developer guide on the wiki just pointing out that setup.py is complicated for the following reasons 1., 2., 3., ... and a high-level overview of what we (=mostly you) have done to make it work? Don't reproduce the detailed comments (which might change) but just give the gist of it.

I think we should just use the comments in setup.py for detailed information because maintaining setup.py and a detailed wiki page is too burdensome.

@kain88-de (Member)

I think we should just use the comments in setup.py for detailed information because maintaining setup.py and a detailed wiki page is too burdensome.

Yes, comments in setup.py should be fine. You recently noticed the burden of keeping code and the wiki in sync when you tried to run some tests, I think.

# that lazily fills-in cython and numpy information by hooking into
# the run_commands method.
# Our overrides also address the setuptools behavior of installing
# setup-time dependencies locally. This involves essentially a rewrite
Member

what does this mean? I'm curious for my own understanding

Member Author

Can you be more specific? I explain three concepts in this comment. Which should be clarified?

In any case, I already updated it a bit and it looks like this. Do comment if it's still not clear.

    # Setup-time dependences come when cythonizing sources (cython might
    #  not be available) or configuring numpy's include dir (numpy might
    #  not be available either).
    # The actual cython/numpy imports and calls can be delayed until after
    #  pip/setuptools have figured they must install them.
    # This is accomplished by using a setuptools.dist.Distribution subclass
    #  that lazily fills-in cython and numpy information by hooking into
    #  the run_commands method. run_commands is the last Distribution method
    #  before the actual installation takes place (the last method before  
    #  install that we can hook into by subclassing Distribution).
    # Our overrides also address the setuptools behavior of installing
    #  setup-time dependencies locally. This involves essentially a rewrite
    #  of fetch_build_egg, which is the method that takes care of installing
    #  dependencies on-the-fly.
    # Hopefully, since none of the overriden methods was pre-underscored,
    #  we'll stay clear of problems while this setuptools API lasts.

Member

Yes, I understand it better now, after reading through the whole PR.

The thing is, I didn't understand why this changes how setup-time dependencies are installed (I still don't understand why this is done). I assume that refers now to cython and numpy, which this way won't be installed for the users and are only used to compile our cython/c-modules. Wouldn't that mean cython is always installed for each build if it isn't available in the active environment?

'numpy>=1.5.0',
],
] + require_cython[1:],
Member

what is the reason that we can't use the same string to specify the cython version we are using?

Member Author

This is indirect like this because when doing release installs we don't want to burden the user with cython (we're distributing the compiled cython stuff anyway).

Member

What? How does this answer my question of why we have two strings specifying the cython version needed?

Member Author

Ah, misunderstood your doubt. As you can see, requires and install_requires use different version-requirement formats. We have to cater to that. I made our own mda_build_requires follow the install_requires format.
(And admittedly, put zero effort in checking whether we can use a common format for both, since things were already like this before I touched the code.)

Member

OK, so just another weird thing about setuptools. Then you can leave it as it is. Maybe add a comment where you define require_cython for later people to understand this. Or, maybe better, use a dictionary with requires and install_requires as keys.

Member Author

Good idea.
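The dictionary idea could look like this (the version pin is purely illustrative, not the one used in the PR):

```python
# distutils' `requires` and setuptools' `install_requires` use different
# version-specifier syntaxes; keeping both in one dict documents that fact.
CYTHON_MIN = "0.16"  # hypothetical minimum version

require_cython = {
    "requires": "cython (>=%s)" % CYTHON_MIN,        # distutils format
    "install_requires": "cython>=%s" % CYTHON_MIN,   # setuptools format
}
```

setup() could then pick the right entry for each keyword, so the version number lives in exactly one place.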

@kain88-de (Member)

@mnmelo if you fix the last comments and the merge conflicts I think this is ready to merge.

@kain88-de (Member)

@mnmelo do you still want to merge this?

@mnmelo (Member, Author) commented Aug 23, 2016

Honestly, I'm not sure. There are just too many pitfalls in trying to fix the silly setuptools/pip handling of numpy-like dependencies.

I may give it a try in a while, but it's not on my priority list right now.

@kain88-de (Member)

Is it even worth it, then, to keep all this special code around to handle all possible pitfalls? Wouldn't it be easier to just issue a warning that cython and numpy should be installed and exit the build? Numpy and cython are pretty standard to install today anyway, IMHO (it should also be the case on OS X).

The main reason, as I remember, that @orbeckst likes the auto-handling is to hand-hold users. But with the conda packages we have now, we can just tell them to use those. Then numpy and other build dependencies aren't a problem either. And we save ourselves from maintaining a lot of code specialized to setuptools internals that might change with any release. This would be really bad if I want to install this version of MDAnalysis in a few years when setuptools internals have changed.

@jbarnoud (Contributor)

I would be super happy if I could create a ready-to-use virtualenv using a requirements file without caring about how MDAnalysis is special. But it is probably not worth the hassle.

@mnmelo (Member, Author) commented Aug 24, 2016

I guess you're right, @kain88-de. Are there any corner cases where conda wouldn't be available and a user must use pip? And if there are, is it very bad to tell them to install numpy beforehand?

@orbeckst (Member)

I just want to add that, in my opinion, conda packages are really second-class compared to pip-installed packages because the conda packages do not contain the openmp code.


@kain88-de (Member)

I just want to add that in my opinion conda packages are really second class compared to pip installed packages because the conda packages do not contain the openmp code.

That is true for most pip installations as well, as I don't think a lot of people have openmp development headers installed. But I'm 90% sure we can create conda packages with openmp.

@kain88-de (Member)

Are there any corner cases where conda wouldn't be available and a user must use pip?

Maybe OpenBSD, but otherwise no. In some environments, like a cluster, conda might even be easier to use than pip.

@orbeckst (Member)

That is true for most pip installations as well, as I don't think a lot of people have openmp development headers installed. But I'm 90% sure we can create conda packages with openmp.

That would be great.

@orbeckst (Member)

Overall I agree with @kain88-de 's assessment #799 (comment) : For the average user and workshops we can use conda packages (which we didn't have before), and someone who cares about OpenMP is probably also able to handle installing numpy on their own (cython is only needed for development, right?).

I am not sure I understand @jbarnoud 's #799 (comment)

I would be super happy if I could create a ready-to-use virtualenv using a requirements file without caring about how MDAnalysis is special.

because I assume you could put numpy in the requirements file, couldn't you?

It boils down to the question of whether we are losing anything major (while gaining a lot of maintainability, without wanting to dismiss @mnmelo 's great feats in reverse-engineering setuptools' thinking).

If there's no other voice coming up with essential reasons why we should be auto-installing numpy and cython (maybe @hainm has an opinion – I think he raised some of the original issues a while back) then we could go back to a plain exception if either is missing.

Alternatively, we keep the status quo: We keep the numpy pre-installing code (which is working, isn't it?) and just say "if you're a developer you should be able to install cython" – as long as there's a decent error message it would be fine in my eyes.

@hainm (Contributor) commented Aug 24, 2016

"if you're a developer you should be able to install cython" – as long as there's a decent error message it would be fine in my eyes.

+1 for "decent error message"

@jbarnoud (Contributor)

@orbeckst

because I assume you could put numpy in the requirements file, couldn't you?

You can have numpy in the requirements, but MDAnalysis imports numpy in setup.py when it is time to declare its dependencies. As a result, pip fails because numpy is missing before it gets a chance to install it. This means that you need to first install numpy (and maybe cython) before you can install the requirements for your project. Nothing terrible, I just do not like needing a workaround.
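A friendlier failure mode for the setup-time import problem described here would be to guard the import and exit with an actionable message (a sketch; the helper name and message text are illustrative):

```python
import importlib


def require_at_setup(module_name):
    """Import a setup-time dependency or exit with a helpful message."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        raise SystemExit(
            "%s is required at setup time but is not installed.\n"
            "Install it first, e.g.:  pip install %s"
            % (module_name, module_name)
        )

# In setup.py one would then write, for example:
#     numpy = require_at_setup("numpy")
#     include_dirs = [numpy.get_include()]
```

This doesn't remove the two-step install, but it replaces a raw ImportError traceback with an instruction the user can act on.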

@kain88-de (Member)

I think requirements files can be used to create virtual environments. There are files in conda I can use to create a conda environment quickly. We could add this to the wiki with instructions to help people set up development environments quickly.

Given the overall discussion, I will also close this PR.

Successfully merging this pull request may close these issues.

setuptools setup_requires installs numpy/cython in unwanted locations
6 participants