Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do we need to track cython .c files? #667

Closed
mnmelo opened this issue Jan 27, 2016 · 37 comments
Closed

Do we need to track cython .c files? #667

mnmelo opened this issue Jan 27, 2016 · 37 comments

Comments

@mnmelo
Copy link
Member

mnmelo commented Jan 27, 2016

Whenever I build MDAnalysis cython regenerates some *.c files in /lib/ and /lib/formats/. This messes up my commits as a number of files I didn't specifically change appear as modified.

The question is: do we even need these files around? Can we just leave them out (and untracked) and let cython recreate them every time a build is issued?

@hainm
Copy link
Contributor

hainm commented Jan 27, 2016

Given that it is super easy to install cython, I dont think cythonized files are needed.

(And given that mdanalysis requires other packages too)

@jbarnoud
Copy link
Contributor

For the record, the issue was discussed in issue #66 and cython files got included to remove the dependency to cython.

In my opinion, you should now what you are doing if you install MDAnalysis from the develop branch. I would remove the C files from the develop branch but keep them in the pip package, and perhaps in the master branch.

@mnmelo
Copy link
Member Author

mnmelo commented Jan 27, 2016

I knew I should have dug deeper in the old issues... Anyway, my question then is, if they're distributed so that cython isn't needed what triggers their regeneration by cython? Shouldn't setup.py just rely on them being there?

Or is it then a facultative dependency thing, where if we have cython we can use it to get some improvement otherwise we fallback to the distributed .c files? (I saw that some of the changes to the .c files seem to be architecture-dependent optimizations.)

@richardjgowers
Copy link
Member

You can definitely force the rebuilding just by touching the files, but yeah sometimes it seems that they're rebuilt because of versions of Cython being different too. FWIW, I'd rather keep the c files just to not have Cython as a hard dependency.

@mnmelo
Copy link
Member Author

mnmelo commented Jan 27, 2016

Yup, I agree on the dependency part, but wouldn't it be better then to leave them untracked, and then rebuild them for each release? (Like the docs regeneration?)

@dotsdl
Copy link
Member

dotsdl commented Jan 28, 2016

I agree with @hainm ; these days Cython isn't a scary dependency, and since most users are likely installing with Anaconda anyway, they already have it. I say we cut the amount of generated stuff we're committing to the repo's main branches and leave out the cython-generated code.

@mnmelo
Copy link
Member Author

mnmelo commented Jan 28, 2016

I'm not sure @orbeckst will agree... I still vote for not tracking but rebuilding and releasing, like the docs.

@dotsdl
Copy link
Member

dotsdl commented Jan 28, 2016

@mnmelo I like that solution. I think rebuilding is something the package maintainer / release manager has to do anyway.

@mnmelo
Copy link
Member Author

mnmelo commented Jan 28, 2016

@richardjgowers, would you be ok with adding the cython rebuild to the release workflow?

@orbeckst
Copy link
Member

On 28 Jan, 2016, at 11:26, David Dotson wrote:

@mnmelo I like that solution. I think rebuilding is something the package maintainer / release manager has to do anyway.

I also find the small changes in the cythonized files annoying. In principle, anyone using develop can be expected to install cython; as @hainm points out, it's not a big deal anymore.

I would be ok with building the cython files as part of release. They should also end up on master, don't you think?

It will change our "minimal builds" in that they are not THAT minimal anymore.

Oliver

@mnmelo
Copy link
Member Author

mnmelo commented Jan 28, 2016

@orbeckst, yup: master and releases makes sense.

This will also make cythonized files more consistent. I just opened a PR (#678) where I change code in one of the .pyx files. I'm also committing the respective cythonized .c file. Only now its Cython metadata block differs from all other cythonized files.

@kain88-de
Copy link
Member

yes having them in the release tarball is very useful. I wouldn't want to give that up. It makes sure that we have some control over the cython generated files

@hainm
Copy link
Contributor

hainm commented Jan 29, 2016

@orbeckst if mdanalysis can be released from conda build, you don't even care about cython. 👍
(install should take only 10s for user)

@jbarnoud
Copy link
Contributor

@hainm Not everybody uses Anaconda, and we should be careful not making it more required than needed. I am very happy with pip and virtualenv.

@hainm
Copy link
Contributor

hainm commented Jan 29, 2016

Hi, I dont say users need to install anaconda. Just an option.

If you are using pip, you don't need to bundle cythonized files too. :d

Hai

On Jan 29, 2016, at 3:43 AM, Jonathan Barnoud notifications@github.com wrote:

@hainm Not everybody uses Anaconda, and we should be careful not making it more required than needed. I am very happy with pip and virtualenv.


Reply to this email directly or view it on GitHub.

@kain88-de
Copy link
Member

If we use pip we have to bundle them. Otherwise the pip install will fail. It shouldn't if we provide wheels but I don't know if we have them yet.

@richardjgowers
Copy link
Member

So how will this work logistically? We add *.c to the gitignore? But only on develop?

And then will there be a list of .cs I need to remember to commit into master when I do the releases?

@mnmelo
Copy link
Member Author

mnmelo commented Jan 29, 2016

I was looking through it, and apparently that's the most obvious way to go.
Another way I thought about was to have setup.py delete the .c files after install. We just have to add another flag, say, keep-cythonized that defaults to False on develop and that you'd set to True when releasing.

@jbarnoud
Copy link
Contributor

The git ignore thing could work, also we can add a file even if it match
a patern in the gitignore. We do not want to exclude all the C files,
though. Only those generated by Cython.

On 29-01-16 14:11, mnmelo wrote:

I was looking through it, and apparently that's the most obvious way
to go.
Another way I thought about was to have |setup.py| delete the |.c|
files after install. We just have to add another flag, say,
|keep-cythonized| that defaults to |False| on develop and that you'd
set to |True| when releasing.


Reply to this email directly or view it on GitHub
#667 (comment).

@kain88-de
Copy link
Member

And then will there be a list of .cs I need to remember to commit into master when I do the releases?

If you do a merge develop into master yes. Then that would be it. A rebase workflow would be hard and involve force pushing.

git ignore thing could work

I don't think it works when we keep the files in VC though. They are tracked in another branch but I'm not sure. Someone would need to test that.

@mnmelo
Copy link
Member Author

mnmelo commented Feb 1, 2016

From some googling, maintaining this sort of .gitignore is error-prone and, frankly, inelegant.
I was able to modify setup.py to delete any generated Cython files. I think this is more flexible. I'll post a PR and you'll let me know what you think.

Also, from Cython's docs:

It is strongly recommended that you distribute the generated .c files as well as your Cython sources, so that users can install your module without needing to have Cython available.

It is also recommended that Cython compilation not be enabled by default in the version you distribute. Even if the user has Cython installed, he/she probably doesn’t want to use it just to install your module. Also, the installed version may not be the same one you used, and may not compile your sources correctly.

So I guess we should really ship with Cython disabled, to prevent hard-to-find bugs on the user side.

@richardjgowers
Copy link
Member

Hmm ok. Would it be possible to put some magic in setup.py along the lines of:

if version.endswith('-dev0'):
    use_cython = True
else:
    use_cython = False

To simplify everything for the poor suffering release manager

@mnmelo
Copy link
Member Author

mnmelo commented Feb 1, 2016

I was looking into how to make all this 'just work' for both devs and release manager. I wouldn't want to have to change anything on the installing user's side (like having to pass flags to setup.py or environment variables to pip).

I think @richardjgowers idea is good, if we stick to the nomenclature. I'll give it a try. (BTW, is that always going to be a '-dev0'? Can it ever become a '-dev1', for instance?)

@hainm
Copy link
Contributor

hainm commented Feb 1, 2016

may be

is_release = True

if is_release:
    # use *.c files
else:
    # use *pyx files

@mnmelo
Copy link
Member Author

mnmelo commented Feb 1, 2016

Yup, something along those lines. The thing is that is_release should be False during the development phase, which forces @richardjgowers to also having to change this to True for every release.
By coupling it to the version string we kill two birds with one stone.

@hainm
Copy link
Contributor

hainm commented Feb 1, 2016

The thing is that is_release should be False during the development

yeah

@richardjgowers
Copy link
Member

I don't think this has actually answered the original issue which was the .c files were annoying. I'm not sure why these get regenerated sometimes (when they've not been touched). But if we're distributing them with the release (as per cython docs above) it sounds like they need to be in version control.

@mnmelo
Copy link
Member Author

mnmelo commented Feb 2, 2016

I disagree. They're compiled stuff, just like the *.so files and documentation. The only reason to ship them compiled into .c files is to remove that burden from the end user.

mnmelo added a commit that referenced this issue Feb 2, 2016
Default behavior now depends on release or dev status. When in dev mode
default is to use Cython and delete cythonized files. When in release
mode default is to skip Cython but keep existing cythonized files.

setup.py now reads the version string directly from version.py.

Environment variable handling in setup.py enhanced to recognize strings
with boolean meaning.
@mnmelo
Copy link
Member Author

mnmelo commented Feb 2, 2016

I just opened a PR (#692) with my solution and the ideas of @richardjgowers and @hainm. Do give opinions.

Developers don't need to do anything special. Since now the dev branch has no cythonized .c files you'll get an error (about a missing .c source file) if you a) don't have Cython; b) have Cython version < 0.16; c) specifically disable Cython usage through setup.cfg or the MDA_USE_CYTHON=0 environment variable.

On dev builds cythonized files are automatically removed after setup.py finishes building. Again, use setup.cfg or MDA_KEEP_CYTHONIZED=1 to instruct setup.py to keep the files (if you do this, do not commit the cythonized files back into develop).

On release builds the defaults are reversed. Cython won't be used. If it's use is forced, cythonized files won't be deleted (a user can force the use of their own Cython, even when installing via pip, by simply setting MDA_USE_CYTHON=1, although, since the .c files will be there, I'm not sure recompilation will always kick in).

mnmelo added a commit that referenced this issue Feb 8, 2016
Default behavior now depends on release or dev status. When in dev mode
default is to use Cython and delete cythonized files. When in release
mode default is to skip Cython but keep existing cythonized files.

setup.py now reads the version string directly from version.py.

Environment variable handling in setup.py enhanced to recognize strings
with boolean meaning.
mnmelo added a commit that referenced this issue Feb 8, 2016
Default behavior now depends on release or dev status. When in dev mode
default is to use Cython and delete cythonized files. When in release
mode default is to skip Cython but keep existing cythonized files.

setup.py now reads the version string directly from version.py.

Environment variable handling in setup.py enhanced to recognize strings
with boolean meaning.
mnmelo added a commit that referenced this issue Feb 9, 2016
Default behavior now depends on release or dev status. When in dev mode
default is to use Cython and delete cythonized files. When in release
mode default is to skip Cython but keep existing cythonized files.

setup.py now reads the version string directly from version.py.

Environment variable handling in setup.py enhanced to recognize strings
with boolean meaning (fixes #629).

Travis' minimal build now also rebuilds cythonized files.
richardjgowers added a commit that referenced this issue Feb 9, 2016
Cythonized files now deleted after setup. (closes #667 / #629)
@richardjgowers richardjgowers added this to the 0.14 milestone Feb 9, 2016
@mnmelo
Copy link
Member Author

mnmelo commented Feb 9, 2016

Ok. This is merged. Thanks @richardjgowers. Please keep giving feedback on the release process, in case we should tweak or entirely revert this change. Don't forget you'll have to run setup.py with MDA_USE_CYTHON and MDA_KEEP_CYTHONIZED both set to true to get your .c files and keep them too.

I'm schematizing below what I imagine would be a good way to keep release-specific stuff separated from the develop branch, in case we ever worry about devs branching off that one commit with non-dev version string. The downside of this approach is that the merging of develop into master won't be fast-forwardable because the version string changes will conflict (which is, of course, small, but I don't know how much extra release burden this would be).

                 RELEASE TAG                          NEW RELEASE TAG
                      |                                      |
          (version string update,                (version string update,
             cythonization, and                     cythonization, and
       other release-specific actions)        other release-specific actions)
                      |                                      |
master --------o------o-------------------------------o------o----- ·  ·  · 
              /                                      /
             /          (next release development)  /
develop ----o---------o----- ·  ·  ·  ·  ·  · -----o---------o----- ·  ·  · 
            |         |                            |         |
    (last dev commit) |                    (last dev commit) |
           (dev version string update)            (dev version string update)

@richardjgowers richardjgowers mentioned this issue Feb 29, 2016
4 tasks
@mnmelo
Copy link
Member Author

mnmelo commented Sep 2, 2016

As posted by @kain88-de in the dev mailing list, the fact that we remove cython files after each build makes rebuilding annoying, because everything is recompiled even if no relevant code has changed.

Looking back on the ideas in this thread, an alternative possibility would be to untrack cython-generated .c files in .gitignore. This is not straightforward to do because we have non-cython .c files we do want to track.

Perhaps we could have some rules about cythonized file naming that would make a regex easier to apply in .gitignore, but I guess this is what I meant above when I said things would get "inelegant"...

Suggestions?

@mnmelo mnmelo reopened this Sep 2, 2016
@mnmelo mnmelo added the question label Sep 2, 2016
@kain88-de
Copy link
Member

@mnmelo can't we just add an option like DONT_DELETE_CYTHON to the config and then skip the removing if that is true? then I can build with DONT_DELETE_CYTHON=1 python setup.py build_ext --internal.

@mnmelo
Copy link
Member Author

mnmelo commented Sep 2, 2016

That already exists, as part of what was implemented from this very issue, doesn't it?

@mnmelo
Copy link
Member Author

mnmelo commented Sep 2, 2016

Yup, just tested it:

MDA_KEEP_CYTHONIZED=1 ./setup.py build

But then you'll bump into the problem of having to untrack the .c files yourself.

@mnmelo
Copy link
Member Author

mnmelo commented Sep 2, 2016

One thing I just noticed is that all our real (not cythonized) .c source files all live in /src/subdirs, whereas cython-generated .c files live in the same directory as their corresponding .pyx.

Would a .gitignore based on this organization be something worth considering?

@kain88-de
Copy link
Member

@mnmelo I'm cool with keeping the track of the 6 c files myself. magit has a wonderful interface for that. I don't think it's worth spending to much time one this. Just having one option to avoid removing the files is enough.

@mnmelo
Copy link
Member Author

mnmelo commented Sep 2, 2016

Ok, reclosing as it seems to be ok as it is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants