binder.yaml - One Build File to Rule Them All? #42

Open
psychemedia opened this issue Dec 3, 2017 · 10 comments

@psychemedia

psychemedia commented Dec 3, 2017

I'm putting together a series of binder executed repos that are composed of particular components with their own dependencies.

At one level, this might be a quite simple dependency, such as a Python package installed by pip that requires a particular Linux package to be installed via apt.txt first.

Or it might be more elaborate - for example a pseudo collection of packages to "implement" a particular sort of functionality, such as a range of tools to handle map display.

To try to ease the pain of remembering what Linux package is a pre-requisite of which python package, it might be useful if we could bundle the requirements as pseudo/self-named collections.

For example, I have three "topics" in the following build (the build may not be sensible!): some map handling stuff (maps), some astronomical stuff (astro-stuff) and some notebook presentation stuff (presentation).

maps:
  - apt:
    - libproj-dev
    - libgeos-dev
  - conda:
    - channels:
      - conda-forge
    - dependencies:
      - python
      - matplotlib
      - basemap
    - pip:
      - folium

astro-stuff:
  - conda:
    - channels:
      - astropy
    - dependencies:
      - python
      - astropy

presentation:
  - pip:
    - jupyter_contrib_nbextensions
    - RISE
    - jupyter-wysiwyg
  - postBuild:
    - jupyter contrib nbextension install --user
    - jupyter nbextension enable widgetsnbextension --py --user
    - jupyter nbextension enable python-markdown/main --py --user
    - jupyter-nbextension install rise --py --user
    - jupyter-nbextension enable rise --py --user
    - jupyter-nbextension install jupyter_wysiwyg --py --user
    - jupyter nbextension enable jupyter_wysiwyg --py --user

What I imagine happening is that a binder.yaml parser could take such a file and generate separate apt.txt, environment.yml or requirements.txt, and postBuild files representing the union of the various build components. (I guess there could be version conflicts in there.)
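To make the idea concrete, here is a minimal sketch of what such a generator might look like. It is only an illustration under stated assumptions: the input is an already-parsed Python dict (the kind of thing a YAML loader would hand back), the per-component keys `apt`, `conda`, `pip` and `postBuild` are a hypothetical schema, and `split_bundle` is a made-up name. It also assumes that only one of environment.yml or requirements.txt should be emitted, so pip packages are folded into environment.yml whenever any conda content is present. Version conflicts are not handled at all.

```python
def extend_unique(target, items):
    """Append items to target, skipping ones already present (keeps order)."""
    for item in items:
        if item not in target:
            target.append(item)


def split_bundle(bundle):
    """Map {component: {section: ...}} to {filename: file text} (hypothetical schema)."""
    apt, pip, post_build, channels, conda_deps = [], [], [], [], []
    for component in bundle.values():
        extend_unique(apt, component.get("apt", []))
        extend_unique(pip, component.get("pip", []))
        extend_unique(post_build, component.get("postBuild", []))
        conda = component.get("conda", {})
        extend_unique(channels, conda.get("channels", []))
        extend_unique(conda_deps, conda.get("dependencies", []))

    files = {}
    if apt:
        files["apt.txt"] = "\n".join(apt) + "\n"
    if post_build:
        files["postBuild"] = "#!/bin/bash\n" + "\n".join(post_build) + "\n"
    if channels or conda_deps:
        # Assumption: with conda in play, pip packages go into environment.yml.
        deps = list(conda_deps)
        if pip and "pip" not in deps:
            deps.append("pip")
        lines = []
        if channels:
            lines.append("channels:")
            lines += [f"  - {name}" for name in channels]
        if deps:
            lines.append("dependencies:")
            lines += [f"  - {name}" for name in deps]
        if pip:
            lines.append("  - pip:")
            lines += [f"    - {name}" for name in pip]
        files["environment.yml"] = "\n".join(lines) + "\n"
    elif pip:
        files["requirements.txt"] = "\n".join(pip) + "\n"
    return files


# A cut-down version of the three topics above, in the hypothetical schema.
bundle = {
    "maps": {
        "apt": ["libproj-dev", "libgeos-dev"],
        "conda": {"channels": ["conda-forge"],
                  "dependencies": ["python", "matplotlib", "basemap"]},
        "pip": ["folium"],
    },
    "astro-stuff": {
        "conda": {"channels": ["astropy"], "dependencies": ["python", "astropy"]},
    },
    "presentation": {
        "pip": ["RISE"],
        "postBuild": ["jupyter-nbextension install rise --py --user"],
    },
}
files = split_bundle(bundle)
for name, text in files.items():
    print(f"--- {name} ---\n{text}")
```

Duplicates such as `python` appearing in two topics collapse to a single entry, which is the easy part; deciding what to do when two topics pin different versions is where it would get harder.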

For the user, it would mean they have just a single config file to worry about, plus they can reuse components that represent complex builds (the Linux package you always forget is a dependency, the post-build enabling of an extension, etc).

Looking at the above file, I could also imagine further levels of nesting, eg in terms of jupyter-nbextension install or jupyter-nbextension enable, though adding too many hard-wired paths to particular sorts of operation may reduce the overall utility of a binder.yaml parser?

@yuvipanda
Contributor

I think this would be useful for specific advanced use cases, but we have to be very careful to not become yet another package specification format (https://xkcd.com/927/ etc). We also don't want to make a format that only works on binder, and can't be used locally / by other upstream tools. I personally think just using the currently supported set of files and commenting heavily is a better fit for this use case, than inventing a new format. For example, there's a much larger community of users who can help you debug a problem with an environment.yml file than would be for a binder.yaml file.

A possible way to go about this would perhaps be to write a small python package that takes such a file, and just spits out the different files that are recognized by binder. I would recommend that the YAML file not invent any new syntax - for example, the contents of 'conda' should just be an environment.yml file and not something bespoke. Same for requirements.txt, or apt.txt, etc. This new python package / tool would only do the job of merging all the various requirements into one that makes sense in some form and then spitting them out as separate files. Am not entirely convinced this is doable easily in a way that's not confusing or has a lot of edge cases, though.
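A rough sketch of the merging step, assuming each component's conda section is already an environment.yml-shaped dict (for example, as loaded by a YAML parser). `merge_environments` and `package_name` are made-up names, and the version pins in the example are invented purely to trigger a conflict. Nested sections such as a pip sub-list are skipped here, which is exactly the kind of edge case that would need real thought.

```python
import re


def package_name(spec):
    """Return the bare name from a spec such as 'python=3.6' or 'numpy>=1.13'."""
    return re.split(r"[=<>!~ ]", spec, maxsplit=1)[0].lower()


def merge_environments(*envs):
    """Merge environment.yml-shaped dicts; return (merged, conflicts)."""
    merged = {"channels": [], "dependencies": []}
    first_spec = {}  # package name -> first spec seen for it
    conflicts = []   # (spec kept, spec that disagreed with it)
    for env in envs:
        for channel in env.get("channels", []):
            if channel not in merged["channels"]:
                merged["channels"].append(channel)
        for spec in env.get("dependencies", []):
            if not isinstance(spec, str):
                # e.g. {"pip": [...]}: out of scope for this sketch
                continue
            name = package_name(spec)
            if name not in first_spec:
                first_spec[name] = spec
                merged["dependencies"].append(spec)
            elif first_spec[name] != spec:
                conflicts.append((first_spec[name], spec))
    return merged, conflicts


# Version pins below are invented for illustration only.
maps_env = {"channels": ["conda-forge"],
            "dependencies": ["python=3.6", "matplotlib", "basemap"]}
astro_env = {"channels": ["astropy"],
             "dependencies": ["python=3.5", "astropy"]}

merged, conflicts = merge_environments(maps_env, astro_env)
print(merged)
print("conflicts:", conflicts)
```

Reporting conflicts rather than silently picking a winner seems the least confusing choice, but it does mean the tool can only get the user part of the way.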

Once this small python package exists, users can incorporate that into their workflows (perhaps as a git pre-commit hook?) if they want, and if there are enough users for this we can incorporate that into the default set of things binder supports.

IMO the specific set of solutions to this is to wait for (and help!) the packaging ecosystem to get better. For pip packages that need a specific apt package, binary wheels! For notebook extensions needing lines in postBuild, jupyter/notebook#2894 and related issues are being worked on that remove that requirement which is the right way to do this.

Hope that makes sense! And thank you very much for opening an issue here!

@psychemedia
Author

psychemedia commented Dec 4, 2017

@yuvipanda Yes - makes sense. Re: the comments - yep, that's one really useful way (though I noticed recently that if I had comments or blank lines in apt.txt it threw an error?).

Agreed on the risk of an exploding number of formats, and also on better packaging elsewhere.

But I am also mindful of folk who are users rather than developers / sys admins, and who struggle with trying to debug why a particular installation is failing because something is missing and for which the documentation might as well be written in Linear B, if it isn't already...

Binder is a huge step forward in bringing computing to folk who may be able to benefit from using notebooks. Not only does it save time in making builds shareable and reusable, it also helps folk whose computational skills might be limited to using widgets provided as part of a notebook, or at best copying a line of code and changing the filename in it. Which is to say, folk who don't know how to find and import packages, let alone install them.

And it would be good if there were simple building blocks they could copy on the road to developing their own builds and combinations.

(I am trying to find ways that lower the barrier of entry way down!)

That said, I'm also mindful that the technology needs to remain useful for experts.

@yuvipanda
Contributor

Yup, jupyterhub/repo2docker#151 fixed the 'comments in apt.txt cause crashes' issue.
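For anyone curious what tolerating comments and blank lines amounts to, here is a small sketch. It is not the repo2docker code, just an illustration; `read_apt_packages` is a made-up name, the comment text in the example is invented, and whether trailing comments on a package line are accepted by binder is an assumption.

```python
def read_apt_packages(text):
    """Return package names from apt.txt-style text, ignoring comments and blanks."""
    packages = []
    for line in text.splitlines():
        # Drop anything after '#', then surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if line:
            packages.append(line)
    return packages


example = """\
# needed by basemap
libproj-dev

libgeos-dev  # also for basemap
"""
packages = read_apt_packages(example)
print(packages)
```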

Agree re: lowering barriers of entry. http://github.com/binder-examples/appmode goes a long way towards that I think!

@yuvipanda
Contributor

Can you expand a bit on how having this would be useful for 'folk who don't know how to find and import packages, let alone install them'? Am curious, since I also want to support such people but am not sure how exactly a binder.yaml file would help here. A use case or example might help me understand!

@choldgraf
Member

This is an important topic, thanks for bringing it up! I agree that the bar should be really really high for "create a new way of doing things" vs. "utilize pre-existing ways to do things". E.g., our decision to create a new repo (repo2docker) was not taken lightly since other building packages existed already. The decision was important enough that Yuvi even wrote a blog post about it!

IMO if you think this would be useful for your situation, make it into a standalone package that could complement the binder ecosystem. I've found that's a good way to test out functionality / make future decisions about API / etc. It'll also be a good way to see if it serves a useful role for the rest of the community!

@psychemedia
Author

@choldgraf I will try to doodle some sketches around this... but note that I also class myself as more "hopeful technologist" than "developer" ;-)

Reflecting again on this just now, lots of install instructions say things like:

conda install -c astropy pyephem 
conda install -c astropy/label/openastronomy pyephem

or

pip install folium
pip install git+https://github.com/python-visualization/folium.git
pip3 install ipywidgets~=7.0.0b1

which for a novice who gets that far may just lead to confusion as to what to do next?

In which case they might just want a bash file, or a yaml file with a parser that can try to parse untold horrors in the way that HTML parsers do:

maps:
  - apt
    - libproj-dev
    - libgeos-dev
  - pip install folium
  - conda
    - channels
      - conda-forge
    - dependencies
      - python
      - matplotlib
      - basemap

astro-stuff:
  - conda install -c astropy/label/openastronomy pyephem

presentation:
  - pip:
    - jupyter_contrib_nbextensions
    - RISE
    - jupyter-wysiwyg

The emphasis then becomes one of having a parser that tries to get the user up and running whatever the user throws at it? Because for the user, they just want to use the package?

@choldgraf
Member

I can definitely see how someone could find that workflow useful! Give it a go and feel free to ping if you have a prototype that works for your needs! We are all hopeful technologists in the open-source world :-)

@takluyver
Member

I'd be wary of the 'handle anything thrown at us' model: while it makes it easier to dump a load of stuff into a file and hope it works, it makes it harder to actually understand how to specify stuff, because there are many different ways to say the same thing.

Maybe an alternative would be to make a tool someone can quickly run inside a repo, to tell you what binder would install based on that repo, and point out any problems.
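A very rough sketch of such a checker, assuming it only needs to look for the build files mentioned in this thread and flag a couple of obvious problems. `describe_repo` and the wording of the messages are made up, and a real tool would need to mirror what binder actually does rather than this guess at it.

```python
import tempfile
from pathlib import Path

# Build files mentioned in this thread and what they are for.
KNOWN_FILES = {
    "apt.txt": "Linux packages installed with apt",
    "environment.yml": "conda environment",
    "requirements.txt": "Python packages installed with pip",
    "postBuild": "script run after the install steps",
}


def describe_repo(repo_dir):
    """Return (found, problems) for the build files in repo_dir."""
    repo = Path(repo_dir)
    found, problems = [], []
    for name, meaning in KNOWN_FILES.items():
        path = repo / name
        if not path.is_file():
            continue
        found.append(f"{name}: {meaning}")
        if not path.read_text().strip():
            problems.append(f"{name} is empty")
    if not found:
        problems.append("no build files found")
    if (repo / "environment.yml").is_file() and (repo / "requirements.txt").is_file():
        problems.append("both environment.yml and requirements.txt are present; "
                        "check which one binder will use")
    return found, problems


# Try it on a throwaway directory standing in for a repo.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "apt.txt").write_text("libproj-dev\n")
    Path(tmp, "requirements.txt").write_text("")
    found, problems = describe_repo(tmp)

print("found:", found)
print("problems:", problems)
```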

@betatim
Member

betatim commented Dec 5, 2017

To chime in on the discussion of "how to make it easier to translate install commands from docs to binder setup stuff": a webpage/tool that has a text field into which you can paste "setup instructions" you find in other packages/code and that generates apt.txt, requirements.txt, environment.yml, etc. for you would be cool.

You'd paste:

conda install -c astropy pyephem 
conda install -c astropy/label/openastronomy pyephem
pip install folium
pip install git+https://github.com/python-visualization/folium.git
pip3 install ipywidgets~=7.0.0b1

and the tool parses that to work out what needs to go where. I think this would be useful for people who don't know what packages are, who conda is and what behaviour is apt. It also sounds like a pretty hard bit of code to write, parsing conda commands and pip options :-/
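To give a feel for the easy 80% of that parsing, here is a sketch that handles only plain `conda install` and `pip install` lines plus conda's `-c`/`--channel` option, and sets aside anything else instead of guessing. `sort_install_commands` and the result keys are made-up names; the long tail of pip and conda options is deliberately left out, which is where the hard part lives.

```python
import shlex


def add_unique(target, item):
    """Append item to target unless it is already there."""
    if item not in target:
        target.append(item)


def parse_conda_args(args):
    """Return (channels, packages), or None if an option is not understood."""
    channels, packages = [], []
    position = 0
    while position < len(args):
        arg = args[position]
        if arg in ("-c", "--channel") and position + 1 < len(args):
            channels.append(args[position + 1])
            position += 2
        elif arg.startswith("-"):
            return None
        else:
            packages.append(arg)
            position += 1
    return channels, packages


def sort_install_commands(text):
    """Sort pasted install commands into channels, conda and pip packages."""
    result = {"channels": [], "conda": [], "pip": [], "unrecognised": []}
    for raw in text.splitlines():
        tokens = shlex.split(raw)
        if not tokens:
            continue
        tool, args = tokens[0], tokens[2:]
        parsed, target = None, None
        if len(tokens) >= 3 and tokens[1] == "install":
            if tool == "conda":
                parsed, target = parse_conda_args(args), "conda"
            elif tool in ("pip", "pip3") and not any(a.startswith("-") for a in args):
                parsed, target = ([], args), "pip"
        if parsed is None:
            # Anything not understood is kept aside for a human to look at.
            result["unrecognised"].append(raw.strip())
            continue
        channels, packages = parsed
        for channel in channels:
            add_unique(result["channels"], channel)
        for package in packages:
            add_unique(result[target], package)
    return result


pasted = """\
conda install -c astropy pyephem
conda install -c astropy/label/openastronomy pyephem
pip install folium
pip install git+https://github.com/python-visualization/folium.git
pip3 install ipywidgets~=7.0.0b1
"""
sorted_commands = sort_install_commands(pasted)
for key, values in sorted_commands.items():
    print(key, values)
```

From there, the `channels` and `conda` lists would feed environment.yml and the `pip` list would feed requirements.txt (or a pip section), with the `unrecognised` list shown back to the user.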

Maybe an easier thing to write is a webpage that has the building blocks that @psychemedia mentions, lets me compose them ("maps, I want map stuff, and machine-learning also looks useful"), and spits out the various files needed.

Final thought: as someone who is not very experienced at all with all this packaging, will I manage to edit the GitHub repository to add these files? Maybe not. Maybe we need a few user personas that we can target.

@psychemedia
Author

Another approach - and the one I've been exploring - that might relate to identifying thematic personas is to extend the binder-examples model to build example topic based / thematic distributions. Eg in the branches of https://github.com/psychemedia/showntell I have builds for things like "chemistry" or "music" that contain packages (and ultimately demos of how to get started using them) related to those topic areas.

(My particular interest in this is helping folk write "reproducible" and extensible open education course materials.)
