
Use Google Colab to load ipynb examples directly from documentation? #57

Closed
sroet opened this issue Jun 4, 2019 · 11 comments

@sroet
Collaborator

sroet commented Jun 4, 2019

Looking at this link.
Together with the !pip install contact-map Jupyter magic, it should be feasible to let people run our examples from the website directly in their browser, with no download required.
I think this would be a good idea to implement in our Read the Docs template for the examples (especially the badge mentioned in the link).

@dwhswenson what do you think?

@dwhswenson
Owner

Sounds worth trying out, although there are a couple of challenges. The big one is that we don't include all the trajectory files in the repository, so you'll need a way to download data files (maybe there are magics to help with that?). If you look at the OPS additional examples, I have a (bash) script to download and extract data from figshare. It requires curl and unzip:
https://gitlab.e-cam2020.eu/dwhswenson/ops_additional_examples/blob/master/devtools/figshare_dl_extract.sh
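For illustration, the curl + unzip steps of such a script can also be done from Python with only the standard library, which avoids relying on curl and unzip being installed. This is a minimal sketch: the function name `download_and_extract` is hypothetical, and the actual figshare URL of the trajectory archive is not given in this thread.

```python
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def download_and_extract(url, dest):
    """Download a zip archive from ``url`` and unpack it into ``dest``.

    Hypothetical helper mirroring the curl + unzip steps of the bash script.
    Returns the sorted list of archive members.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "data.zip"
        # Equivalent of `curl -L -o data.zip URL`
        with urllib.request.urlopen(url) as response, open(archive, "wb") as out:
            shutil.copyfileobj(response, out)
        # Equivalent of `unzip data.zip -d dest`
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
            return sorted(zf.namelist())
```

In a Colab or Binder setup step this would be called as `download_and_extract(FIGSHARE_URL, "examples/data")`, where `FIGSHARE_URL` and the target directory are placeholders.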

Also, our install isn't quite that easy (requires numpy and cython separately, because they're required for the build phase of MDTraj). That might be straightforward, although numpy often makes things difficult, especially if the environment already has numpy installed. Of course, if there's a way to use conda instead, then it is a lot easier.

@sroet
Collaborator Author

sroet commented Jun 4, 2019

> Also, our install isn't quite that easy (requires numpy and cython separately, because they're required for the build phase of MDTraj). That might be straightforward, although numpy often makes things difficult, especially if the environment already has numpy installed. Of course, if there's a way to use conda instead, then it is a lot easier.

(Sorry for the formatting, I am on my phone.)
Actually, I just tested running the pip install command and it just works (thanks to PEP 517, I guess). I will look at the other issues later this week.

@sroet
Collaborator Author

sroet commented Jun 6, 2019

Current status: https://colab.research.google.com/drive/1YSzxUtk_fVLhVNiLcJcBoJuc9g261RS-

  • Install via pip works (Google Colab ships recent NumPy and Cython for both Python 2.7 and 3.6).
    This takes about 2-3 minutes (building MDTraj), so it might warrant a short warning.
  • Curl and unzip also work :D
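A minimal sketch of what such a Colab setup cell might do, written as plain Python rather than `!` shell escapes so the install order is explicit: NumPy and Cython first (the MDTraj build needs them; harmless if Colab already has them), then contact-map. The function name and the `dry_run` switch are hypothetical, added only so the commands can be inspected without installing anything.

```python
import subprocess
import sys

# MDTraj's build-time requirements first, then contact-map itself.
INSTALL_STEPS = [
    ["numpy", "cython"],
    ["contact-map"],
]


def colab_setup(dry_run=False):
    """Run (or, with dry_run=True, only return) the pip commands in order."""
    commands = [
        [sys.executable, "-m", "pip", "install", *packages]
        for packages in INSTALL_STEPS
    ]
    if not dry_run:
        for cmd in commands:
            # check_call raises CalledProcessError if pip fails
            subprocess.check_call(cmd)
    return commands
```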

Will see if I can adapt the notebooks this week or early next week.

Question: do we want to host Colab-specific versions of the ipynbs (maybe built on the fly by CI, as we just have to add one cell to the start of each ipynb), or do we want to add the install and curl/unzip code to all our example notebooks?
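Since an .ipynb file is plain JSON, the "built on the fly by CI" option amounts to prepending one code cell to a copy of each notebook. A minimal standard-library sketch (nbformat would be the more robust choice in practice); the setup-cell contents, the `<FIGSHARE_URL>` placeholder, and the function name are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical Colab setup cell: install, then fetch the trajectory data.
SETUP_SOURCE = [
    "!pip install contact-map\n",
    "!curl -L -o data.zip <FIGSHARE_URL> && unzip -o data.zip\n",
]


def make_colab_copy(src, dest):
    """Write a copy of notebook ``src`` to ``dest`` with a setup cell first.

    Returns the number of cells in the new notebook.
    """
    nb = json.loads(Path(src).read_text(encoding="utf-8"))
    setup_cell = {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": SETUP_SOURCE,
    }
    nb["cells"].insert(0, setup_cell)
    Path(dest).write_text(json.dumps(nb, indent=1), encoding="utf-8")
    return len(nb["cells"])
```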

@dwhswenson
Owner

Glad to see that it (basically) works! I mean, you should definitely check that everything runs -- weird linking problems might only show up at runtime.

> do we want to host specific colab-versions of the ipynbs [...] or do we want to add the install and curl/unzip code to all our example notebooks?

This is a good question. Let me list what I see as the pros/cons of each:

Add install and download code to the notebooks

Pro:

  • easy
  • having the download step included will make it easy to make the notebooks testable (e.g., using the nbval pytest plugin)

Con:

  • clutters the examples (quite a lot, actually) as they'll be seen on the documentation site
  • really annoying when doing a re-run all to clean up outputs (the pip install will be fast, since contact_map is already installed locally, but still noisy and annoying)
  • related to both of the other cons: the output that the user will see in the example is not what the user will see when they run the notebook

Build colab-specific versions on the fly

Pro:

  • mainly, avoids all the cons of the other approach
  • creates an infrastructure that enables us to add other tricks to the notebooks specifically for colab (or docs, for that matter)

Con:

  • I'm not sure where we would store these. Someone's Google Drive? Maybe we can upload them as an extra file hosted on RTD, but can Colab start notebooks from arbitrary URLs? (I only see Drive and GitHub integration.) I'm not in favor of adding a new commit from within the CI; if we're going to do that, it would be better to treat it as a "bot"-type contributor.
  • Creating all this infrastructure may be more effort than we want to take on.

Alternate approach

While typing this up, I thought of another possibility. What about adding the relevant information to a cell in each of the notebooks, but saving that cell as markdown? Give instructions on how to add a new code cell and then they copy-paste (it seems that colab doesn't have the ability to convert cell types; at least not that I could find). This avoids the cons of leaving that cell as runnable code, and remains easy to do. To add testing of our notebooks, we can just add the downloads to the testing script.
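The alternate approach could look roughly like this: the same setup commands stored in a markdown cell, so they render on the docs site but never execute. A hypothetical sketch of such a cell as it would appear in the notebook JSON; the wording and the `<FIGSHARE_URL>` placeholder are illustrative only.

```python
SETUP_COMMANDS = [
    "!pip install contact-map",
    "!curl -L -o data.zip <FIGSHARE_URL> && unzip -o data.zip",
]


def colab_instructions_cell(commands=SETUP_COMMANDS):
    """Return a markdown cell telling Colab users what to paste into a code cell."""
    lines = [
        "**Running on Google Colab?** Insert a new code cell, paste in\n",
        "the commands below, and run it before the rest of the notebook:\n",
        "\n",
    ]
    # Lines indented by four spaces render as a code block in Markdown
    lines += ["    " + cmd + "\n" for cmd in commands]
    return {"cell_type": "markdown", "metadata": {}, "source": lines}
```

Because the cell is markdown, a "Run All" locally stays clean, and the test script can perform the same downloads on its own.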

@dwhswenson
Owner

Additional question: is there a way to add widgets in Colab? (e.g., I think one of our examples suggests visualizing with NGLView.) I believe that NGLView manages to get itself working with Binder -- and we might consider doing something like what you're suggesting with Binder, instead of Colab. This solves the installation side, at least.

@sroet
Copy link
Collaborator Author

sroet commented Jun 6, 2019

> Additional question: is there a way to add widgets in Colab? (e.g., I think one of our examples suggests visualizing with NGLView.) I believe that NGLView manages to get itself working with Binder -- and we might consider doing something like what you're suggesting with Binder, instead of Colab. This solves the installation side, at least.

Quick answer: No custom widgets like NGLview for now, maybe later. nglviewer/nglview#816

> I'm not sure where we would store these. Someone's Google Drive? Maybe we can upload them as an extra file hosted on RTD, but can Colab start notebooks from arbitrary URLs? (I only see Drive and GitHub integration.) I'm not in favor of adding a new commit from within the CI; if we're going to do that, it would be better to treat it as a "bot"-type contributor.

I would opt for a badge in the examples linking to a separate ipynb in this repository (one that is not included in the docs).
Something like:

  • Add a directory for google-colab notebooks
  • Add the badge with the hardcoded link as a markdown cell to the example ipynbs.

I will open a PR with these changes soon.
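For reference, Colab can open a notebook directly from GitHub through a URL of the form https://colab.research.google.com/github/<owner>/<repo>/blob/<branch>/<path>, and Google hosts a badge image at https://colab.research.google.com/assets/colab-badge.svg. A minimal sketch that builds the markdown for such a badge; the function name and the default branch `master` are assumptions.

```python
COLAB_BADGE = "https://colab.research.google.com/assets/colab-badge.svg"


def colab_badge(owner, repo, path, branch="master"):
    """Markdown for an 'Open In Colab' badge pointing at a notebook on GitHub."""
    url = (
        f"https://colab.research.google.com/github/"
        f"{owner}/{repo}/blob/{branch}/{path}"
    )
    return f"[![Open In Colab]({COLAB_BADGE})]({url})"
```

For example, `colab_badge("dwhswenson", "contact_map", "examples/colab/example.ipynb")`, where `examples/colab/` stands in for the hypothetical directory of Colab notebooks.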

@dwhswenson
Owner

  1. This means that there are two copies of the notebooks. This is not something we want to manage. Keeping them sync'd will be a nightmare.

  2. What's the reason to prefer colab over Binder? It seems like Binder does more to meet our goals, and doesn't require changes to the notebooks (we can do the install and download as part of the Binder environment setup).

@sroet
Collaborator Author

sroet commented Jun 6, 2019

> This means that there are two copies of the notebooks. This is not something we want to manage. Keeping them sync'd will be a nightmare.

Fair point.

> What's the reason to prefer colab over Binder? It seems like Binder does more to meet our goals, and doesn't require changes to the notebooks (we can do the install and download as part of the Binder environment setup).

That was just my inexperience with Binder vs. Colab. But you are right, that is probably the better way to go.

@dwhswenson
Owner

From looking over it, Binder should be really easy to set up.

  1. Add a directory called binder
  2. In that directory, add a conda-style environment.yml (I would recommend against pinning versions)
  3. Add a postBuild script (this is where we download the trajectories)
  4. Get the URL and badge by filling this in: https://gke.mybinder.org/

Take a look at nglview's binder dir for inspiration.
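As a rough sketch of steps 1-3, the script below writes a `binder/` directory with an unpinned `environment.yml` and a `postBuild` script, which repo2docker runs after the environment is built. The package list (including the conda-forge package name `contact_map`), the `<FIGSHARE_URL>` placeholder, and the `examples/data` target are assumptions for illustration.

```python
import stat
from pathlib import Path

# Unpinned, as recommended; package names are assumptions.
ENVIRONMENT_YML = """\
name: contact-map-examples
channels:
  - conda-forge
dependencies:
  - python
  - contact_map
  - nglview
"""

# Runs once when the Binder image is built; <FIGSHARE_URL> is a placeholder.
POST_BUILD = """\
#!/bin/bash
# Stop the build if any step fails
set -euo pipefail
curl -L -o data.zip "<FIGSHARE_URL>"
unzip -o data.zip -d examples/data
rm data.zip
"""


def write_binder_dir(repo_root="."):
    """Create binder/environment.yml and binder/postBuild under repo_root."""
    binder = Path(repo_root) / "binder"
    binder.mkdir(exist_ok=True)
    (binder / "environment.yml").write_text(ENVIRONMENT_YML)
    post_build = binder / "postBuild"
    post_build.write_text(POST_BUILD)
    # Mark postBuild executable so it can be run directly
    mode = post_build.stat().st_mode
    post_build.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return sorted(p.name for p in binder.iterdir())
```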

I think we can plan on 2 kinds of badges/links:

  1. In general, you might want to just start in the examples directory. This would be kind of like typing jupyter notebook from the examples directory.
  2. You might also want to put a badge in each example with a link to that specific example. This way you'd start with the notebook you want.

Actually, I think we probably want to use the JupyterLab link format; see https://mybinder.readthedocs.io/en/latest/howto/user_interface.html to modify the URLs (basically, add lab/ to the start of the urlpath -- note: use url path, not file path!).
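Both kinds of links differ only in the path passed to JupyterLab: the examples directory for the general link, or a single notebook for the per-example badge. A minimal sketch assuming the mybinder.org URL scheme `https://mybinder.org/v2/gh/<org>/<repo>/<ref>?urlpath=...` and its badge image `https://mybinder.org/badge_logo.svg`; the function names and the default ref `master` are hypothetical.

```python
from urllib.parse import quote

BINDER_BADGE = "https://mybinder.org/badge_logo.svg"


def binder_url(org, repo, ref="master", path="examples"):
    """Binder launch URL that opens ``path`` in JupyterLab.

    ``path`` is relative to the repository root and goes into the url path
    (prefixed with ``lab/tree/``), not the file path.
    """
    urlpath = quote(f"lab/tree/{path}", safe="")
    return f"https://mybinder.org/v2/gh/{org}/{repo}/{ref}?urlpath={urlpath}"


def binder_badge(org, repo, ref="master", path="examples"):
    """Markdown for a 'launch binder' badge using binder_url()."""
    return f"[![Binder]({BINDER_BADGE})]({binder_url(org, repo, ref, path)})"
```

The general link would use the default `path="examples"`; a per-notebook badge would pass something like `path="examples/<notebook>.ipynb"`.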

You can already play with the mybinder site to set up the URLs -- right now the notebooks will error if you run them, because the requirements aren't installed, but here are the kinds of URLs I was thinking:

@sroet
Collaborator Author

sroet commented Feb 17, 2020

I have gained some experience setting up Binder (for teaching Python, and for a proof-of-principle tutorial for a compiled Fortran program) and agree that it is a better solution for our needs. Shall I update the title of this issue to reflect that, or shall I close this one and open a new issue?

@sroet
Collaborator Author

sroet commented Oct 27, 2020

Solved by #92 and #93.

@sroet sroet closed this as completed Oct 27, 2020
Labels
None yet
Projects
None yet

2 participants