-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add binder support for deep review contributions figure #161
Conversation
The kernel not found seems like an error that would occur when nb_conda_kernels does not exist. However, we do install that:
In what kernel or environment are you getting the tornado error? |
I had been getting that error in the In binder, the kernel error pops up at launch but then I can select the
It's likely related to jupyter/docker-stacks#502 |
Success! I can now run this notebook in binder: https://mybinder.org/v2/gh/agitter/meta-review/binder?filepath=analyses/deep-review-contrib/02.contrib-viz.ipynb Installing tzdata with apt-get fixed the biggest remaining issue. Everything else was conda related. Upgrading r-dplyr created inconsistencies with most (all?) of the other R packaged. I ended up upgrading all of them to a combination I found to work locally. The difficultly in recreating a conda environment that worked a year ago suggests that continuous testing of environments is important if we want to rely on them for notebooks like this, with or without binder. We may also want to dump There is still an error message about not finding the matching R kernel. Selecting @dhimmel what do you think? Should I proceed and add a link to the manuscript and some documentation to the binder build? I'm not good with ggplot so I'm not going to attempt the dot plot discussed in #149. |
This has something to do with the kernelspec. In the notebook it is specified as meta-review/analyses/deep-review-contrib/02.contrib-viz.ipynb Lines 818 to 824 in 1bb5c48
The kernel binder recognizes is |
Yes it's a bit unnerving that despite our efforts to specify the environment, it takes hours to debug issues that have arisen in the last year. It suggests our conda environments do not do a good job of preserving environments across systems and time. It may be that we find more complete environment images are required.
In the figure notebook? Yea, not a bad idea.
Yeah sounds reasonable. |
When I download the notebook that is launched via binder (after selecting the R kernel), the .ipynb file includes: "kernelspec": {
"display_name": "R [conda env:conda]",
"language": "R",
"name": "conda-env-conda-r"
}, Therefore, if we use these details, I think binder will automatically open the proper kernel. Too bad there seems to be an environment naming discrepancy between my local system and binder. I'd say we should use the binder names to make the binder workflow least problematic. |
Experiences like this are what make me cringe whenever folks suggest that Conda is the answer for what ails us with respect to reproducibility in bioinformatics. 😬 |
To be fair, Docker only enables reproducibility in a very narrow sense. While there will exist an image that can recreate the output, if you want to make enhancements that require additional software, then you must recreate the image and potentially face issues like we're facing with conda. So in essence, Docker enables exact reproducibility but not necessarily much more, before it experiences the same problems. |
@dhimmel : I don't disagree with that either. 😁 |
I should clarify that I cringe when Docker is suggested as the magic bullet too. |
I'll proceed to prepare this pull request for review.
In df20be8 I tested updating the notebook kernelspec to match what you reported above. That will at least reveal whether we can get this working in binder without the kernel error. Then post-merge I'll follow up with the binder team to see what a better solution would be for dynamically fixing the kernelspec. I don't want to cause this same kernel problem in the local environment. Given what we've seen with the conda environment, I'll create a new tag for this repository once we have a stable version. Maybe something like |
@agitter is there a link with documentation on how binder caches Docker images for a repo? I think ideally we want the repository URL to point to a binder with a cached environment. However, we would like the ability to update this environment in case it breaks, such that the URL in the manuscript does not become stale. What happens if we just use the latest |
@@ -1,6 +1,6 @@ | |||
packages = c( | |||
'https://cran.r-project.org/src/contrib/svglite_1.2.1.tar.gz', | |||
'https://cran.r-project.org/src/contrib/ggridges_0.4.1.tar.gz' | |||
'https://cran.r-project.org/src/contrib/Archive/ggridges/ggridges_0.4.1.tar.gz' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kurtwheeler is there a URL structure for versioned R dependencies that supports both the latest version and archived versions? IIRC you had to deal with this at some point, and there was an undocumented workaround.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is not such a URL structure
But I remember there being some undocumented way to retrieve it, like with ?version=X.X.X
appended to the URL or something. However, devtools::install_version
doesn't seem to use this method I'm thinking of.
Cross-referencing this tweet (unresolved).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like remotes::install_version
is more lightweight of a dependency than devtools
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kurtwheeler didn't you get the answer at the mailing list here? For example, the following URLs should work?
https://cran.r-project.org/package=svglite&version=1.2.1
https://cran.r-project.org/package=ggridges&version=0.4.1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agitter can you see if those two URLs work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replied to a stackoverflow with the undocumented method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing this in 8aff589. I also changed other parts of the build so I may need to stabilize those before I can tell whether this worked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I watched the build logs and these URLs appeared to work. As expected, I still need to fix other parts of the build before I can test the packages are functional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Installing the R packages with these CRAN URLs worked.
@dhimmel this page https://mybinder.readthedocs.io/en/latest/tutorials/reproducibility.html states:
If you'd like more technical information, I may need to open an issue to ask the development team. I haven't found more details. You made me realize that my suggestion of making a We could use the latest What about creating a For my own notekeeping and debugging, I also used
|
@slochower yes that's possible. I'm having a hard time debugging the kernel issues. I've been pushing a few small changes to trigger new builds so I can watch the build process. If you happened to load one of those new commits it would be very slow. That said, even a cached image can take minutes to load. |
Got it. I wonder whether this will get shorter over time. |
It looks like
That is because the environment file is being used to update the existing environment, not create a new environment: |
Axe it? |
It was not easy to create and activate a new conda environment, and I decided to not debug it further. Now that I understand the build system better, updating the notebook's kernelspec is the best option presented in jupyterhub/mybinder.org-user-guide#152 Removing This is functioning and ready for review. For now it uses the master branch. We'll need to finalize our tagging strategy before merging or create a new issue to do so post-merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we probably should update master
to binder
(for a binder tag) prior to merging.
@@ -9,3 +9,5 @@ conda env create --file environment.yml | |||
source activate contrib-viz | |||
Rscript install-manual-r-dependencies.r | |||
``` | |||
|
|||
The environment was updated to be compatible with Binder after the notebooks were originally run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to rerun the notebooks and save with the current environment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was intended as a temporary comment that we can remove after rerunning the notebooks in the new environment. However, I'm out of the office today and may not trust that the environment works perfectly on this Windows laptop.
I also thought that we might add a dot plot for #149 and then run the notebook one final time after that update. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also thought that we might add a dot plot for #149 and then run the notebook one final time after that update. What do you think?
Yesterday I quickly added geom_dot()
in place in the figure code, but the dots didn't align correctly with the filled region. geom_dot()
without the lines worked, but the combination was wonky. I don't have an example to show because I was just messing around in Binder. Perhaps @dhimmel can easily figure out what's up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm quite bad with R plotting so I'm not planning to make an attempt at this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also thought that we might add a dot plot for #149 and then run the notebook one final time after that update. What do you think?
I'm happy to take a crack at this visualization. I suggest we merge this PR first so I can work off of the updated environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gg = total_words_df %>%
ggplot2::ggplot(ggplot2::aes(x = date, y = username, height = total_words, group = username, fill=username, color=username)) +
ggplot2::geom_dotplot(binaxis = "y", dotsize = 0.1) +
# ggridges::geom_density_ridges(stat='identity', size=0.4, alpha=0.3) +
ggplot2::scale_x_datetime(expand = c(0, 0), date_labels = '%b %Y', breaks = major_breaks, minor_breaks = minor_breaks) +
ggplot2::scale_y_discrete(limits=usernames, position = 'right', labels=y_labels) +
#ggplot2::scale_fill_manual(values = setNames(usage_df$color, usage_df$username)) +
ggplot2::guides(fill=FALSE, color=FALSE) +
ggplot2::xlab(NULL) +
ggplot2::ylab(NULL) +
ggplot2::theme_minimal() +
ggplot2::theme(axis.text.y = ggplot2::element_text(vjust = 0)) +
ggplot2::theme(panel.grid.major.y = ggplot2::element_blank())
This naively looks correct, but dot plot starts around Jul 2017 instead of Jan 2017.
Binning in the other direction looks correct, but it's not really the "timeline" view that the reviewer requested.
gg = total_words_df %>%
ggplot2::ggplot(ggplot2::aes(x = date, y = username, height = total_words, group = username, fill=username, color=username)) +
ggplot2::geom_dotplot(binaxis = "x", dotsize = 0.2) +
ggridges::geom_density_ridges(stat='identity', size=0.4, alpha=0.3) +
ggplot2::scale_x_datetime(expand = c(0, 0), date_labels = '%b %Y', breaks = major_breaks, minor_breaks = minor_breaks) +
ggplot2::scale_y_discrete(limits=usernames, position = 'right', labels=y_labels) +
ggplot2::guides(fill=FALSE, color=FALSE) +
ggplot2::xlab(NULL) +
ggplot2::ylab(NULL) +
ggplot2::theme_minimal() +
ggplot2::theme(axis.text.y = ggplot2::element_text(vjust = 0)) +
ggplot2::theme(panel.grid.major.y = ggplot2::element_blank())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dhimmel great! See a few examples above...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@slochower Failed to connect to event stream
is not a fatal error in my recent experience. It indicates an interruption between the build process and the local client that prevents the log from updating in the browser, but the build continues. Usually refreshing will either reset the build log output (from its current point, not the beginning) or launch the container if it finished building.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh. A refresh launched a rebuild, which did work, but it took another 5-10 minutes, so I don't think it simply reset the output. Good to know, though!
content/02.main-text.md
Outdated
@@ -97,6 +97,7 @@ The Deep Review was initiated in August 2016, and the first complete manuscript | |||
While the article was under review, we continued to maintain the project and accepted new contributions. | |||
The preprint was updated in January 2018, and the article was accepted by the journal in March 2018 [@doi:10.1098/rsif.2017.0387]. | |||
As of June 15, 2018, the Deep Review repository accumulated {{total_commits}} git commits, {{merged_pull_requests}} merged pull requests, {{open_issues + closed_issues}} issues, and {{github_stars}} GitHub stars. | |||
[Click here](https://mybinder.org/v2/gh/greenelab/meta-review/master?filepath=analyses/deep-review-contrib/02.contrib-viz.ipynb) to launch Binder and interactively explore alternative visualizations in the Jupyter notebook that generated this figure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative phrasing that is not necessarily better than the existing phrasing (so feel free to disregard):
The notebooks to generate this figure can be interactively launched using Binder, enabling users to explore alternative visualizations or analyses of the source data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have confirmation in jupyterhub/mybinder.org-user-guide#153 that the |
Doesn't matter. I might just wait till merge. |
I'm merging and then will add the |
This build is based on 313023e. This commit was created by the following Travis CI build and job: https://travis-ci.org/greenelab/meta-review/builds/502263340 https://travis-ci.org/greenelab/meta-review/jobs/502263341 [ci skip] The full commit message that triggered this build is copied below: Add binder support for deep review contributions figure (#161) * Add symbolic link to notebook conda environment * Add postBuild binder file for additional R packages * Fix postBuild script path * Fix postBuild script path again * Update ggridges CRAN URL * Specify tornado version * Update r-dplyr * Install tzdata with apt-get * Upgrade r-readr for newer r-dplyr * Upgrade more R packages * Update more R packages * Test updating notebook kernelspec * Add note about environment version * Document binder files * Dump kernelspec for debugging * Add Binder link to figure caption * Kernel debugging * Persistent log for debugging * Explicitly create and activate contrib-viz environment * CRAN URLs to support current and archive versions per dhimmel * Enable conda activate * Revert notebook kernelspec change * Remove attempts to create new conda environment * Use postBuild script to fix kernelspec in notebook * Expand Binder readme * Consistent capitalization * Remove accidental text * Update caption text and switch to binder tag in Binder link * Describe binder tag in readme * Add Binder paper citation
This build is based on 313023e. This commit was created by the following Travis CI build and job: https://travis-ci.org/greenelab/meta-review/builds/502263340 https://travis-ci.org/greenelab/meta-review/jobs/502263341 [ci skip] The full commit message that triggered this build is copied below: Add binder support for deep review contributions figure (#161) * Add symbolic link to notebook conda environment * Add postBuild binder file for additional R packages * Fix postBuild script path * Fix postBuild script path again * Update ggridges CRAN URL * Specify tornado version * Update r-dplyr * Install tzdata with apt-get * Upgrade r-readr for newer r-dplyr * Upgrade more R packages * Update more R packages * Test updating notebook kernelspec * Add note about environment version * Document binder files * Dump kernelspec for debugging * Add Binder link to figure caption * Kernel debugging * Persistent log for debugging * Explicitly create and activate contrib-viz environment * CRAN URLs to support current and archive versions per dhimmel * Enable conda activate * Revert notebook kernelspec change * Remove attempts to create new conda environment * Use postBuild script to fix kernelspec in notebook * Expand Binder readme * Consistent capitalization * Remove accidental text * Update caption text and switch to binder tag in Binder link * Describe binder tag in readme * Add Binder paper citation
This follows up on our binder discussion in #149. To avoid duplicating the conda environment file for the notebooks, I used a symlink from the
binder
directory. That and thepostBuild
file appear to work.However, the notebook is not yet executable in binder. You can test it and see the errors in my fork: https://mybinder.org/v2/gh/agitter/meta-review/binder?filepath=analyses/deep-review-contrib/02.contrib-viz.ipynb
Jupyter throws a kernel error
Stack trace
I tried creating this environment locally and had similar problems. My current guess is that it is related to the
tornado
package version along the lines of ContinuumIO/anaconda-issues#8789 The environment was added in 51f4c91 on 2018-01-24. conda-forge hadtornado-4.5.3
at that time and upgraded to version 5 shortly thereafter.@dhimmel are you still able to locally create the
contrib-viz
conda environment? Or if you have an existing version that works can you please report all of the package versions so we can debug the Jupyter kernel problem? This will be a general issue for anyone who wants to execute these figures irrespective of binder.Note also that I had to update the CRAN URL to use the archive URL.