[Discussion] reproducibility #170
Comments
I think this is a super important topic, especially when it comes to the publishing world. This is related to #93, though that's a more specific topic.
I really like the idea of pinning repo2docker versions, which seems like the easiest (and maybe only?) solution to this problem. If we can guarantee that a properly prepared repo will always produce the same Dockerfile (rather than the same image, since we cannot guarantee that) for any given version of repo2docker, I think that's good enough, no? We might have to write version shims to maintain binderhub <-> repo2docker compatibility, but that doesn't seem too difficult. We could also switch from passing command-line arguments to something more complex and versionable if we want.
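One way to check the weaker guarantee described here (same Dockerfile, not same image) would be to fingerprint the generated Dockerfile text and compare it against a value recorded for a given repo2docker version. This is only a minimal sketch, assuming the Dockerfile text is already available as a string; `dockerfile_fingerprint` and `matches_recorded` are hypothetical helper names, not part of repo2docker.

```python
import hashlib


def dockerfile_fingerprint(dockerfile_text: str) -> str:
    """Return a SHA-256 hex digest of the Dockerfile text.

    Hypothetical helper: line endings are normalized and trailing
    whitespace is stripped so cosmetic differences do not change the result.
    """
    lines = [line.rstrip() for line in dockerfile_text.splitlines()]
    normalized = "\n".join(lines).strip() + "\n"
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def matches_recorded(dockerfile_text: str, recorded: str) -> bool:
    """True if the generated Dockerfile matches a previously recorded fingerprint."""
    return dockerfile_fingerprint(dockerfile_text) == recorded
```

A matching fingerprint only says the build instructions are identical; the resulting image can still differ if upstream packages or the base image change.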
Thinking more on this, there are three things we should try to allow users to pin:
We could / should use runtime.txt for (1), recommend pinning for (2), and make apt.yaml for (3). That's a good start I think, and gives us lots of low-hanging fruit to work with...
More thoughts on reproducibility: freeze conda build numbers as well or not
Definitely an important discussion, but probably something we'll need to engage with the community on at https://discourse.jupyter.org/, especially if at some point we need to make major upgrades to R2D (e.g. the base image?)
Discussion issue for general topics of reproducibility and what's in and out of scope for repo2docker (and Binder).
We currently have a tension between our scientific goal of reproducibility and the maintenance goal of keeping everything up to date. We have the same issue that everyone who pursues reproducibility has, which is specifying the environment as strictly as necessary (so it's correct), but no stricter (so it stays useful). Conservative approaches are to use overly-specified environments (e.g. pip freeze / conda env export), which we should make sure to support well and document for the more reproducibility-minded users.
A user who wants to ensure a truly reproducible build must:
- provide a pip freeze- or conda env export-produced environment specification
Right now, the only truly reproducible builds available on Binder are custom Dockerfiles, which is something I want fewer people to use, not more. But we currently have no answer for reproducibility with any other builders, as there is no way for users to be sufficiently strict about the environment.
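To make "sufficiently strict" concrete for the pip case, here is a minimal sketch of a check that flags requirement lines not pinned to an exact version. The function name `unpinned_requirements` and the regular expression are assumptions for illustration only; nothing like this is implied to exist in repo2docker, and the parsing is deliberately simpler than pip's real requirement grammar (for instance, it does not handle URL requirements).

```python
import re
from typing import List

# Matches "name==version", optionally with extras, e.g. "requests[security]==2.31.0".
# Wildcards such as "==1.*" are rejected because they are not exact pins.
_PINNED = re.compile(
    r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,\s-]+\])?==[^\s;=*]+\s*(;.*)?$"
)


def unpinned_requirements(requirements_text: str) -> List[str]:
    """Return requirement lines that are not pinned to an exact version."""
    unpinned = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not _PINNED.match(line):
            unpinned.append(line)
    return unpinned


example = """\
numpy==1.15.4
pandas>=0.23  # loose lower bound
matplotlib
scipy==1.1.0
"""
print(unpinned_requirements(example))
```

Note that pinning only the direct requirements still leaves transitive dependencies free to change, which is why a full pip freeze or conda env export output is the conservative choice.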