Make it possible to configure the base image #487
To support a given base image we'd need a set of adjustments/adaptations. For example it could be that instead of How about having different hierarchies of buildpack classes? You'd have a I think having a second, third, n-th well-defined stack of build packs for a particular base image has a higher likelihood of working than allowing people to use arbitrary base images by just switching the image named in the
Interesting idea. One thought: do we currently have a way for people to define a buildpack without merging it directly into repo2docker? I wonder if we could let people define new buildpacks and point to them as part of a repo2docker build. Neurodocker, for example, is basically doing the same thing as repo2docker (it's a command-line tool where you say "I want these packages installed" and it builds a Dockerfile with the relevant lines in there).
We could explore how to use entrypoints to allow for plugins/extensions. I think that would be interesting both for build packs and content providers!
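The entrypoints idea mentioned above could work roughly like the following sketch, which discovers buildpack classes registered by external packages. The entry-point group name `repo2docker.buildpacks` is an assumption for illustration, not something repo2docker actually defines:

```python
# Hypothetical sketch: discovering third-party buildpacks via entry points.
# The group name "repo2docker.buildpacks" is an assumption, not a real API.
from importlib.metadata import entry_points


def discover_buildpacks(group="repo2docker.buildpacks"):
    """Return buildpack classes registered by external packages."""
    try:
        selected = entry_points(group=group)       # Python 3.10+ API
    except TypeError:
        selected = entry_points().get(group, [])   # Python 3.8/3.9 API
    return [ep.load() for ep in selected]


# A third-party package would then register its class in its packaging
# metadata, e.g. in setup.cfg (names hypothetical):
#
#   [options.entry_points]
#   repo2docker.buildpacks =
#       pangeo = repo2docker_pangeo:PangeoBuildPack
```

repo2docker would call `discover_buildpacks()` at startup and splice the results into its buildpack search order, which is where the ordering question discussed later in this thread comes in.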
I came across a couple of discussions about supporting different distros in #166 and pinning the distro/base image in #170. I think I need to clarify what we're talking about when we say "base image", because there are at least two cases.
One possible addition/expansion is supporting other versions (i.e., a use case that must have
Does repo2docker require the JupyterHub/server packages, or is this a requirement for BinderHub specifically? Must this be a hard requirement for every Dockerfile produced by
Yes, but
I agree -- and the communities that need them can help define the requirements for the stack. The constraints that are in place today in repo2docker have still provided plenty of flexibility.
The buildpacks idea is nice to me - I suppose that's the primary extension mechanism of r2d already anyway. Re: other distributions of Linux, I think it'd be tricky from a testing perspective, but I can definitely see the potential benefit to groups that can't choose their OS. Re: JupyterHub, I believe that repo2docker just ensures that there's a Jupyter server default command at the end: https://github.com/jupyter/repo2docker/blob/694e728ffd33ef589417e82bd1988e1f8a099fa8/repo2docker/buildpacks/base.py#L145. This doesn't mean JupyterHub per se (though it installs JupyterHub by default so that it can work with JupyterHub if needed). (Somebody correct me if I'm wrong here.)
I like https://github.com/binder-examples/rocker as a pattern we can emulate. @choldgraf does, say, neurodebian already have a set of docker images it maintains? If so we can maybe work to add a binder base image there.
@yuvipanda I think that in the short term this is a good solution - treat it as a "sort of advanced" use-case but provide docs to show how it's done. Then if it's done often enough and in a repeatable way, consider how to build it into a non-Dockerfile-based pattern. WDYT?
This is a great example to illustrate one of the Whole Tale project's primary use cases. For context, in Whole Tale, we'd like to use Ideally, we'd be able to have a base RStudio image such as The rocker example is great, and in essence we want to support every RStudio user this way with standard buildpack support.
As discussed in whole-tale/whole-tale#52, I've started integrating repo2docker into the Whole Tale system as a primary image build mechanism. In doing so, I now have a clearer idea of how the Rocker images fit into this discussion and provide a good example of the potential for this capability. I've written up some notes in a Google doc for comment, based largely on this thread: I've hacked together a template-based proof-of-concept for discussion, if interested:
Chiming in here from the Pangeo perspective. We've recently found ourselves working around a few repo2docker challenges where configuring the base image would be really helpful. A few examples of what we want to do:
We have recently been trying out two approaches that touch on these points:
I'll throw out a concept for how repo2docker could handle these use cases better.
cc @rabernat, @betatim, and @fmaussion who joined in on the gitter chat this morning.
One fundamental question is who should get to choose which base image to use: the repository (via a config file) or the entity invoking repo2docker. I like the idea of restricting which base images you can use. The motivation for this is to allow base images to be configurable while keeping the convenience of repo2docker-based image building when using them. Instead of requiring users to write (short) If we allow arbitrary base images we'd not gain much IMHO, as users who choose (say) an Alpine Linux base image would be back to square one in terms of the complexity of understanding why that base image doesn't work.

I like the idea of restricting the set of possible base images to images built by repo2docker. It would alleviate a lot of the worries I have about how allowing this could fail to make users' lives easier because of hard-to-debug incompatibilities. How would we determine if an image was built by repo2docker? Not sure, but maybe we can check the image's labels.

Overall I like the idea. I think that this proposal doesn't let Whole Tale do what they want to do, which is to start from arbitrary base images that were explicitly not constructed by repo2docker (e.g. the rocker images). At a minimum we'd have to add labels to the rocker images to certify them as repo2docker-base-image compatible.
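The label-checking idea above could be sketched as follows. This is a hedged illustration: the label name `repo2docker.version` is an assumption here (repo2docker does stamp images with labels, but treat the exact key as unverified), and the `docker inspect` helper requires a local Docker daemon:

```python
# Sketch: decide whether an image was (or claims to have been) built by
# repo2docker by looking at its labels. Label name is an assumption.
import json
import subprocess

R2D_LABEL = "repo2docker.version"  # assumed certification label


def built_by_repo2docker(labels):
    """Treat the presence of the version label as certification."""
    return bool(labels) and R2D_LABEL in labels


def image_labels(image):
    """Fetch an image's labels via `docker inspect` (needs a daemon)."""
    out = subprocess.check_output(
        ["docker", "inspect", "--format", "{{json .Config.Labels}}", image]
    )
    return json.loads(out) or {}
```

As noted above, a base image like rocker's could opt in simply by adding the same label to its own Dockerfile, which is what "certifying" an external image would amount to.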
Great discussion! Based on the earlier discussions, the approach I'm taking with a Whole Tale proof-of-concept is to add a
I can't imagine this would be a problem, but would require the maintainers to buy-in. Although very minor, there are differences in the base template required by Debian that I'd also need to address somehow.
For our proof-of-concept, the user selects the default "environment" (WT terms), which equates to selecting the default buildpack for their repo. I did initially implement it as a flag on I'm actively working on the WT side of things now and will return to
Would people like to see a prototype conda-buildpack that implements some of these ideas? I think we can knock that out in the next few weeks and be ready to discuss by the next monthly jupyter team meeting.
@jhamman just chiming in a bit late, but I'd love to see people playing around with these ideas :-)
I agree this would make the whole thing a lot simpler.
I personally love what @craig-willis is doing with If we implement this using the extension mechanism instead, the workflow admins would follow is:
Since this will add a buildpack, it can detect that it should use the pangeo buildpack and do whatever it needs to do - even if all it does is change the base image. But if you want a common conda install, you probably aren't going to just change the base image, since that means we'll re-install everything! You would probably set up something a lot more custom... I am going to spend a couple of hours today trying to prototype this with the PANGEO stacks images, and report back.
Alright, I have a fully working prototype based on the pangeo stack! There's a functional README in https://github.com/yuvipanda/repo2docker-pangeo. Try it out and let me know what you think. It currently requires a repo2docker_config.py file that's 2 lines long, but we can probably build a discovery mechanism that removes the need for that. The entire code for implementing this is 77 lines long as well. This is one approach to having specialized plugins - for PANGEO, Rocker, etc. I like this because it gives the power to maintain the plugin directly to the people who are maintaining the specific base images. It also gives them the responsibility, thus reducing the burden on core repo2docker itself - both from a maintainer and a code-complexity perspective. This keeps the power over which base images can be used (without a Dockerfile) with the people who are running repo2docker. I'm experimenting with a different approach that gives that power to the people who are making the repositories, using ONBUILD. I'll play with it a bit more and put up a prototype.
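A buildpack plugin of this shape can be sketched roughly as below. This is not the actual prototype's code: the class name, the `pangeo.yml` marker file, and the base image name are all assumptions for illustration, and the real repo2docker BuildPack API has more moving parts:

```python
# Hedged sketch (names assumed, not the real prototype): a buildpack
# that fires on an opt-in marker file and swaps in a different base image.
import os


class PangeoBuildPack:
    # Assumed image name for illustration.
    base_image = "pangeo/base-notebook:latest"

    def detect(self, repo_dir):
        # Only claim repos that opt in via a marker file (hypothetical name).
        return os.path.exists(os.path.join(repo_dir, "pangeo.yml"))


# A short config file could then prepend this class to the buildpack
# search order, e.g. (hypothetical, assuming a traitlets-style config):
#
#   import repo2docker_pangeo
#   c.Repo2Docker.buildpacks.insert(0, repo2docker_pangeo.PangeoBuildPack)
```

Because the plugin sits first in the search path, it wins `detect()` for opted-in repos and everything else falls through to the stock buildpacks unchanged.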
A comment from a discussion on Discourse: Peter pointed us to https://buildpacks.io/. I think repo2docker already has a lot of the ideas that are in One thing I like about the We lose composability, or at least it needs careful thinking when creating a new build pack about whether it can still be composed with others or not. This gives rise to the idea of "stacks of buildpacks". TL;DR: Right now I am in favour of "build packs choose their base image", "build packs decide which stack they are in", and "use entrypoints to allow external packages to contribute buildpacks". Question (after a quick browse of your code @yuvipanda): my impression is that you implement what I wrote in my TL;DR except for using entrypoints. Instead you insert yourself at the top of the build pack search path via some config magic.
Doesn't your prototype already let creators of repos choose the base image via what they write in
Oh absolutely, this isn't a new idea at all. I think a bunch of us also talked about it in a team meeting a few months ago when @craig-willis was there. Just new code. Entrypoints is the next step, but this already works with released repo2docker so it makes for a nice demo. <3 to everyone in this thread for hashing out and moving towards a good set of solutions to a very complex problem!
Nope it does not. It constrains them to only choosing from PANGEO images. This lets the buildpack make assumptions, such as:
+1. We need to figure out a way to deal with ordering when inserted via entrypoints, but that's doable.
@betatim I <3 buildpacks.io. A lot of it is straight from s2i, which was what the very first versions of repo2docker were based off of. I wrote https://github.com/yuvipanda/words/blob/master/content/post/why-not-s2i.md at the time when we switched away. TL;DR: composability.
https://github.com/yuvipanda/pangeo-stack-onbuild is the other prototype, where stack authors make -onbuild variants of their images. This lets users directly specify which (supported) image they wanna use, and empowers stack authors to support whatever files they wanna support. This works today on mybinder.org, once I wait for my push of this onbuild image to complete...
https://mybinder.org/v2/gh/yuvipanda/pangeo-stack-onbuild/master works! It is based off the base-notebook image from PANGEO stack, but lets users customize it simply with an environment.yml file in the repo directory. It also works with all binders right now, without any customization needed on the part of the operators.
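The -onbuild variant idea works because Docker's ONBUILD instructions defer steps until a downstream image builds FROM the variant. A hedged sketch of what such a Dockerfile might contain (the image name, paths, and conda invocation are assumptions, not the actual prototype's contents), held here as a Python string:

```python
# Hedged sketch of an -onbuild stack variant's Dockerfile; image name,
# paths, and the conda command are assumptions, not the real prototype.
ONBUILD_DOCKERFILE = """\
FROM pangeo/base-notebook:latest
# ONBUILD instructions do nothing when this image is built; they run
# later, when a user repo is built FROM this -onbuild variant:
ONBUILD COPY . /home/jovyan
ONBUILD RUN conda env update -n base -f /home/jovyan/environment.yml
"""
```

The appeal is the division of labor: stack authors decide which files they support (here, only an assumed environment.yml) and repo authors just pick a supported image, with no repo2docker changes required.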
My comment wasn't in the spirit of "how lame, this ain't a new idea", I wanted to double check that I hadn't missed anything and that my impression was correct.
Ah yes. I don't think this is a drawback, more a feature because you said: buildpacks make assumptions about the base image.
Hopefully we can find a simple way for this and construct extra buildpacks so that they play nice with each other (keep the triggers separate so that order doesn't matter so much) and definitely play nice with the base buildpacks. Seems like something we should write down as part of the "entrypoints contract". Something like "you should follow these guidelines and if you don't we can't offer any support to you or users of your buildpack". A bit like we do with
Nods. I was interested to see that. https://buildpacks.io/docs/using-pack/building-app/ (scroll down to the picture) makes me think that
Hey there, I just dug up this issue while searching for a solution to our very specific problem:
I think there were some really good options in this thread which could solve this issue. Can you help me out there? :)
I think this is a contentious issue with many similar but slightly different use-cases. People want to change the base image but for different reasons / to achieve different goals. I think we need to divide and conquer to make any progress. For your particular problem I'd suggest the following (and I think I'd be happy to merge a PR implementing it, but others might have other opinions).

A side comment which might solve your problem or not: there is the option to add an "appendix" to every repo being built: https://repo2docker.readthedocs.io/en/latest/usage.html#cmdoption-jupyter-repo2docker-appendix. BinderHub can also specify one. Maybe this is enough already? As the name suggests it is an appendix, not a prependix, so you can only do stuff that fits with being done at the end.

Adding a CLI flag that sets the name of the base image might solve your particular problem. You could make your own base image (somehow), publish it, and then build all repo2docker images on top of that. I'm thinking of something that gets literally pasted into https://github.com/jupyter/repo2docker/blob/023e577eee68d5567ddf783a56ac32d44fd5b64c/repo2docker/buildpacks/base.py#L17. This would give the "owner" of the repo2docker process full control and responsibility to provide a base image that will work. The fact that not every base image will work with repo2docker is (for me) the main blocker to making this functionality widespread.

What do you think? (If we want to discuss in more detail maybe we should make a new thread for this specific idea.)
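The "literally pasted into the template" idea amounts to making the FROM line of the generated Dockerfile a fillable placeholder. A minimal sketch, assuming a CLI flag would supply the value (the default image name and placeholder are assumptions, not repo2docker's actual template):

```python
# Sketch: a Dockerfile template whose base image is filled in by whoever
# invokes repo2docker. Default and placeholder names are assumptions.
TEMPLATE = """\
FROM {base_image}
# ... rest of the generated Dockerfile would follow here ...
"""


def render_dockerfile(base_image="buildpack-deps:bionic"):
    """Render the template, letting a CLI flag override the base image."""
    return TEMPLATE.format(base_image=base_image)
```

Note that this keeps the choice with the operator running repo2docker rather than with the repository, which matches the "full control and responsibility" framing above.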
@betatim @yuvipanda this is really interesting since it could allow users to build images in layers (which of course has other benefits), where they could build one image from another image. Our use-case is to replicate the build mechanism within the With this in mind, we could run:
For example, with #909 updating the base image to Another benefit is that if one were to select a more specific image from buildpack-deps, for example, then the user could remove some/all packages with the If there is something we could do to help with this effort, let us know (poc, draft wip, etc)!
@betatim Please do correct me if I'm wrong, but although being able to toy around w/ system packages in a running environment seems like a fairly standard request, it seems no other approach covers this case, except for using an explicit Dockerfile, which I can't use here for other reasons.
Would a new config file like |
Yes, it would; |
Note: I'm aware this would imply whole bunch of bad usage practices like e.g. discussed in #192 |
That's true, but at least all the bad practices are confined to one file so it's easy to know where to find them when dealing with a broken image, and we could say |
Just bookkeeping another case for |
One important use case: with dockerhub pull limits, it'd be very useful to redirect the build image to a registry cache. We can support this by just setting the build image via ARG. If that's too controversial, would there be any issues with supporting setting a prefix for the build image? |
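The registry-prefix variant of this use case is essentially a string rewrite of the image reference before it reaches the FROM line. A hedged sketch (real image-reference handling, e.g. digests or images that already name a registry, would need more care than this):

```python
# Sketch: redirect base-image pulls through a registry cache/mirror to
# avoid Docker Hub pull limits. Naive string handling; assumption only.
def with_registry_prefix(image, prefix=None):
    """Prepend a mirror registry to an image reference, if configured."""
    if not prefix:
        return image
    return prefix.rstrip("/") + "/" + image
```

An operator-level setting like this is less controversial than a fully arbitrary base image, since the resulting image contents are byte-identical to the upstream default.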
From 20b0815:
RPM based images are not supported. I would probably say that only Ubuntu versions are supported. And in general if it breaks, you kinda get to keep the pieces. |
I feel like we've discussed this a few times but I can't find a specific issue, so:
What if we exposed the ability for CLI users of repo2docker to specify a base image to use instead of the ubuntu base image. I could see this being useful for:
@craig-willis if you have thoughts on the above that'd be helpful!
I think the trick here is that we'd need to lay down some clear rules for what would need to be in the image. It'd have to run the same Ubuntu version, and would need to have jupyterhub / server stuff ready to go. Perhaps it could be treated as an "advanced use case, you should know what you're doing" kinda thing.