Where do containers fit in? #1
Response 1: @cosmicexplorer Thank you so much for this enduring and thoughtful brainstorm!! First thoughts:
Clarifications
To be perfectly clear, is this referring to compat checking by running
Thank you for clearly separating this! Package
|
Re:
I think this is a REALLY REALLY INTERESTING THOUGHT!!! One amendment I might make to this is just to note that the concepts of "keeping all of your dependencies" vs "keeping some/none of your dependencies" apply equally to spack binary packages as well as containers. Some things that would actually justify further investment earlier in spack/spack#20359 or spack/spack#20407 would be:
I think one really great way to make use of containers here (one that would let me basically forget 100% about those tickets) is either to demonstrate that these work more quickly with a container approach, or to demonstrate that the container approach takes far less work to maintain and develop. Especially if we can dig in (sometime this week? :D) to the differences between containers vs environments -- it wouldn't be a failure to find several. |
Just coming back to this specific framing again:
I am pretty sure it's correct that this seems significantly more maintainable and immediately useful to users than the alternatives, while requiring fewer prerequisites on the host to actually execute successfully. And separately, I think those two issues spack/spack#20359 and spack/spack#20407 can now be considered as validation of your idea of a "container registry", which I see as containing basically a Merkle-tree CAS for composable filesystem objects (as opposed to FUSE, which just gives you a directory).
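For what I mean by a Merkle-tree CAS over composable filesystem objects, here is a minimal toy sketch (my own illustration, not how either of those tickets implements anything): directories are stored by the hash of their children, so identical subtrees deduplicate and can be recombined into new prefixes.

```python
# Toy content-addressed store over a directory tree: every file and directory
# is stored under the SHA-256 of its content, so identical subtrees are kept
# once and can be composed into new "images". Illustration only.
import hashlib
import os

store = {}  # digest -> bytes (file content or a serialized directory listing)

def put(data: bytes) -> str:
    digest = hashlib.sha256(data).hexdigest()
    store[digest] = data
    return digest

def add_tree(path: str) -> str:
    """Recursively add a directory, returning the digest of its root object."""
    entries = []
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if os.path.isdir(full):
            entries.append((name, "dir", add_tree(full)))
        else:
            with open(full, "rb") as f:
                entries.append((name, "file", put(f.read())))
    # A directory object is just its sorted list of (name, kind, digest) entries.
    return put(repr(entries).encode())
```

Two package prefixes that happen to share, say, the same lib/ subtree would hash that subtree to the same digest and store it only once. |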
First replying directly to @vsoch's original post:
Yes! I think there are two pieces here:
The hope is that the model document will give us a model and associated semantics we need to understand this. So to some extent, I think container analysis like this should fall out of the model -- containers are just collections of binaries in a system image. The hope is to define things more generally than that.
This one I'm not as sure about. It's an interesting idea, but my question is why? We have binary packages in Spack, which contain the installation of a single package, without other dependencies included. So:
This one's interesting as a potential way to isolate dependencies of programs on the command line, and it is more isolation than you would otherwise get. That said, the model is supposed to be lower level than this. We're really getting into the mechanisms used to resolve and find dependencies here (dependency resolution/concretization, compile time search paths, runtime search paths, hard-coded paths, etc.), which are all used in one way or another in a container, but the container's a higher-level concept.
We've had people ask for this -- they want bundled HPC apps in containers. I think it's a neat idea but higher level than what we'll initially be looking at here. We do currently have a public binary cache here: https://oaciss.uoregon.edu/e4s/inventory.html. I think the initial problem to solve will be how to put binary packages together (inside or outside of a container) to match the packages on the host.
I can't think of a place where we need this yet but this is pretty similar to just installing a binary package in the container. I think we should look at which makes more sense. |
Ok, responding to @cosmicexplorer: High-level: one of the goals of the model is going to be to clarify discussions like this, as there is clearly some variation in terminology. Hopefully writing this up will solidify some of the parts that are maybe confusing/not fleshed out so far.
Packages/containers
I mentioned above that we want to be able to resolve using existing, built dependencies. We are already working on this; there is:
That index is a bunch of already-concretized specs, which we can download and use to concretize more specs. Effectively those would be inputs to the solver. The format there is pretty much the same as the Spack database file. We will be using the indices for Spack installations and binary mirrors to tell the solver what is available, and to maximize reuse of existing binaries from those places. The goal of BUILD is to make it possible to use system packages in a similar way, by getting enough provenance for them that we can use them as inputs to a solver. So think of the binary analysis part of the project as an attempt to translate a typical Linux image into some kind of description like these indices.
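As a rough sketch of what "feeding the index to the solver" could look like, here is a toy reader; the JSON layout (an "installs" map from DAG hash to a concrete spec) is an assumption patterned loosely on Spack's database file, not the exact format.

```python
# Hypothetical: read a buildcache-style index of already-concretized specs and
# turn them into candidate facts for a solver. The JSON layout is assumed.
import json

def load_candidates(index_path):
    with open(index_path) as f:
        index = json.load(f)
    candidates = []
    # Assumed layout: {"installs": {"<dag-hash>": {"spec": {"name": ..., "version": ...}}}}
    for dag_hash, record in index.get("installs", {}).items():
        spec = record["spec"]
        candidates.append((spec["name"], spec["version"], dag_hash))
    return candidates

# Each tuple could become an "already available" fact handed to the concretizer
# so it prefers reusing an existing binary over scheduling a rebuild.
for name, version, dag_hash in load_candidates("index.json"):
    print(f"available: {name}@{version} /{dag_hash[:7]}")
```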
I'm confused by this because we already install dependencies of binary packages recursively. What we cannot do is easily swap them -- right now you have to install the exact, concrete dependencies of a binary package when you install the package. What we want is to pick and choose -- e.g., to use a system-provided dependency in place of one of them.
On depending on environments: I don't know what this means, and I think it makes the dependency model less precise. Environments in Spack are really just a group of packages. They're "realized" in the filesystem by a "view", which in the simple case is just a unified prefix (though it can have projections that enable you to map installed packages into different layouts). "Depending on an environment" is likely to be vague -- because you're not saying which packages in the environment you want. I think we should keep dependencies at the package level, and (potentially) you could satisfy them with environments whose component packages meet the requirements. |
Hermetic Process Executions via Containers
So this is tempting, and it is what a lot of systems do, but keep in mind that the goal with BUILD is to build a provenance model that can represent any build. A container build is one kind of build, specifically for the OS in the container. In HPC, we can't assume that containers are a requirement. We need something more fundamental, so that we can reason about both containerized builds and bare metal builds in the same way. |
I don't think I fully understand the title here, but this is not what Specs are. They're not expressible in terms of environments; environments are expressible in terms of specs.
I don't understand how this follows from or is motivated by the other concerns on this thread, so I think you've lost me here.
We will not be working on
Ok, we're kind of getting somewhere w.r.t. the model here. But we are not building a model of installation. We are trying to describe packages' dependency relationships without getting into the mechanism of installation or, really, any kind of ordered, imperative operations. The installation process, or at least its implementation, is out of scope for the model document. The constraints imposed on builds by types of dependencies are not. I realize that is probably a bit confusing at first glance, but think of this as specifying the "what" and not the "how". Any kind of installation process is going to be the "how". Containers are also probably in the category of "how" w.r.t. isolation, installation structure, deployment model, etc.
More on @vsoch's original thread:
Reproducibility in Spack comes from the metadata hashes -- essentially, the build configuration, which is then run through our (templated) package recipes.
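A toy illustration of that idea (not Spack's actual hashing code): identity comes from hashing a normalized description of the build configuration, so the same configuration yields the same hash no matter when or where it is built.

```python
# Toy illustration of hash-based identity for a build configuration; this is
# not Spack's actual spec-hashing code, just the general idea that the hash
# covers metadata (package, version, variants, dependencies), not an image.
import hashlib
import json

def config_hash(spec_dict):
    # Canonicalize so semantically equal configurations hash identically.
    canonical = json.dumps(spec_dict, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

spec = {
    "name": "hdf5",
    "version": "1.10.7",
    "variants": {"mpi": True, "shared": True},
    "dependencies": {"openmpi": "4.1.1", "zlib": "1.2.11"},
}
print(config_hash(spec)[:32])  # same inputs -> same hash, regardless of when/where built
```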
We really do not want to tie any Spack description to a specific container artifact that can change over time. Ideally, we'd have a representation (concrete specs) that can tell us how the old version of the container and the new version of the container are different. We want a package-level description, NOT an image.
Rest of the post above
I really don't think much of the rest of @cosmicexplorer's post above is in scope for the model document. In particular, we're not talking about virtualization, deduplication, filesystems, or sandboxing as part of this work. It may be that certain types of builds require these things, but these are the how, not the what. We want the "what". |
RE:
We need to keep in mind here that HPC users still by and large build on the host. Building externally is in many cases not an option -- e.g., if the host hardware is not available in the build farm. This happens a lot in HPC. Again, though, this stuff is "how" not "what". |
This is neat! A random idea - is this API custom for spack? I'm wondering if there is some similarity between this cache idea and other standards for registries (e.g., the distribution spec). We would want it to be easy for other package managers to follow suit, meaning creating a standard for the structures and the different API calls possible to this cache (and then the registry can customize the user-facing interfaces however they need). For the container points, I think it's unlikely that you'd find a container in the wild built with spack. If it's a read-only container, you would be unlikely to be able to rebuild without the original recipe, so at best we would just assess compatibility with the host. If it really does need something installed inside, I think we'd have to extract as much as we can about what's in the container and build it again (given read only).
I've been thinking about this recently - at least for libraries that have dependencies specified by the creator (e.g., Python), we are placing a lot of emphasis on those requirements. But I'm not convinced they are always 1) complete, 2) correct, or 3) appropriately flexible. There is a huge human element in this process that can lead to the solver failing (e.g., pip) when the constraints are too strict.
This makes sense, and I agree. Containers are not special, just one level of abstraction using the same ideas.
I was thinking of use cases that are hard if not impossible to install, such as Windows apps with wine, or some use case where the host OS absolutely won't work. Does spack even work with Windows? For relocation, it would have to be a rebuild of the container (which we could call an "install").
A container really could just be considered another type of install, so nothing special would need to be said in this model.
Gotcha.
I think this could be helped by figuring out how this would work in practice - e.g., if spack were to hand off some set of information to this API, what would that endpoint look like in terms of the metadata it needs, and what would the response look like? It's sort of similar to asking for a container URI, and ultimately getting a manifest and then layers, except here we would ultimately be getting some list of binaries. I'm not familiar with this cache at all, so this is just introspection.
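To make that concrete, here is a purely hypothetical request/response shape for such an endpoint; the URL, field names, and structure are invented strawmen for discussion, not anything that exists today.

```python
# Purely hypothetical request/response shapes for a binary-cache query
# endpoint; every field name and URL here is invented for discussion.
import json

request = {
    "package": "openmpi",                           # what the client wants satisfied
    "constraints": {"version": ">=4.0", "os": "ubuntu20.04", "target": "x86_64"},
    "host_abi": {"glibc": "2.31", "libstdc++": "GLIBCXX_3.4.28"},
}

response = {
    "matches": [
        {
            "spec": "openmpi@4.1.1 %gcc@9.4.0 arch=linux-ubuntu20.04-x86_64",
            "dag_hash": "abc123",
            "binary_url": "https://cache.example.org/build_cache/openmpi-abc123.spack",
            "provides_sonames": ["libmpi.so.40"],
        }
    ]
}
print(json.dumps({"request": request, "response": response}, indent=2))
```
|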
YES. One of the major goals of this document is to get a written description of what needs to go into such a spec. Spack is, at least AFAIK, kind of a superset of what other package managers provide. If we can spec out the format (something you're kind of already working on) I think we could make a really general metadata model. The difference here, with this project, is that we're also trying to spec out ways to reason about such a model. |
Agree. Though I think I was vague. Spack can't actually build a whole container; it generates a recipe, so yeah you'd build with something else, and you'd have to start with a base image as we do not have packages for everything. There are people building apps in containers with Spack (which is really what I mean here). I do think that the major use case here is adapting the container to the host and not using containers as a way to reuse packages in other ecosystems. I think the container's the final artifact - the thing you make with a package manager or maybe a build. Binaries from container images are, I think, not very well suited to plugging into other ecosystems (which is kind of the point of this exercise -- bridging the container/host interface). |
Yes! Though that doesn't mean we shouldn't have a way to express the correct requirements. The goal here is to come up with a model that can be specified by humans or by machines -- and to get us to a place where the latter happens more often than the former. |
Spack doesn't work on Windows yet, but we have a contract with Kitware and TechX to work on that -- it's happening now, and the model is different (PE is, sadly, not ELF).
Yep. I think rebuilding is one way to do it. Though TBH you could probably relocate |
Yes!
Yes! We should talk about this pretty soon after you start. There is also the question of how to query an ecosystem of packages like this, how to put stuff like that json file in a database and expose the right query semantics, etc. This is definitely the direction we want to go in to interface between the analysis (taking raw binaries and coming up with ABI specs) and the solvers (which would query these things). E.g., suppose you want to solve for a container that works with the host MPI and CUDA installation on some HPC system. The solver could query a binary cache and ask for packages that a) are possible dependencies of the application to be built, and b) satisfy some bounds that we know a priori, like "only for these OSes and targets".
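A sketch of that kind of query, with invented helper and field names; it assumes the cache index can be filtered by OS/target bounds and by which dependencies the host already provides.

```python
# Sketch of "solve for a container that works with the host MPI and CUDA".
# query_cache() and its fields are invented for illustration; the real
# interface between the binary analysis and the solver is still to be designed.
def query_cache(cache_index, package_names, os_name, target, host_provided):
    """Return cached specs that could satisfy `package_names` on this host."""
    matches = []
    for spec in cache_index:                          # each spec: a dict of metadata
        if spec["name"] not in package_names:
            continue                                  # a) possible dependency of the app
        if spec["os"] != os_name or spec["target"] != target:
            continue                                  # b) a-priori bounds on OS and target
        # Keep only specs whose externally-satisfied deps match what the host
        # provides, e.g. {"mpi": "cray-mpich@8.1.4", "cuda": "11.2"}.
        if all(spec.get("externals", {}).get(k) == v for k, v in host_provided.items()):
            matches.append(spec)
    return matches

candidates = query_cache(
    cache_index=[],                                   # would come from the buildcache index
    package_names={"hdf5", "petsc"},                  # possible dependencies of the app
    os_name="rhel8", target="zen2",
    host_provided={"mpi": "cray-mpich@8.1.4", "cuda": "11.2"},
)
```
|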
I'm reading over the overview draft (really exciting!) and I'd like to brainstorm how containers fit into this model. Right now we have them as a part of composition:
Which I think means that we would be able to build packages into containers with spack containerize and then check compatibility with libraries on the host (MPI for Singularity comes to mind as a good example). So to step back - there are two scenarios I can see for containers:
1. Containers built by spack (e.g., via spack containerize). We are following the same build routines but inside of a container with spack. For running this container, we'd need to be checking against the host for ABI compatibility, per what is written into the current spec.
2. Spack acting as a container registry.
For the second point, I'm wondering if this could be a use case for spack (or this general package manager model), period. If we imagine that a user wants to use spack as a container registry, instead of compiling / building on their host, would this be hard or unreasonable to do? A "build" really comes down to ensuring the container technology is installed, that there is a means to pull based on a specific hash or tag, and then having containers as executables on the path (Singularity) or running them (Docker, less likely for HPC, but podman and friends are just around the corner). We can focus on the Singularity use case to start, since the container is akin to an executable binary. This would mean that the user is allowed to install any container URI available, and the containers in storage would need to be namespaced depending on their URI. Reproducibility then would not depend on what spack does, but on whether the version of the container changes. We would then need some way to still check the container for ABI compatibility with the host (again focusing on Singularity). In the same way we could export a tree of packages and dependencies, we could also export a list of containers. A rough sketch of that compatibility check is below.
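A very rough sketch of the "check the container against the host" step for Singularity, assuming singularity and ldconfig are available on PATH; comparing soname lists is of course a far weaker check than real ABI analysis, but it shows the shape of the question.

```python
# Rough sketch: compare the shared libraries visible inside a Singularity image
# with the host's, as a crude stand-in for a real ABI compatibility check.
# Assumes `singularity` and `ldconfig` are on PATH; the image path is made up.
import subprocess

def sonames(cmd):
    """Return the set of library sonames listed by `ldconfig -p`."""
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return {line.strip().split()[0] for line in out.splitlines() if "=>" in line}

host = sonames(["ldconfig", "-p"])
container = sonames(["singularity", "exec", "app.sif", "ldconfig", "-p"])

# For a bind-mounted host MPI, you'd want the MPI soname the container was
# built against (e.g. libmpi.so.40) to also be resolvable on the host side.
print("in container but not on host:", sorted(container - host)[:10])
print("on host but not in container:", sorted(host - container)[:10])
```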
For point 3, this is similar to the idea of having isolated environments for other needs too (I remember the discussion about pex, for example). It would allow the user to have a combination of natively built packages and containers "for all those other use cases where I want to keep things separate."
And another idea, maybe this is a point 4. If we are bind mounting containers to potentially link to a library inside, you could imagine having containers that exist only to serve as bind resources for some set of libraries whose host dependencies are very hard to satisfy.