
tracker: logically bound app images #128

Open
7 tasks done
cgwalters opened this issue Sep 22, 2023 · 50 comments
Labels
area/experimental-feature Relating to an experimental feature enhancement New feature or request

Comments

@cgwalters
Collaborator

cgwalters commented Sep 22, 2023

Logically bound images

Current documentation: https://containers.github.io/bootc/experimental-logically-bound-images.html


Original feature proposal text:

We should support a mechanism where some container images are "lifecycle bound" to the base bootc image.

A common advantage/disadvantage of the below is that the user must manage multiple container images for system installs - e.g. for a disconnected/offline install they must all be mirrored, not just one.

In this model, the app images would only be referenced from the base image as .image files or an equivalent.

This contrasts with physically bound images.

bootc logically bound flow

bootc upgrade follows a flow like the flow diagram attached to the original issue (not reproduced here).

Current design: symlink to .image or .container files

Introduce /usr/lib/bootc/bound-images.d that is symlinks to .image files or .container files.

Pros:

  • Straightforward to implement
  • The admin only needs to bump an @sha256 digest in one place to update

Cons:

  • Handling systemd specifiers is tricky; we will error out on them
  • No separation of concerns: an .image file is intended to pull images, not to be parsed by an external tool for a separate purpose.
  • Updates to Quadlet may break the process and/or add a (potential) continuous maintenance burden for bootc (i.e., "chasing/reimplementing new features").
  • Forces users to use Quadlet even if they have no use for pulling images under systemd.

Note: we expect the .image files to reference images by digest or immutable tag. There is no mechanism to pull images out of band.
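To make the symlink-farm layout concrete, here is a minimal sketch of what a built image tree might contain; the image name, digest, and .image file contents are hypothetical, assuming the paths from the proposal above:

```shell
# Sketch: a quadlet .image file referenced from the bound-images.d symlink farm.
# Run against a scratch root; the digest below is made up for illustration.
root=$(mktemp -d)
mkdir -p "$root/usr/share/containers/systemd" "$root/usr/lib/bootc/bound-images.d"
cat > "$root/usr/share/containers/systemd/my-app.image" <<'EOF'
[Image]
# Reference by digest (or an immutable tag); this digest is hypothetical.
Image=quay.io/example/my-app@sha256:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
EOF
# The bound-images.d entry is just a symlink to the quadlet file.
ln -s /usr/share/containers/systemd/my-app.image \
      "$root/usr/lib/bootc/bound-images.d/my-app.image"
ls -l "$root/usr/lib/bootc/bound-images.d/"
```

At upgrade time, bootc would walk the symlinks and pull whatever images the targets reference.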

Other alternatives considered

New custom config file

A new TOML drop-in directory, /usr/lib/bootc/bound-images.d, with files of the form e.g. 01-myimages.toml:

images = ["quay.io/testos/someimage@sha256:12345", "quay.io/testos/otherimage@sha256:54321"]
authfile = "/etc/containers/my-custom-auth.json"

Pros:

  • Easy to just list multiple images vs one image per .image file
  • TOML format is used by other bootc tooling and some of the container config formats

Cons:

  • New file format relating to container images
  • May need in the general case to support many of the existing options in .image files
  • The admin will in general need to bump an @sha256 digest in two places to update (both in a .container or .image file and the custom .toml here)

Parse existing .image files

Pros:

  • Well known
  • Spec -> pull translation exists
  • Existing spec handles most image pull fields

Cons:

  • Would need to extend the spec to include a new bootc=bound or equivalent opt-in
  • Handling specifiers is tricky
  • Implementation is complicated with respect to managing systemd

What would happen under the covers here is that bootc would hook into podman and:

  • disallow GC of these images even if the unit isn't running (for all deployments)
  • Fetch new images (from the new base container image) on bootc upgrade

TODO:

  • docs
  • CI test
  • PR to fedora bootc examples
  • Ensure compatibility with bootc-image-builder (bound images fail in bootc-image-builder #715)
  • install path with bootc install to-filesystem - simple scenario w/out pull secret?
  • install path w/pull secret embedded in bootc image? podman pull happens from bootc container
  • install path w/bootc-image-builder where it pre-pulls images, demonstrated e2e w/konflux, we probably need to enable a model where bound images are provided in a mirror location or OCI directory
@alexlarsson

In the automotive world we often think of containers as two possible things: either they come with the system and are updated atomically with it, or they are separately installed. The way we expect this to work is for the system ones to be installed in a separate image store that is part of the ostree image, while the "regular" containers are just stored in /var/lib/containers.

The automotive sig manifests ship a storage.conf that has:

[storage.options]
additionalimagestores = [ "/usr/share/containers/storage" ]

Then we install containers in the image with osbuild like:

      - type: org.osbuild.skopeo
        inputs:
          images:
            type: org.osbuild.containers
            origin: org.osbuild.source
            mpp-resolve-images:
              images:
                - source: registry.gitlab.com/centos/automotive/sample-images/demo/auto-apps
                  tag: latest
                  name: localhost/auto-apps
        options:
          destination:
            type: containers-storage
            storage-path: /usr/share/containers/storage

@alexlarsson

alexlarsson commented Oct 24, 2023

This was part of the driver for the need for composefs to be able to contain overlayfs base dirs (overlay nesting), although that is less important if containers/storage also uses composefs.

@rhatdan
Member

rhatdan commented Oct 30, 2023

I love the idea of additional stores for this.

@vrothberg
Member

vrothberg commented Oct 30, 2023

Quadlet supports .image files now which can be directly referenced in .container files. Maybe that's a way to achieve a similar effect.

The .image files don't yet (easily) allow for pulling into an additional store, but this could be a useful feature.

Cc: @ygalblum

@cgwalters
Collaborator Author

Then we install containers in the image with osbuild like:

So IMO this issue is exactly about having bootc install and bootc update handle these images. Because as is today, needing to duplicate the app images in an osbuild manifest is...unfortunate. With this proposal, when osbuild is making a disk image, it'd use bootc install internally to the pipeline, and we wouldn't need to re-specify the child container images out of band of the "source of truth" of the parent image.

@alexlarsson

alexlarsson commented Oct 31, 2023

Then we install containers in the image with osbuild like:

So IMO this issue is exactly about having bootc install and bootc update handle these images. Because as is today, needing to duplicate the app images in an osbuild manifest is...unfortunate. With this proposal, when osbuild is making a disk image, it'd use bootc install internally to the pipeline, and we wouldn't need to re-specify the child container images out of band of the "source of truth" of the parent image.

I understand that, and I merely pointed out how we currently do it in automotive, not how it would be done with bootc.

Instead, what I propose is essentially:

Dockerfile:

FROM bootc-base
RUN podman --root /usr/lib/containers/my-app pull quay.io/my/app
ADD my-app.container /etc/containers/systemd

my-app.container:

[Container]
Image=quay.io/my/app
PodmanArgs=--storage-opt=overlay.additionalimagestore=/usr/lib/containers/my-app

And then you have an osbuild manifest that just deploys the above image like any normal image.

Of course, instead of open-coding the commands like this, a tool could do the right thing automatically.

You might also want the tool to tweak the image name in the quadlet to contain the actual digest so we know that the exact right image version is used every time.
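The digest-pinning tweak mentioned above could be sketched roughly like this; the image name and digest are hypothetical, and in a real build the digest would come from the registry (e.g. via skopeo inspect) rather than being hard-coded:

```shell
# Hypothetical sketch: pin a quadlet's Image= line to a resolved digest.
# In practice you would resolve it from the registry, e.g.:
#   digest=$(skopeo inspect --format '{{.Digest}}' docker://quay.io/my/app:latest)
# Here a made-up digest keeps the sketch self-contained and offline.
digest="sha256:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
cat > my-app.container <<'EOF'
[Container]
Image=quay.io/my/app
GlobalArgs=--storage-opt=overlay.additionalimagestore=/usr/lib/containers/my-app
EOF
# Rewrite the floating reference to the pinned digest.
sed -i "s|^Image=.*|Image=quay.io/my/app@${digest}|" my-app.container
grep '^Image=' my-app.container
```

With this in place, the exact same image version is used on every boot regardless of what the tag later points to.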

@alexlarsson

It's also interesting to reflect on the composefs efficiency in a setup like this.

If we use composefs for the final ostree image, we will get perfect content sharing, even if each of the individual additional image stores uses its own composefs objects dir and no effort is made to share object files between image store directories, because all the files will eventually be deduplicated as part of the full ostree composefs image.

In fact, we will even deduplicate files between image stores that use the traditional overlayfs or vfs container store formats.

@alexlarsson

alexlarsson commented Oct 31, 2023

In fact, maybe using the vfs backend is the right approach here? It is a highly stable on-disk format, and it's going to be very efficient to start such a container. And we can ignore all the storage inefficiencies, because they are taken care of by the outer composefs image.

@ygalblum

ygalblum commented Nov 1, 2023

my-app.container:

[Container]
Image=quay.io/my/app
PodmanArgs=--storage-opt=overlay.additionalimagestore=/usr/lib/containers/my-app

Just wanted to note that --storage-opt is a global argument. So, the key to use is GlobalArgs instead of PodmanArgs.
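With that correction applied, the example quadlet (still using the hypothetical /usr/lib/containers/my-app path from the earlier sketch) would read:

```
[Container]
Image=quay.io/my/app
GlobalArgs=--storage-opt=overlay.additionalimagestore=/usr/lib/containers/my-app
```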

@alexlarsson

I wonder if we should tweak the base images to have a standardized /usr location for additional image store images.

@rhatdan
Member

rhatdan commented Nov 22, 2023

/usr/lib/containers/storage?

@alexlarsson

@rhatdan Yeah, that sounds good to me. Can we perhaps just always add it to our /usr/share/containers/storage.conf file?

@rhatdan
Member

rhatdan commented Nov 22, 2023

You want that in the default storage.conf in containers/storage?

@rhatdan
Member

rhatdan commented Nov 22, 2023

If you set up an empty additional store you need to pre-create the directories and lock files. This is what we are doing to set up an empty additional store. We should fix this in containers/storage to create these files and directories if they do not exist.

RUN mkdir -p /var/lib/shared/overlay-images \
             /var/lib/shared/overlay-layers \
             /var/lib/shared/vfs-images \
             /var/lib/shared/vfs-layers && \
    touch /var/lib/shared/overlay-images/images.lock && \
    touch /var/lib/shared/overlay-layers/layers.lock && \
    touch /var/lib/shared/vfs-images/images.lock && \
    touch /var/lib/shared/vfs-layers/layers.lock

@alexlarsson

@rhatdan Would it maybe be possible instead to have containers/storage fail gracefully when the directory doesn't exist?

@rhatdan
Member

rhatdan commented Nov 24, 2023

Yes that is the way it should work. If I have time I will look at it. Basically ignore the storage if it is empty.

@rhatdan
Member

rhatdan commented Nov 24, 2023

Actually I just tried it out; as long as the additional image store directory exists, the store seems to work. No need for those additional files and directories.

@rhatdan
Member

rhatdan commented Nov 24, 2023

# cat /etc/containers/storage.conf
[storage]
driver = "overlay"
runroot = "/run/containers/storage"
graphroot = "/var/lib/containers/storage"

[storage.options]
pull_options = {enable_partial_images = "true", use_hard_links = "false", ostree_repos = ""}
additionalimagestores = [
  "/usr/lib/containers/storage",
]

The additional store directory is empty:

# ls -l /usr/lib/containers/storage/
total 0
# podman info
...

podman will write to the empty directory and create the missing content:

# ls -lR /usr/lib/containers/storage/
/usr/lib/containers/storage/:
total 4
drwx------. 2 root root 4096 Nov 24 07:03 overlay-images

/usr/lib/containers/storage/overlay-images:
total 0
-rw-r--r--. 1 root root 0 Nov 24 07:03 images.lock

If the file system is read-only, it fails:

# podman info
Error: creating lock file directory: mkdir /usr/lib/containers/storage/overlay-images: read-only file system

@alexlarsson

So, I've been thinking about the details around this for a while, in particular about the best storage for these additional image directories. The natural approach would be to use the overlay backend, as we can then use overlay mounts for the actual container, but this has some issues.

First of all, historically, ostree doesn't support whiteout files. This has been recently fixed, although even that fix requires adding custom options to ostree. In addition, if ostree is using composefs, there are some issues with encoding both the whiteouts as well as the overlayfs xattrs in the image. These are solved by the overlay xattr escape support I have added in the most recent kernel, although we don't yet have that backported into the CS9 kernel.

However, I wonder if using overlay directories for the additional image dir is even the right approach? All the files in the additional image dir will be deduplicated by ostree anyway, so maybe it would be better to use an approach more like the vfs backend, where each layer is completely squashed (and we then rely on the wrapping ostree to de-duplicate these). Such a layer would be faster to set up and use (since it is shallower), and it would fix all the issues regarding whiteouts and overlay xattrs.

I see two approaches for this:

  1. Use overlay backend with composefs format. This moves all the xattrs and whiteouts into the composefs image file, which will work fine in any ostree image
  2. Teach the overlay containers/storage backend the ability to squash individual layers, and then do this for all the images in the additional image store.

Opinions?

@cgwalters
Collaborator Author

One other thing to ponder here is related to #518

Basically if you look at this from a spec/status perspective; we effectively have a clear spec that is readable by external tooling: the "symlink farm". It's not reflected in bootc status - should it? I'm not sure.

What we don't directly have is status; while I think we'll end up doing the podman create in order to pin, that still allows things like podman system reset or just a plain podman rm -f. Should bootc also expose a status for this in our status fields? I think so.

Perhaps the status is just a boolean pinnedImagesValid: true|false (or maybe we go all the way to a condition).

I also wonder if we may need an explicit verb to re-synchronize in the case of a podman system reset? Or maybe just typing bootc upgrade again should do that.
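For illustration, such a status field might render along these lines; this is a hypothetical sketch with invented field names, not actual bootc output:

```yaml
# hypothetical sketch of a bootc status fragment (field names invented here)
status:
  boundImages:
    pinnedImagesValid: false   # e.g. a pinned image was removed via podman rm -f
```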

@cgwalters
Collaborator Author

cgwalters commented Jul 15, 2024

@vrothberg can we dig in a bit into the high level design here of whether this should use additionalstores or not?

In the current code, it doesn't. I see pros and cons to both approaches.

One way to think about this is that I see a continuum between "floating", "logically bound", and "physically bound".

With "physically bound" is that the images are officially read-only, podman rm isn't going to ever work, etc.

However...IMO, for logically bound images I can see people also wanting to do dynamic updates to them apart from a bootc upgrade. Yes, you might also queue an update to them on the base bootc image side, but for a number of workloads it could make total sense to do it dynamically outside of the host.

Take e.g. an OpenShift control plane node with etcd. We want etcd always there by default - but it's also totally sane and valid to rev etcd for a hotfix apart from updating the host.

The images being in the default mutable /var/lib/containers storage makes the use case of dynamic updates work pretty seamlessly, I believe, whereas a separate additional store, I think, introduces some confusion/friction there.

The choice of an additional store for logically bound images is pretty consequential though, so I think it makes sense to try to figure it out now.


Tangential, but if we do choose to use an additional store, I think we should put it under /sysroot/ostree or so, making it visible to the host podman as /usr/share/bootc/storage or something? It's basically where the host bootc storage is, and it's basically what #215 is doing.

@rhatdan
Member

rhatdan commented Jul 15, 2024

If you do an update on an image in the primary store, it will use the tag in the primary store.

Example, If I had alpine:latest in additional store and did a podman pull alpine and downloaded a different image into the primary store, then podman images and all tools would use the alpine:latest in the primary store.

This could be an issue if we later pulled an image into the bootc image additional store that is newer than the alpine in the primary store.

Bottom line: for now we could just indicate in the .image and .container files to use an additional store to protect the images. But this would force the quadlets to always use the images in the additional store.

The big advantage of the additional store, is we have it now, and do not need to wait for some future podman release.

@cgwalters
Collaborator Author

@rhatdan It's a bit unclear to me, are you arguing for or against using an additionalstore by default for logically bound images? (And does the answer depend on "short term" vs "medium term"?)

@rhatdan
Member

rhatdan commented Jul 15, 2024

I am giving point/counter point. I don't think we necessarily want bootc to force an additional store, but we might want to take advantage of one in RHEL AI. A lot of this is talking out loud.

But I think we could just use standard stores and tell users "don't do that"; if they do attempt a podman image prune, bootc or starting a quadlet would re-pull the image.

@vrothberg
Member

As Dan mentioned, if you have image A in an additional store and force-pull a newer one, it will be pulled into the primary store. The primary store will always take precedence over additional stores when looking up local images.

This could be an issue if we later pulled an image into the bootc image additional store that is newer than the alpine in the primary store.

I think we can always construct a situation where the user may do something they shouldn't do. We cannot protect against that.

I think additional stores are the way to go as they were designed with this use case (read-only images) in mind.

For Quadlets in general I see benefits of using additional stores as it's one more protection from the user accidentally removing an image.

@cgwalters
Collaborator Author

OK. I'm increasingly convinced, however the basic mechanics of wiring this up are going to be somewhat nontrivial.

@cgwalters
Collaborator Author

🆕 #659 landed with very minimal MVP functionality; however, I think we should have basic docs and tests next.

Beyond "absolute MVP" functionality, there are things like:

  • pinning the image (podman create) or using an additional store per above discussion

@rhatdan
Member

rhatdan commented Jul 16, 2024

Can you also change the /usr/lib/bootc-experimental/bound-images.d directory name to just /usr/lib/bootc/bound-images.d?

@cgwalters
Collaborator Author

Can you also change the /usr/lib/bootc-experimental/bound-images.d directory name to just /usr/lib/bootc/bound-images.d?

IOW you want the image to be not experimental and maintained ~forever?

@rhatdan
Member

rhatdan commented Jul 16, 2024

I want the concept to be managed forever.

If RHEL AI uses it, We need it for the next X years.

If you change the format of the files, I don't care. But starting out by saying something is experimental in the RHEL world should be a non-starter.

@cgwalters
Collaborator Author

We're already shipping things classified as experimental (xref #690 ); I think it's an essential way to get feedback without committing to an interface immediately.

As far as stability, in theory we could allow usage of an experimental interface, but we just need to keep it around as long as the known consumers use it.

That all said, OK...the feature as is today is sufficiently small that perhaps it can just be stable to start for the next release.

@rhatdan
Member

rhatdan commented Jul 17, 2024

I don't care if you document something as experimental, but putting it into the file system makes it difficult to transition when it is no longer experimental. I just want the directory renamed.

@cgwalters
Collaborator Author

I don't care if you document something as experimental, but putting it into the file system makes it difficult to transition when it is no longer experimental. I just want the directory renamed.

This was changed in #714

@cgwalters
Collaborator Author

cgwalters commented Jul 23, 2024

OK, the more I play with this the more I am coming to the conclusion it makes sense to put bound images in the "bootc storage". Which...doesn't yet exist, but should. I will write up a separate issue.

EDIT: done in #721

@cgwalters
Collaborator Author

So... an interesting semantic with logically bound images as they exist today (writing to the default shared /var/lib/containers) is that if you're tracking a floating tag (e.g. :latest) for an image, we will only re-fetch the tag when the base image changes, no matter how many times bootc upgrade is invoked.

But...when that does happen, the updated bound image is immediately visible.

Some people will want to invoke e.g. bootc upgrade without rebooting just as a way to "pre-fetch" an upgrade. In that scenario, if you have logically bound images and any container referencing them happens to restart, it will suddenly see the new image.

It would hence feel more predictable to me if we made logically bound images default to only appearing in their referenced root. It is more likely that we can implement that on top of #721 but it's still quite nontrivial.

OTOH...as I said in some other place I can actually see it being quite useful for users to pre-update logically bound images (can I acronym as LBI? just here?) ok yes LBIs outside of the default host update lifecycle.

But if we go that path... it certainly seems far cleaner to offer an explicit bootc image upgrade or so that explicitly doesn't touch the host, instead of just offering bootc upgrade which would do it via side effect?

Or...of course alternatively, bootc upgrade could check for all LBI updates (for versioned tags) even if the host image didn't change. That would feel more consistent too.

@cgwalters
Collaborator Author

cgwalters commented Jul 25, 2024

@ckyrouac opinions on ⬆️ ?

@cgwalters
Collaborator Author

Maybe we just for now strongly discourage floating tags for LBIs, and document the semantic that they will only update when the host changes.

@cgwalters
Collaborator Author

Something I also am realizing related to this is that bootc upgrade --check won't give you any info for LBIs at all; i.e. we lose the ability to know in advance how much data we'll download. The users who choose to use this may not care... at first. But this may argue for creating a build process that can ensure the bootc base image's manifest references the LBIs at a metadata level.

There's a lot of advantages to that, but it would be Hard to do in a Containerfile flow today without going all the way to something like FROM oci-archive.

@ckyrouac
Contributor

It would hence feel more predictable to me if we made logically bound images default to only appearing in their referenced root. It is more likely that we can implement that on top of #721 but it's still quite nontrivial.

I think this makes the most sense. I haven't had a chance to look closely at your draft PR to use an additional store for bound images, but this is how I expect it to work: e.g. when upgrading a bootc system that has a new bound image, we would pull the bound image into the staged root's storage. This seems to make the most sense if the additional store will be in /usr, which is not supposed to change. Since we'll no longer be using the shared storage, I think we'll need to first check the booted root for the image and copy it to the staged root, or something similar, to avoid re-downloading the image on every upgrade.

That doesn't address the issue of how to handle floating tags though. I'm not really sure how we can make binding to :latest be predictable since it will depend on when an upgrade happens, or what was on the build system when a disk is created, etc. I think the best we could do is be clear in our docs how floating tags will behave, make bootc status clearly show which version is booted/staged, maybe some lint checks to discourage using :latest (although there could be other floating tag names). I need to think about this more.
