Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

install: Add ensure-completion verb, wire up ostree-deploy → bootc #915

Merged
merged 1 commit into from
Dec 5, 2024

Conversation

cgwalters
Copy link
Collaborator

@cgwalters cgwalters commented Nov 20, 2024

When bootc was created, it started to become a superset of ostree;
in particular things like /usr/lib/bootc/kargs.d and logically
bound images.

However...Anaconda today is still invoking ostree container image deploy.

Main fix

When bootc takes over the /usr/libexec/ostree/ext/ostree-container
entrypoint, make the existing ostree container image deploy CLI actually
just call back into bootc to fix things up. No additional work required other
than getting an updated bootc in the Anaconda ISO.

Old Anaconda ISOs

But, a further problem here is that Anaconda is only updated once
per OS major+minor - e.g. there won't be an update to it for the lifetime
of RHEL 9.5 or Fedora 41. We want the ability to ship new
features and bugfixes in those OSes (especially RHEL9.5).

So given that we have a newer bootc in the target container, we can
do this:

%post --erroronfail
bootc install ensure-completion
%end

And will fix things up. Of course there's fun $details here...the
way Anaconda implements %post is via a hand-augmented chroot
i.e. a degenerate container, and we need to escape that and
fix some things up (such as a missing cgroupfs mount).

Summmary

  • With a newer bootc in the ISO, everything just works
  • For older ISOs, one can add the %post above as a workaround.

Implementation details: Cross-linking bootc and ostree-rs-ext

This whole thing is very confusing because now, the linkage
between bootc and ostree-rs-ext is bidirectional. In the case
of bootc install to-filesystem, we end up calling into ostree-rs-ext,
and we must not recurse back into bootc, because at least for
kernel arguments we might end up applying them twice. We do
this by passing a CLI argument.

The second problem is the crate-level dependency; right now they're
independent crates so we can't have ostree-rs-ext actually
call into bootc directly, as convenient as that would be. So we
end up forking ourselves as a subprocess. But that's not too bad
because we need to carry a subprocess-based entrypoint anyways
for the Anaconda %post case.

Implementation details: /etc/resolv.conf

There's some surprising stuff going on in how Anaconda handles
/etc/resolv.conf in the target root that I got burned by. In
Fedora it's trying to query if systemd-resolved is enabled in
the target or something?

I ended up writing some code to just try to paper over this
to ensure we have networking in the %post where we need
it to fetch LBIs.

Signed-off-by: Colin Walters walters@verbum.org


@cgwalters
Copy link
Collaborator Author

cgwalters commented Nov 20, 2024

This seems to be working well in my hand-rolled testing.
(BTW I am testing via virt-manager/virt-manager#739 (comment) )

Still TODO:

@cgwalters
Copy link
Collaborator Author

OK I've rebased this on top of #860 and we successfully pull LBIs at Anaconda install time too now.

@cgwalters
Copy link
Collaborator Author

There were a surprising number of things I hit. One of them for example is that anaconda's hand-rolled chroot/container doesn't mount cgroupfs which makes podman quite unhappy so we do so manually in https://github.com/containers/bootc/pull/915/files#diff-66bc72c28514e2546fbe456aee74a321866d5a9147136ef99251eec1e08be8ddR107

@cgwalters cgwalters force-pushed the install-fixup branch 2 times, most recently from 21467e0 to 4b111dd Compare November 24, 2024 15:04
@cgwalters cgwalters changed the title install: Add hidden ensure-completion verb install: Add ensure-completion verb, wire up ostree-deploy → bootc Nov 24, 2024
@cgwalters cgwalters marked this pull request as ready for review November 25, 2024 12:59
@cgwalters cgwalters requested a review from omertuc November 25, 2024 14:41
@cgwalters
Copy link
Collaborator Author

The plus side of this PR as is is that it has near-zero risk unless explicitly turned on in the two places right now. The anaconda %post path is obviously opt-in. And the extensions to the ostree container image deploy path also only turn on when bootc takes those over from e.g. rpm-ostree today.

I've given this a fair bit of manual testing, but I think what will help here is to get this into e.g. Fedora rawhide and that'll get things running through the daily integration testing.

@cgwalters
Copy link
Collaborator Author

https://github.com/cgwalters/playground/blob/main/202411/inst.sh and inst.ks are how I was testing this

@omertuc
Copy link
Contributor

omertuc commented Nov 26, 2024

cgwalters/playground@main/202411/inst.sh and inst.ks are how I was testing this

Broken link

@cgwalters
Copy link
Collaborator Author

Broken link

Hmm, does it work for anyone else?

@omertuc
Copy link
Contributor

omertuc commented Nov 26, 2024

I think the repo is just private

@cgwalters cgwalters disabled auto-merge November 26, 2024 16:13
@cgwalters
Copy link
Collaborator Author

cgwalters commented Nov 26, 2024

Duh sorry, here's the raw scripts. But yes ideally these bits are generalized into virt-manager/virt-manager#739

#!/bin/bash
set -xeuo pipefail

#cd=/var/srv/walters/machine-images/centos-stream9/CentOS-Stream-9-20241118.0-x86_64-boot.iso
cd=/var/srv/walters/machine-images/fedora/Fedora-Everything-netinst-x86_64-Rawhide-20241117.n.0.iso
virt-install --connect qemu:///session \
--name anaconda-bootc-test \
--vcpus 4 \
--ram 8192 \
--disk size=40 \
--location ${cd} \
--os-variant rhel9.4 \
--initrd-inject=./inst.ks \
--memorybacking=source.type=memfd,access.mode=shared \
--filesystem=/var/srv/,host-var-srv,driver.type=virtiofs \
--noautoconsole \
--extra-args "inst.ks=file:/inst.ks"
%pre --erroronfail
mkdir -p /mnt/host-var-srv
mount -t virtiofs -o ro host-var-srv /mnt/host-var-srv
# This code actually takes over things
cp -p /mnt/host-var-srv/walters/bootc.fedora /usr/bin/bootc
ln -sfr /usr/bin/bootc /usr/libexec/libostree/ext/ostree-container
%end

# Basic setup
text
network --bootproto=dhcp --device=link --activate
# Basic partitioning
clearpart --all --initlabel --disklabel=gpt
reqpart --add-boot
part / --grow --fstype xfs

ostreecontainer --transport oci --url /mnt/host-var-srv/walters/oci:bootc-lbi

rootpw <password>
sshkey --username root "<key>"
#reboot

%post
curl -L --head https://quay.io
%end

%post --erroronfail
bootc install ensure-completion
%end

@cgwalters
Copy link
Collaborator Author

Rebased 🏄

@cgwalters cgwalters enabled auto-merge December 2, 2024 13:51
@cgwalters
Copy link
Collaborator Author

@omertuc mind reviewing?

@cgwalters
Copy link
Collaborator Author

OK no traction, @vrothberg can you do it?

Copy link
Contributor

@allisonkarlitskaya allisonkarlitskaya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry that I didn't say it explicitly earlier: I don't have enough knowledge to review this. :/

@cgwalters cgwalters requested a review from jeckersb December 3, 2024 18:37
Copy link
Contributor

@omertuc omertuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial superficial review, will post another for completion.rs

lib/src/cli.rs Outdated Show resolved Hide resolved
lib/src/cli.rs Outdated Show resolved Hide resolved
ostree-ext/src/globals.rs Outdated Show resolved Hide resolved
ostree-ext/src/container/deploy.rs Outdated Show resolved Hide resolved
lib/src/boundimage.rs Show resolved Hide resolved
auto-merge was automatically disabled December 3, 2024 20:03

Pull Request is not mergeable

@cgwalters
Copy link
Collaborator Author

Ah, CI fell over due to missing #[cfg(feature = "install")]...it's tempting to drop that as I doubt anyone cares. Or maybe only have the to-disk path be under #[cfg(feature = "install-to-disk")] since it's that code that's more the "demo"/"toy".

@cgwalters
Copy link
Collaborator Author

Anyways, this one is updated.

@omertuc
Copy link
Contributor

omertuc commented Dec 5, 2024

So we now have:

bootc internals bootc-install-completion

and

bootc install ensure-completion

The latter is for running from anaconda, and takes no arguments

Are we happy with these names? Can you update the commit messages to mention both options / the difference between them?

IIUC bootc-install-completion is always just invoked by bootc itself, it's not meant for users (I guess this is why it's under internals)

Copy link
Contributor

@omertuc omertuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review still incomplete but posting anyway to get things moving faster

ostree-ext/src/container/deploy.rs Outdated Show resolved Hide resolved
@cgwalters
Copy link
Collaborator Author

Are we happy with these names? Can you update the commit messages to mention both options / the difference between them?

The internals one is totally internal and not something a human should ever invoke, that's why I didn't add it even to the commit message.

When bootc was created, it started to become a superset of ostree;
in particular things like `/usr/lib/bootc/kargs.d` and logically
bound images.

However...Anaconda today is still invoking `ostree container image deploy`.

Main fix
--------

When bootc takes over the `/usr/libexec/ostree/ext/ostree-container`
entrypoint, make the existing `ostree container image deploy` CLI actually
just call back into bootc to fix things up. No additional work required other
than getting an updated bootc in the Anaconda ISO.

Old Anaconda ISOs
-----------------

But, a further problem here is that Anaconda is only updated once
per OS major+minor - e.g. there won't be an update to it for the lifetime
of RHEL 9.5 or Fedora 41. We want the ability to ship new
features and bugfixes in those OSes (especially RHEL9.5).

So given that we have a newer bootc in the target container, we can
do this:

```
%post --erroronfail
bootc install ensure-completion
%end
```

And will fix things up. Of course there's fun $details here...the
way Anaconda implements `%post` is via a hand-augmented `chroot`
i.e. a degenerate container, and we need to escape that and
fix some things up (such as a missing cgroupfs mount).

Summmary
--------

- With a newer bootc in the ISO, everything just works
- For older ISOs, one can add the `%post` above as a workaround.

Implementation details: Cross-linking bootc and ostree-rs-ext
-------------------------------------------------------------

This whole thing is very confusing because now, the linkage
between bootc and ostree-rs-ext is bidirectional. In the case
of `bootc install to-filesystem`, we end up calling into ostree-rs-ext,
and we *must not* recurse back into bootc, because at least for
kernel arguments we might end up applying them *twice*. We do
this by passing a CLI argument.

The second problem is the crate-level dependency; right now they're
independent crates so we can't have ostree-rs-ext actually
call into bootc directly, as convenient as that would be. So we
end up forking ourselves as a subprocess. But that's not too bad
because we need to carry a subprocess-based entrypoint *anyways*
for the Anaconda `%post` case.

Implementation details: /etc/resolv.conf
----------------------------------------

There's some surprising stuff going on in how Anaconda handles
`/etc/resolv.conf` in the target root that I got burned by. In
Fedora it's trying to query if systemd-resolved is enabled in
the target or something?

I ended up writing some code to just try to paper over this
to ensure we have networking in the `%post` where we need
it to fetch LBIs.

Signed-off-by: Colin Walters <walters@verbum.org>
@omertuc
Copy link
Contributor

omertuc commented Dec 5, 2024

Testing my understanding:

The reason

%post --erroronfail
bootc install ensure-completion
%end

uses the image's bootc and not the bootc that ships with the OS that anaconda runs on, is because %post's "degenerate container" is actually a chroot into the filesystem that we just installed the image to (using ostree container image deploy), thus ensuring the new bootc binary is available there?

// crates. We need an option to skip though so when the *main*
// bootc install code calls this API, we don't do this as it
// will have already been handled.
if !options.skip_completion {
Copy link
Contributor

@omertuc omertuc Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that all future users of ostree container image deploy (where options.skip_completion is false) will have bootc completion running automatically for them?

What about users doing it on non-bootc images? For missing InstallConfiguration (kargs) I see we use Option::into_iter() which will make it a no-op

Similarly for bound images, it will just be an empty list

So it should work in theory, but did you try this in practice? that the command is happy with completely non-bootc images? (Maybe this is covered by CI already somehow?)

Copy link
Contributor

@omertuc omertuc Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm basically afraid that this PR will introduce a regression / unexpected errors in users of ostree container image deploy with non-bootc images, so it would be nice to ensure it doesn't

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't explicitly test that yet, and we should. However note that none of this code will run in practice until the moment when we take over from the version in rpm-ostree.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But yes basically the heart of this right now is looking for files (kargs, bound image symlinks) and acting on them if present, and if they don't it's a no-op.

@omertuc
Copy link
Contributor

omertuc commented Dec 5, 2024

Other than the two open reviews, lgtm fwiw

@cgwalters
Copy link
Collaborator Author

uses the image's bootc and not the bootc that ships with the OS that anaconda runs on, is because %post's "degenerate container" is actually a chroot into the filesystem that we just installed the image to (using ostree container image deploy), thus ensuring the new bootc binary is available there?

Exactly!

@omertuc
Copy link
Contributor

omertuc commented Dec 5, 2024

OK then just #915 (comment) needs to be addressed

@cgwalters cgwalters merged commit 3b9f4e4 into containers:main Dec 5, 2024
8 of 32 checks passed
@cgwalters cgwalters mentioned this pull request Dec 9, 2024
@cgwalters
Copy link
Collaborator Author

For reference, this does work for LBIs but doesn't for kargs with anaconda; see https://gitlab.com/fedora/bootc/tests/bootc-workflow-test/-/merge_requests/555#note_2284993995

It does work for non Anaconda cases that directly invoke ostree container image deploy - the problem is that Anaconda also then later invokes ostree admin instutil set-kargs (it doesn't need to do this, it could compute the kargs up front and pass them to the deploy), but when it does that it does it without --merge which just blows away the other kargs, which is what the patch linked there fixes.

In theory, we could change ensure-completion to also fix this, but it would be even messier implementation wise.

The LBI support is most important to wedge in to older Anacondas I think, and otherwise we should just proceed with fixing newer Anacondas.

@kfox1111
Copy link

kfox1111 commented Jan 7, 2025

Thats.... unfortunate. Its the exact use case that broke for me. :/

Though I have a workaround...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/install Issues related to `bootc install`
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants