Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

quadlet: make user units wait for network #24305

Merged
merged 6 commits into from
Oct 18, 2024

Conversation

Luap99
Copy link
Member

@Luap99 Luap99 commented Oct 17, 2024

As documented in the issue there is no way to wait for system units from
the user session[1]. This causes problems for rootless quadlet units as
they might be started before the network is fully up. TWhile this was
always the case and thus was never really noticed the main thing that
trigger a bunch of errors was the switch to pasta.

Pasta requires the network to be fully up in order to correctly select
the right "template" interface based on the routes. If it cannot find a
suitable interface it just fails and we cannot start the container
understandingly leading to a lot of frustration from users.

As there is no sign of any movement on the systemd issue we work around
here by using our own user unit that check if the system session
network-online.target it ready.

[1] systemd/systemd#3312

Fixes #22197

also see the other commits

Does this PR introduce a user-facing change?

quadlet: user units now correctly wait for the network to be ready by not using the non functional network-online.target in user session and instead use the custom podman-user-wait-network-online.service to wait for the network.
The podman-auto-update systemd unit files have been moved into the contrib/systemd/system directory for consistency with our other unit files. 

@openshift-ci openshift-ci bot added release-note approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Oct 17, 2024
@Luap99 Luap99 added the quadlet label Oct 17, 2024
@Luap99
Copy link
Member Author

Luap99 commented Oct 17, 2024

@ygalblum @vrothberg PTAL

@Luap99
Copy link
Member Author

Luap99 commented Oct 17, 2024

I need #24306 first for some test fixes but then I also have to figure out how to do if root/rootless conditions in the test file assert conditions?

Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Thanks a ton for tackling that, @Luap99 !

Makefile Outdated
@@ -982,7 +982,7 @@ install.docker-full: install.docker install.docker-docs

.PHONY: install.systemd
ifneq (,$(findstring systemd,$(BUILDTAGS)))
PODMAN_UNIT_FILES = contrib/systemd/auto-update/podman-auto-update.service \
PODMAN_UNIT_FILES = contrib/systemd/system/podman-auto-update.service \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change has potential implications on packaging; depending on the usage of the make target. Can you add another entry to the changelog highlighting the new home of the files just to be extra safe?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes sure, I think all users must use the Makefile target anyway as otherwise we only have the service.in template file in the git repo and you need the makefile to replace the string with the actual podman binary and generate the real unit .

Makefile Outdated
install ${SELINUXOPT} -m 644 contrib/systemd/system/podman-restart.service $(DESTDIR)${SYSTEMDDIR}/podman-restart.service
install ${SELINUXOPT} -m 644 contrib/systemd/system/podman-kube@.service $(DESTDIR)${SYSTEMDDIR}/podman-kube@.service
install ${SELINUXOPT} -m 644 contrib/systemd/system/podman-clean-transient.service $(DESTDIR)${SYSTEMDDIR}/podman-clean-transient.service
for unit in podman-auto-update.service podman-auto-update.timer \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 👍

There is really no reason why these should be in separate dir.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
@Luap99
Copy link
Member Author

Luap99 commented Oct 17, 2024

Tests added, I leave it in draft as I still want to manually verify via final build rpm that this actually works when installed on real system and does not start early.

Copy link
Contributor

@ygalblum ygalblum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Small comments.

Makefile Outdated Show resolved Hide resolved
docs/source/markdown/podman-systemd.unit.5.md Outdated Show resolved Hide resolved
@Luap99
Copy link
Member Author

Luap99 commented Oct 18, 2024

Ok tested in this in a VM (with the rpm build from this PR) and it works.

-- Boot 43f6c6dec24740e790a831eb04b893bc --
Oct 18 11:10:11 fedora systemd[947]: Starting podman-user-wait-network-online.service - Wait for system level network-online.target as user....
Oct 18 11:10:11 fedora sh[1007]: inactive
Oct 18 11:10:11 fedora sh[1009]: inactive
Oct 18 11:10:12 fedora sh[1011]: inactive
Oct 18 11:10:12 fedora sh[1013]: inactive
Oct 18 11:10:13 fedora sh[1015]: inactive
Oct 18 11:10:13 fedora sh[1017]: inactive
Oct 18 11:10:14 fedora sh[1019]: inactive
Oct 18 11:10:14 fedora sh[1052]: active
Oct 18 11:10:14 fedora systemd[947]: Finished podman-user-wait-network-online.service - Wait for system level network-online.target as user..
Oct 18 11:10:14 fedora systemd[947]: Starting test.service...

test.service being my quadlet unit that failed before to start but now waits correctly as shown in the log.

Use a single loop for both the user and system service so we do not have
to duplicate the full paths every time.
In particular we can use `$^` to list all dependecies and then add the
not generated files to the loop as well to simplify this. And to make
things clear rename PODMAN_UNIT_FILES to PODMAN_GENERATED_UNIT_FILES so
readers immediately know they are generated and are safe to delete in
contrast to the .socket/.timer unit that are not and part of the git
history.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The reason being that I plan to add a unit that should only be used for
the user session and otherwise there is no way to only keep a unit in
user.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This service is meant to be used by quadlet as replacement for
network-online.target as this does not work for rootless users.

see containers#22197

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
As documented in the issue there is no way to wait for system units from
the user session[1]. This causes problems for rootless quadlet units as
they might be started before the network is fully up. TWhile this was
always the case and thus was never really noticed the main thing that
trigger a bunch of errors was the switch to pasta.

Pasta requires the network to be fully up in order to correctly select
the right "template" interface based on the routes. If it cannot find a
suitable interface it just fails and we cannot start the container
understandingly leading to a lot of frustration from users.

As there is no sign of any movement on the systemd issue we work around
here by using our own user unit that check if the system session
network-online.target it ready.

Now for testing it is a bit complicated. While we do now correctly test
the root and rootless generator since commit ada75c0 the resulting
Wants/After= lines differ between them and there is no logic in the
testfiles themself to say if root/rootless to match specifics. One idea
was to use `assert-key-is-rootless/root` but that seemed like more
duplication for little reason so use a regex and allow both to make it
pass always. To still have some test coverage add a check in the system
test to ask systemd if we did indeed have the right depdendencies where
we can check for exact root/rootless name match.

[1] systemd/systemd#3312

Fixes containers#22197

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
There is no good reason for the special case, kube and pod units
definitely need it. Volume and network units maybe not but for
consistency we add it there as well. This makes the docs much easier to
write and understand for users as the behavior will not differ.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
@Luap99 Luap99 marked this pull request as ready for review October 18, 2024 12:01
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 18, 2024
Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@ygalblum ygalblum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 18, 2024
Copy link
Contributor

openshift-ci bot commented Oct 18, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Luap99, vrothberg, ygalblum

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [Luap99,vrothberg,ygalblum]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ygalblum
Copy link
Contributor

/hold

Not sure if this is needed for waiting for the tests to finish

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 18, 2024
@Luap99
Copy link
Member Author

Luap99 commented Oct 18, 2024

/hold

Not sure if this is needed for waiting for the tests to finish

It is no longer needed on podman and many (but not all) other our repos, https://github.com/openshift/release/blob/8ddcd35900db92e050b1fbee76e9e1242c2061f6/core-services/prow/02_config/_config.yaml#L510
If there is no required-contexts set there then it can merge early otherwise it blocks for the given task name there.

Bet yes better safe then sorry so adding /hold is fine

@Luap99
Copy link
Member Author

Luap99 commented Oct 18, 2024

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 18, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit d3df5c5 into containers:main Oct 18, 2024
86 checks passed
@Luap99 Luap99 deleted the quadlet-pasta branch October 18, 2024 15:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. quadlet release-note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Podman fail to autostart containers through quadlet/systemd, works when launched manually, error with pasta
3 participants