From 41fa7792ab800dbb627b646de7d82f7311e56081 Mon Sep 17 00:00:00 2001
From: Kir Kolyshkin
Date: Wed, 6 Sep 2023 13:28:56 -0700
Subject: [PATCH] crun delete: call systemd's reset-failed

According to the OCI runtime spec [1], the runtime's delete is supposed
to remove all of the container's artefacts. If the systemd cgroup driver
is used and the systemd unit has failed (e.g. oom-killed), systemd won't
remove the unit (that is, unless the "CollectMode: inactive-or-failed"
property is set).

Leaving a leftover failed unit is a violation of the runtime spec; in
addition, a leftover unit results in inability to start a container with
the same systemd unit name (such an operation will fail with a "unit
already exists" error).

Call reset-failed from systemd's cgroup manager destroy_cgroup call, so
the failed unit will be removed (by systemd) after "crun delete".

This change is similar to the one in runc (see [2]). A (slightly
modified) test case from runc added by the above change was used to
check that the bug is fixed.

For the bigger picture, see [3] (issue A) and [4].

To test manually, systemd >= 244 is needed. Create a container config
that runs "sleep 10" and has the following systemd annotations:

	org.systemd.property.RuntimeMaxUSec: "uint64 2000000"
	org.systemd.property.TimeoutStopUSec: "uint64 1000000"

Start a container using the --systemd-cgroup option.

The container will be killed by systemd in 2 seconds, thus its systemd
unit status will be "failed". Once it has failed, "systemctl status
$UNIT_NAME" should have an exit code of 3 (meaning "unit is not active").

Now, run "crun delete $CTID" and repeat "systemctl status $UNIT_NAME".
It should result in an exit code of 4 (meaning "no such unit").

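The manual test above can be sketched as a shell script. This is only an
illustration, not part of the change: CTID, UNIT_NAME and BUNDLE are
placeholders, RUN_MANUAL_TEST is an assumed guard variable, the crun
invocation (global --systemd-cgroup, "run" with -d and -b) is an
assumption about how the container is started, and describe_status_rc
and unit_status_rc are hypothetical helpers. Only the two exit codes
named above are mapped.

```shell
#!/bin/sh
# Hypothetical helper: describe the "systemctl status" exit codes named above.
describe_status_rc() {
    case "$1" in
        3) echo "unit is not active" ;;
        4) echo "no such unit" ;;
        *) echo "unexpected exit code $1" ;;
    esac
}

# Hypothetical helper: print the exit code of "systemctl status" for a unit.
unit_status_rc() {
    rc=0
    systemctl status "$1" >/dev/null 2>&1 || rc=$?
    echo "$rc"
}

# The steps below need root, crun and systemd >= 244, so they only run
# when RUN_MANUAL_TEST=1 is set (an assumed guard, not a crun feature).
if [ "${RUN_MANUAL_TEST:-0}" = "1" ]; then
    # Placeholders: all three must be provided by the caller.
    : "${CTID:?set CTID}" "${UNIT_NAME:?set UNIT_NAME}" "${BUNDLE:?set BUNDLE}"

    # Assumed invocation; the bundle's config.json runs "sleep 10" and
    # carries the RuntimeMaxUSec/TimeoutStopUSec annotations.
    crun --systemd-cgroup run -d -b "$BUNDLE" "$CTID"

    # RuntimeMaxUSec is 2 s and TimeoutStopUSec is 1 s; 5 s is a generous wait.
    sleep 5
    echo "before delete: $(describe_status_rc "$(unit_status_rc "$UNIT_NAME")")"

    crun delete "$CTID"
    echo "after delete: $(describe_status_rc "$(unit_status_rc "$UNIT_NAME")")"
fi
```

With the fix applied, the first line is expected to report "unit is not
active" and the second "no such unit".
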
[1] https://github.com/opencontainers/runtime-spec/blob/main/runtime.md#delete
[2] https://github.com/opencontainers/runc/pull/3888
[3] https://github.com/opencontainers/runc/issues/3780
[4] https://github.com/cri-o/cri-o/issues/7035

Signed-off-by: Kir Kolyshkin
---
 src/libcrun/cgroup-systemd.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/libcrun/cgroup-systemd.c b/src/libcrun/cgroup-systemd.c
index 1052c4690..8dd71f1f2 100644
--- a/src/libcrun/cgroup-systemd.c
+++ b/src/libcrun/cgroup-systemd.c
@@ -984,6 +984,9 @@ libcrun_destroy_systemd_cgroup_scope (struct libcrun_cgroup_status *cgroup_statu
 
   ret = systemd_check_job_status (bus, &job_data, object, "removing", err);
 
+  /* In case of a failed unit, call reset-failed so systemd can remove it. */
+  reset_failed_unit (bus, scope);
+
 exit:
   if (bus)
     sd_bus_unref (bus);