runc --systemd-cgroup run: need to check that StartTransientUnit succeeds #2313

kolyshkin · 2020-04-14T02:38:28Z

TL;DR: runc should check that systemd was able to create a unit.

I was working on runc on a Fedora 31 Vagrant box with older selinux packages (container-selinux-2.117.0-1.gitbfde70a.fc31.noarch). The command `runc --systemd run xoo` succeeded, but the container process was running in the same cgroup as my shell, and the designated cgroup was empty (see #2310 for details).

# runc --systemd run xoo
# runc list
ID          PID         STATUS      BUNDLE         CREATED                          OWNER
xoo         16201       running     /vagrant/tst   2020-04-14T02:23:16.049427501Z   root
# cat /proc/16201/cgroup 
0::/user.slice/user-1000.slice/session-3.scope
# cat /proc/self/cgroup 
0::/user.slice/user-1000.slice/session-3.scope
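The comparison done by hand above can also be sketched in Go. This is only an illustration, not code from runc: the helper names `unifiedCgroupPath` and `sameCgroup` are hypothetical, and the sketch assumes a cgroup v2 host, where `/proc/<pid>/cgroup` contains a single `0::<path>` line.

```go
package main

import (
	"fmt"
	"strings"
)

// unifiedCgroupPath extracts the cgroup v2 path from the contents of a
// /proc/<pid>/cgroup file, i.e. from the line of the form "0::<path>".
// (Hypothetical helper, for illustration only.)
func unifiedCgroupPath(content string) (string, error) {
	for _, line := range strings.Split(content, "\n") {
		// Each line has the form "hierarchy-ID:controller-list:path".
		parts := strings.SplitN(line, ":", 3)
		if len(parts) == 3 && parts[0] == "0" && parts[1] == "" {
			return parts[2], nil
		}
	}
	return "", fmt.Errorf("no cgroup v2 entry found")
}

// sameCgroup reports whether two /proc/<pid>/cgroup dumps name the same
// cgroup v2 path.
func sameCgroup(a, b string) (bool, error) {
	pa, err := unifiedCgroupPath(a)
	if err != nil {
		return false, err
	}
	pb, err := unifiedCgroupPath(b)
	if err != nil {
		return false, err
	}
	return pa == pb, nil
}

func main() {
	// The two dumps shown in the transcript above.
	container := "0::/user.slice/user-1000.slice/session-3.scope\n"
	shell := "0::/user.slice/user-1000.slice/session-3.scope\n"

	same, err := sameCgroup(container, shell)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if same {
		fmt.Println("container shares the caller's cgroup; the scope was not set up")
	}
}
```

A check like this only detects the symptom after the fact; the actual fix is to notice the failed systemd job in the first place.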

journalctl shows:

Apr 14 02:23:16 localhost.localdomain audit[1]: AVC avc:  denied  { setsched } for  pid=1 comm="systemd" scontext=system_u:system_r:init_t:s0 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=process permissive=0
Apr 14 02:23:16 localhost.localdomain systemd[1]: runc-xoo.scope: Failed to add PIDs to scope's control group: Permission denied
Apr 14 02:23:16 localhost.localdomain systemd[1]: runc-xoo.scope: Failed with result 'resources'.
Apr 14 02:23:16 localhost.localdomain systemd[1]: Failed to start libcontainer container xoo.

So, systemd failed to create a unit, but runc did not show any errors (it's the same with podman+crun).

By contrast, starting a unit with systemd-run does report an error:

# systemd-run --scope sleep 999
Job failed. See "journalctl -xe" for details.
# journalctl -xe
...
Apr 14 02:36:49 localhost.localdomain audit[1]: AVC avc:  denied  { setsched } for  pid=1 comm="systemd" scontext=system_u:system_r:init_t:s0 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=process permissive=0
Apr 14 02:36:49 localhost.localdomain systemd[1]: foobar.scope: Failed to add PIDs to scope's control group: Permission denied
Apr 14 02:36:49 localhost.localdomain systemd[1]: foobar.scope: Failed with result 'resources'.
Apr 14 02:36:49 localhost.localdomain systemd[1]: Failed to start /usr/bin/sleep 999.
...


kolyshkin · 2020-04-14T02:39:21Z

The issue probably also applies to the cgroup v1 systemd driver (aka LegacyManager).

kolyshkin · 2020-04-15T14:23:09Z

~~Might be related (need to check): #1780~~ No, not really

While playing with a Fedora 31 host with old/broken selinux packages, I found out that systemd fails to create a transient unit. Here is an excerpt from journalctl:

> audit[1]: AVC avc: denied { setsched } for pid=1 comm="systemd" scontext=system_u:system_r:init_t:s0 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=process permissive=0
> systemd[1]: crun-555.scope: Failed to add PIDs to scope's control group: Permission denied
> systemd[1]: crun-555.scope: Failed with result 'resources'.
> systemd[1]: Failed to start libcrun container.

and yet crun did not show any error and proceeded to start the container, which led to a number of issues.

1. Since the cgroup was not created by systemd, but the error was not detected, the container process was not put into its own cgroup (but left in the same cgroup as the shell from which `crun` was called).

2. Since crun gets the cgroup name from /proc/$PID/cgroup (where $PID is the container process PID), it proceeded to set the limits for that (wrong) cgroup:

    # cat /sys/fs/cgroup/system.slice/sshd.service/memory.max
    536870912

3. `crun delete` apparently removes the `system.slice/sshd.service` cgroup :(

The primary cause is the missing check that the transient unit has been created. This is what this patch adds (similar to how it's done in cgroup-run code).

After this patch:

    # ../crun --systemd-cgroup run -d 555
    2020-04-16T14:47:34.000354150Z: error creating systemd unit crun-555.scope: failed

For more background on how the issue was found, steps to repro etc please see a similar (but much less brutal -- it just fails to start the container) issue in runc:

- opencontainers/runc#2313

While at it, abstract out the code preparing and doing the check.

[v2: also check status in destroy_systemd_cgroup_scope()]

Fixes: eaccb4b
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
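As an extra safeguard against the second and third problems in the commit message above (limits applied to, and later removal of, an unrelated cgroup), a runtime could refuse to touch a cgroup whose path does not end in the scope it asked systemd to create. The Go sketch below is a hypothetical illustration, not what the crun patch does (crun is written in C, and the patch checks the systemd job status instead). The function name `verifyScope` is made up, and the sketch assumes the scope's cgroup directory is named after the unit.

```go
package main

import (
	"fmt"
	"path"
)

// verifyScope returns an error unless the last component of cgroupPath is
// the expected systemd unit name. (Hypothetical helper, for illustration.)
func verifyScope(cgroupPath, unit string) error {
	if got := path.Base(cgroupPath); got != unit {
		return fmt.Errorf("process is in cgroup %q, expected scope %q; refusing to modify it",
			cgroupPath, unit)
	}
	return nil
}

func main() {
	// Path read from /proc/$PID/cgroup in the failure scenario described above.
	if err := verifyScope("/system.slice/sshd.service", "crun-555.scope"); err != nil {
		fmt.Println(err)
	}
}
```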


kolyshkin changed the title ~~Check that StartTransientUnit succeeds~~ runc --systemd-cgroup run: need to check that StartTransientUnit succeeds Apr 14, 2020

This was referenced Apr 14, 2020

[SELinux] cgroupv2: runc run --systemd-cgroup do not put container in proper cgroup #2310

Closed

Initial integration tests for cgroupv2 #2295

Merged

kolyshkin mentioned this issue Apr 15, 2020

cgroupv2 support meta issue #2315

Closed

kolyshkin mentioned this issue Apr 16, 2020

cgroup: check systemd unit creation/removal succeeded containers/crun#331

Merged

kolyshkin mentioned this issue Apr 17, 2020

cgroup: check for systemd job result containers/crun#328

Closed

lifubang mentioned this issue Apr 20, 2020

check that StartTransientUnit/StopUnit succeeds #2331

Merged

kolyshkin closed this as completed in #2331 Apr 28, 2020

kolyshkin added a commit that referenced this issue Apr 28, 2020

Merge pull request #2331 from lifubang/StartTransientUnit

0a4dcc0

check that StartTransientUnit/StopUnit succeeds LGTMs: @AkihiroSuda @kolyshkin Closes #2313, #2309
