
runc --systemd-cgroup run: need to check that StartTransientUnit succeeds #2313

Closed
kolyshkin opened this issue Apr 14, 2020 · 2 comments · Fixed by #2331

Comments

@kolyshkin
Contributor

TL;DR: runc should check that systemd was able to create a unit.

I was working on runc on a Fedora 31 Vagrant box with older SELinux packages (container-selinux-2.117.0-1.gitbfde70a.fc31.noarch). The command `runc --systemd run xoo` succeeded, but the container process was running in the same cgroup as my shell, and the designated cgroup was empty (see #2310 for details).

# runc --systemd run xoo
# runc list
ID          PID         STATUS      BUNDLE         CREATED                          OWNER
xoo         16201       running     /vagrant/tst   2020-04-14T02:23:16.049427501Z   root
# cat /proc/16201/cgroup 
0::/user.slice/user-1000.slice/session-3.scope
# cat /proc/self/cgroup 
0::/user.slice/user-1000.slice/session-3.scope
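The symptom in the session above (the container process reporting the same cgroup as the calling shell) can also be detected programmatically. Below is a minimal Go sketch, assuming the cgroup v2 (unified) entry in `/proc/<pid>/cgroup` has the form `0::<path>`; the helpers `unifiedCgroupPath` and `sameCgroup` are hypothetical and not part of runc:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// unifiedCgroupPath returns the cgroup v2 path from the contents of a
// /proc/<pid>/cgroup file, i.e. the part after "0::".
func unifiedCgroupPath(contents string) (string, error) {
	sc := bufio.NewScanner(strings.NewReader(contents))
	for sc.Scan() {
		// Each line is "hierarchy-ID:controller-list:cgroup-path";
		// the unified hierarchy has ID 0 and an empty controller list.
		parts := strings.SplitN(sc.Text(), ":", 3)
		if len(parts) == 3 && parts[0] == "0" && parts[1] == "" {
			return parts[2], nil
		}
	}
	if err := sc.Err(); err != nil {
		return "", err
	}
	return "", fmt.Errorf("no cgroup v2 entry found")
}

// sameCgroup reports whether two /proc/<pid>/cgroup contents place
// their processes in the same cgroup v2 path.
func sameCgroup(a, b string) (bool, error) {
	pa, err := unifiedCgroupPath(a)
	if err != nil {
		return false, err
	}
	pb, err := unifiedCgroupPath(b)
	if err != nil {
		return false, err
	}
	return pa == pb, nil
}

func main() {
	// Contents copied from the session above: container vs. shell.
	container := "0::/user.slice/user-1000.slice/session-3.scope\n"
	shell := "0::/user.slice/user-1000.slice/session-3.scope\n"
	same, err := sameCgroup(container, shell)
	if err != nil {
		panic(err)
	}
	if same {
		fmt.Println("container shares the caller's cgroup; unit creation likely failed")
	}
}
```

On a real host, the file contents would come from `os.ReadFile(fmt.Sprintf("/proc/%d/cgroup", pid))` and `/proc/self/cgroup`.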

journalctl shows:

Apr 14 02:23:16 localhost.localdomain audit[1]: AVC avc:  denied  { setsched } for  pid=1 comm="systemd" scontext=system_u:system_r:init_t:s0 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=process permissive=0
Apr 14 02:23:16 localhost.localdomain systemd[1]: runc-xoo.scope: Failed to add PIDs to scope's control group: Permission denied
Apr 14 02:23:16 localhost.localdomain systemd[1]: runc-xoo.scope: Failed with result 'resources'.
Apr 14 02:23:16 localhost.localdomain systemd[1]: Failed to start libcontainer container xoo.

So systemd failed to create the unit, but runc did not report any error (the same happens with podman+crun).

When trying to start a unit with systemd-run, an error is shown:

# systemd-run --scope sleep 999
Job failed. See "journalctl -xe" for details.
# journalctl -xe
...
Apr 14 02:36:49 localhost.localdomain audit[1]: AVC avc:  denied  { setsched } for  pid=1 comm="systemd" scontext=system_u:system_r:init_t:s0 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=process permissive=0
Apr 14 02:36:49 localhost.localdomain systemd[1]: foobar.scope: Failed to add PIDs to scope's control group: Permission denied
Apr 14 02:36:49 localhost.localdomain systemd[1]: foobar.scope: Failed with result 'resources'.
Apr 14 02:36:49 localhost.localdomain systemd[1]: Failed to start /usr/bin/sleep 999.
...
@kolyshkin kolyshkin changed the title from "Check that StartTransientUnit succeeds" to "runc --systemd-cgroup run: need to check that StartTransientUnit succeeds" Apr 14, 2020
@kolyshkin
Contributor Author

The same issue is probably present in the cgroup v1 systemd driver (LegacyManager) as well.

@kolyshkin
Contributor Author

kolyshkin commented Apr 15, 2020

Might be related (need to check): #1780. No, not really.

kolyshkin added a commit to kolyshkin/crun that referenced this issue Apr 16, 2020
While playing with Fedora 31 host with old/broken selinux packages,
I found out that systemd fails to create a transient unit. Here is
an excerpt from journalctl:

> audit[1]: AVC avc:  denied  { setsched } for  pid=1 comm="systemd" scontext=system_u:system_r:init_t:s0 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=process permissive=0
> systemd[1]: crun-555.scope: Failed to add PIDs to scope's control group: Permission denied
> systemd[1]: crun-555.scope: Failed with result 'resources'.
> systemd[1]: Failed to start libcrun container.

and yet crun did not show any error and proceeded to start the
container.

Moreover, since the cgroup was not created by systemd,
but the error was not detected, the container process was put
into the wrong cgroup (the same as the `crun` caller).

Also, since crun gets the cgroup name from /proc/$PID/cgroup
(where $PID is container process PID), it proceeded to set the
limits for that (wrong) cgroup:

> # cat /sys/fs/cgroup/system.slice/sshd.service/memory.max
> 536870912

Finally, `crun delete` apparently removes the `system.slice/sshd.service`
cgroup :(

The primary cause is the missing check that the transient unit has
been created. This is what this patch adds (similar to how it's done
in cgroup-run code).

After this patch:

> # ../crun --systemd-cgroup run -d 555
> 2020-04-16T14:47:34.000354150Z: Systemd unit crun-555.scope failed: failed. See "journalctl -xe" for details.
> 2020-04-16T14:47:34.000355064Z: Systemd unit crun-555.scope failed: failed. See "journalctl -xe" for details.

For more background on how the issue was found, steps to repro etc
please see a similar (but much less brutal -- it just fails to start
the container) issue in runc:

 - opencontainers/runc#2313

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/crun that referenced this issue Apr 16, 2020
While playing with Fedora 31 host with old/broken selinux packages,
I found out that systemd fails to create a transient unit. Here is
an excerpt from journalctl:

> audit[1]: AVC avc:  denied  { setsched } for  pid=1 comm="systemd" scontext=system_u:system_r:init_t:s0 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=process permissive=0
> systemd[1]: crun-555.scope: Failed to add PIDs to scope's control group: Permission denied
> systemd[1]: crun-555.scope: Failed with result 'resources'.
> systemd[1]: Failed to start libcrun container.

and yet crun did not show any error and proceeded to start the
container, which led to a number of issues.

1. Since the cgroup was not created by systemd, but the error
was not detected, the container process was not put into its own
cgroup (but left in the same cgroup as the shell from which `crun`
was called).

2. Since crun gets the cgroup name from /proc/$PID/cgroup
(where $PID is container process PID), it proceeded to set the
limits for that (wrong) cgroup:

> # cat /sys/fs/cgroup/system.slice/sshd.service/memory.max
> 536870912

3. `crun delete` apparently removes the `system.slice/sshd.service`
cgroup :(

The primary cause is the missing check that the transient unit has
been created. This is what this patch adds (similar to how it's done
in cgroup-run code).

After this patch:

> # ../crun --systemd-cgroup run -d 555
> 2020-04-16T14:47:34.000354150Z: Systemd unit crun-555.scope failed: failed. See "journalctl -xe" for details.
> 2020-04-16T14:47:34.000355064Z: Systemd unit crun-555.scope failed: failed. See "journalctl -xe" for details.

For more background on how the issue was found, steps to repro etc
please see a similar (but much less brutal -- it just fails to start
the container) issue in runc:

 - opencontainers/runc#2313

Fixes: eaccb4b
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/crun that referenced this issue Apr 16, 2020
While playing with Fedora 31 host with old/broken selinux packages,
I found out that systemd fails to create a transient unit. Here is
an excerpt from journalctl:

> audit[1]: AVC avc:  denied  { setsched } for  pid=1 comm="systemd" scontext=system_u:system_r:init_t:s0 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=process permissive=0
> systemd[1]: crun-555.scope: Failed to add PIDs to scope's control group: Permission denied
> systemd[1]: crun-555.scope: Failed with result 'resources'.
> systemd[1]: Failed to start libcrun container.

and yet crun did not show any error and proceeded to start the
container, which led to a number of issues.

1. Since the cgroup was not created by systemd, but the error
was not detected, the container process was not put into its own
cgroup (but left in the same cgroup as the shell from which `crun`
was called).

2. Since crun gets the cgroup name from /proc/$PID/cgroup
(where $PID is container process PID), it proceeded to set the
limits for that (wrong) cgroup:

> # cat /sys/fs/cgroup/system.slice/sshd.service/memory.max
> 536870912

3. `crun delete` apparently removes the `system.slice/sshd.service`
cgroup :(

The primary cause is the missing check that the transient unit has
been created. This is what this patch adds (similar to how it's done
in cgroup-run code).

After this patch:

> # ../crun --systemd-cgroup run -d 555
> 2020-04-16T14:47:34.000354150Z: Systemd unit crun-555.scope failed: failed. See "journalctl -xe" for details.
> 2020-04-16T14:47:34.000355064Z: Systemd unit crun-555.scope failed: failed. See "journalctl -xe" for details.

For more background on how the issue was found, steps to repro etc
please see a similar (but much less brutal -- it just fails to start
the container) issue in runc:

 - opencontainers/runc#2313

[v2: also check status in destroy_systemd_cgroup_scope()]

Fixes: eaccb4b
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/crun that referenced this issue Apr 16, 2020
While playing with Fedora 31 host with old/broken selinux packages,
I found out that systemd fails to create a transient unit. Here is
an excerpt from journalctl:

> audit[1]: AVC avc:  denied  { setsched } for  pid=1 comm="systemd" scontext=system_u:system_r:init_t:s0 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=process permissive=0
> systemd[1]: crun-555.scope: Failed to add PIDs to scope's control group: Permission denied
> systemd[1]: crun-555.scope: Failed with result 'resources'.
> systemd[1]: Failed to start libcrun container.

and yet crun did not show any error and proceeded to start the
container, which led to a number of issues.

1. Since the cgroup was not created by systemd, but the error
was not detected, the container process was not put into its own
cgroup (but left in the same cgroup as the shell from which `crun`
was called).

2. Since crun gets the cgroup name from /proc/$PID/cgroup
(where $PID is container process PID), it proceeded to set the
limits for that (wrong) cgroup:

 # cat /sys/fs/cgroup/system.slice/sshd.service/memory.max
 536870912

3. `crun delete` apparently removes the `system.slice/sshd.service`
cgroup :(

The primary cause is the missing check that the transient unit has
been created. This is what this patch adds (similar to how it's done
in cgroup-run code).

After this patch:

 # ../crun --systemd-cgroup run -d 555
 2020-04-16T14:47:34.000354150Z: error creating systemd unit crun-555.scope: failed

For more background on how the issue was found, steps to repro etc
please see a similar (but much less brutal -- it just fails to start
the container) issue in runc:

 - opencontainers/runc#2313

[v2: also check status in destroy_systemd_cgroup_scope()]

Fixes: eaccb4b
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/crun that referenced this issue Apr 16, 2020
While playing with Fedora 31 host with old/broken selinux packages,
I found out that systemd fails to create a transient unit. Here is
an excerpt from journalctl:

> audit[1]: AVC avc:  denied  { setsched } for  pid=1 comm="systemd" scontext=system_u:system_r:init_t:s0 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=process permissive=0
> systemd[1]: crun-555.scope: Failed to add PIDs to scope's control group: Permission denied
> systemd[1]: crun-555.scope: Failed with result 'resources'.
> systemd[1]: Failed to start libcrun container.

and yet crun did not show any error and proceeded to start the
container, which led to a number of issues.

1. Since the cgroup was not created by systemd, but the error
was not detected, the container process was not put into its own
cgroup (but left in the same cgroup as the shell from which `crun`
was called).

2. Since crun gets the cgroup name from /proc/$PID/cgroup
(where $PID is container process PID), it proceeded to set the
limits for that (wrong) cgroup:

 # cat /sys/fs/cgroup/system.slice/sshd.service/memory.max
 536870912

3. `crun delete` apparently removes the `system.slice/sshd.service`
cgroup :(

The primary cause is the missing check that the transient unit has
been created. This is what this patch adds (similar to how it's done
in cgroup-run code).

After this patch:

 # ../crun --systemd-cgroup run -d 555
 2020-04-16T14:47:34.000354150Z: error creating systemd unit crun-555.scope: failed

For more background on how the issue was found, steps to repro etc
please see a similar (but much less brutal -- it just fails to start
the container) issue in runc:

 - opencontainers/runc#2313

While at it, abstract out the code preparing and doing the check.

[v2: also check status in destroy_systemd_cgroup_scope()]

Fixes: eaccb4b
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/crun that referenced this issue Apr 16, 2020
While playing with Fedora 31 host with old/broken selinux packages,
I found out that systemd fails to create a transient unit. Here is
an except from journalctl:

> audit[1]: AVC avc:  denied  { setsched } for  pid=1 comm="systemd" scontext=system_u:system_r:init_t:s0 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=process permissive=0
> systemd[1]: crun-555.scope: Failed to add PIDs to scope's control group: Permission denied
> systemd[1]: crun-555.scope: Failed with result 'resources'.
> systemd[1]: Failed to start libcrun container.

and yet crun did not show any error and proceeded to start the
container, which led to a number of issues.

1. Since the cgroup was not created by systemd, but the error
was not detected, the container process was not put into its own
cgroup (but left in the same cgroup as the shell from which `crun`
was called).

2. Since crun gets the cgroup name from /proc/$PID/cgroup
(where $PID is container process PID), it proceeded to set the
limits for that (wrong) cgroup:

 # cat /sys/fs/cgroup/system.slice/sshd.service/memory.max
 536870912

3. `crun delete` apparently removes the `system.slice/sshd.service`
cgroup :(

The primary cause is the missing check that the transient unit has
been created. This is what this patch adds (similar to how it's done
in cgroup-run code).

After this patch:

 # ../crun --systemd-cgroup run -d 555
 2020-04-16T14:47:34.000354150Z: error creating systemd unit crun-555.scope: failed

For more background on how the issue was found, steps to repro etc
please see a similar (but much less brutal -- it just fails to start
the container) issue in runc:

 - opencontainers/runc#2313

While at it, abstract out the code preparing and doing the check.

Fixes: eaccb4b
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/crun that referenced this issue Apr 17, 2020
kolyshkin added a commit that referenced this issue Apr 28, 2020
check that StartTransientUnit/StopUnit succeeds

LGTMs: @AkihiroSuda @kolyshkin 
Closes #2313, #2309