Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runc-dmz does not play well with selinux #4057

Closed
kolyshkin opened this issue Oct 5, 2023 · 19 comments
Closed

runc-dmz does not play well with selinux #4057

kolyshkin opened this issue Oct 5, 2023 · 19 comments
Milestone

Comments

@kolyshkin
Copy link
Contributor

Description

When .process.selinuxLabel is set, runc uses it to set the executable context (calls selinux.SetExecLabel) before we execute the container binary.

With the dmz feature (introduced in #3987), we now execute runc-dmz (which, in turn, executes the container binary), and we do it with the exec context of the container (selinux.SetExecLabel call). Alas, the container context does not give us enough permissions to use runc-dmz (see e.g. containers/container-selinux#274).

Steps to reproduce the issue

The issue is reproduced on CentOS and Fedora (which has SELinux enabled and in enforcing mode by default) using test cases added in #4053.

Describe the results you received and expected

The issue is reproduced with tests added in #4053. Here's the result:

For CentOS 7:

ok 162 runc run (no selinux label)
not ok 163 runc run (custom selinux label)
# (in test file tests/integration/selinux.bats, line 35)
#   `[ "$status" -eq 0 ]' failed
# runc spec (status=0):
#
# runc run tst (status=139):
#
# type=AVC msg=audit(1696402028.469:690): avc:  denied  { read execute } for  pid=501 comm="6" path=2F6D656D66643A72756E635F636C6F6E65643A72756E632D646D7A202864656C6574656429 dev="tmpfs" ino=146799 scontext=system_u:system_r:container_t:s0:c4,c5 tcontext=unconfined_u:object_r:container_runtime_tmpfs_t:s0 tclass=file permissive=0
ok 164 runc run (custom selinux label, RUNC_DMZ=legacy)

For CentOS 8/9 and Fedora:

ok 162 runc run (no selinux label)
not ok 163 runc run (custom selinux label)
# (in test file tests/integration/selinux.bats, line 35)
#   `[ "$status" -eq 0 ]' failed
# runc spec (status=0):
#
# runc run tst (status=1):
# writing sync procError: write sync: file already closed
# execveat: permission denied
# type=AVC msg=audit(1696403137.078:10848): avc:  denied  { entrypoint } for  pid=105267 comm="runc:[2:INIT]" path=2F6D656D66643A72756E635F636C6F6E65643A72756E632D646D7A202864656C6574656429 dev="tmpfs" ino=276 scontext=system_u:system_r:container_t:s0:c4,c5 tcontext=unconfined_u:object_r:container_runtime_tmpfs_t:s0 tclass=file permissive=0
ok 164 runc run (custom selinux label, RUNC_DMZ=legacy)

What version of runc are you using?

v1.1.0-785-gd8d576ca

Host OS information

Any recent Fedora or CentOS with SELinux enabled and in enforced mode.

Host kernel information

No response

@cyphar
Copy link
Member

cyphar commented Oct 5, 2023

I discussed this issue with Akihiro in the runc-dmz PR -- there isn't a way to detect whether the exec will fail early enough that we can do something about it (faccessat doesn't trigger security hooks, and we need to have already switched to the container label to test it).

My original idea was that distributions that ship with restrictive SELinux policies can build runc with runc_nodmz. RUNC_DMZ=legacy would allow people to work around the issue when using our official binaries (maybe we should suggest using it in the error if execve of runc-dmz fails?). To be fair, this is quite annoying to use in practice.

Now that I think about it -- is the label of a memfd determined with security.selinux? If so, can we just set the xattr to the container label? Something like:

diff --git a/libcontainer/container_linux.go b/libcontainer/container_linux.go
index 35f4f5df390d..235b1509d6cb 100644
--- a/libcontainer/container_linux.go
+++ b/libcontainer/container_linux.go
@@ -17,6 +17,7 @@ import (
        "time"

        "github.com/opencontainers/runtime-spec/specs-go"
+       "github.com/opencontainers/selinux/go-selinux"
        "github.com/sirupsen/logrus"
        "github.com/vishvananda/netlink/nl"
        "golang.org/x/sys/execabs"
@@ -514,6 +515,11 @@ func (c *Container) newParentProcess(p *Process) (parentProcess, error) {
                if isDmzBinarySafe(c.config) {
                        dmzExe, err = dmz.Binary(c.stateDir)
                        if err == nil {
+                               // Set the SELinux label on the memfd to make sure we can exec
+                               // from the container's label.
+                               if err := selinux.SetFileLabel(fmt.Sprintf("/proc/self/fd/%d", dmzExe.Fd()), c.config.ProcessLabel); err != nil {
+                                       return nil, fmt.Errorf("failed to set SELinux label on runc-dmz binary clone: %w", err)
+                               }
                                // We can use our own executable without cloning if we are using
                                // runc-dmz.
                                exePath = "/proc/self/exe"

@kolyshkin
Copy link
Contributor Author

Now that I think about it -- is the label of a memfd determined with security.selinux? If so, can we just set the xattr to the container label?

Alas it's not working

# runc run tst (status=1):
# time="2023-10-06T20:53:28Z" level=error msg="runc run failed: unable to create new parent process: failed to set SELinux label on runc-dmz binary clone: setxattr /proc/self/fd/16: permission denied"
# ----
# type=PROCTITLE msg=audit(10/06/2023 20:53:28.097:678) : proctitle=/tmp/bats-run-ZTGWah/runc.8RJMRr/bundle/runc --systemd-cgroup --root /tmp/bats-run-ZTGWah/runc.8RJMRr/state run tst
# type=SYSCALL msg=audit(10/06/2023 20:53:28.097:678) : arch=x86_64 syscall=setxattr success=no exit=EACCES(Permission denied) a0=0xc000226030 a1=0xc000226048 a2=0xc000214030 a3=0x26 items=0 ppid=636 pid=637 auid=root uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=pts0 ses=7 comm=runc exe=/tmp/bats-run-ZTGWah/runc.8RJMRr/bundle/runc subj=unconfined_u:system_r:container_runtime_t:s0-s0:c0.c1023 key=(null)
# type=AVC msg=audit(10/06/2023 20:53:28.097:678) : avc:  denied  { relabelto } for  pid=637 comm=runc name=memfd:runc_cloned:runc-dmz dev="tmpfs" ino=142329 scontext=unconfined_u:system_r:container_runtime_t:s0-s0:c0.c1023 tcontext=system_u:system_r:container_t:s0:c4,c5 tclass=file permissive=0

@kolyshkin
Copy link
Contributor Author

So far the best workaround I came up with is this:

diff --git a/libcontainer/container_linux.go b/libcontainer/container_linux.go
index 35f4f5df..805ac0a0 100644
--- a/libcontainer/container_linux.go
+++ b/libcontainer/container_linux.go
@@ -17,6 +17,7 @@ import (
 	"time"
 
 	"github.com/opencontainers/runtime-spec/specs-go"
+	"github.com/opencontainers/selinux/go-selinux"
 	"github.com/sirupsen/logrus"
 	"github.com/vishvananda/netlink/nl"
 	"golang.org/x/sys/execabs"
@@ -457,6 +458,13 @@ func slicesContains[S ~[]E, E comparable](slice S, needle E) bool {
 }
 
 func isDmzBinarySafe(c *configs.Config) bool {
+	// SELinux policy can prevent runc to execute the dmz binary. Do not
+	// use dmz in case SELinux is in enforcing mode and the selinux label
+	// is set.
+	if c.ProcessLabel != "" && selinux.EnforceMode() == selinux.Enforcing {
+		return false
+	}
+
 	// Because we set the dumpable flag in nsexec, the only time when it is
 	// unsafe to use runc-dmz is when the container process would be able to
 	// race against "runc init" and bypass the ptrace_may_access() checks.

Not that I like it (mostly because it looks like an overkill), but it fixes the issue.

@AkihiroSuda AkihiroSuda added this to the 1.2.0 milestone Oct 8, 2023
@AkihiroSuda
Copy link
Member

My original idea was that distributions that ship with restrictive SELinux policies can build runc with runc_nodmz.

We can't assume users to use distros' packages.
And RHEL (and similar ones) has already switched to crun.

@AkihiroSuda
Copy link
Member

@kolyshkin Could you submit that patch as a PR?

@rhatdan
Copy link
Contributor

rhatdan commented Oct 11, 2023

@kolyshkin
Copy link
Contributor Author

Currently I see the following version of container-selinux:
Fedora 38: 2.222.0
CentOS Stream 9: 2.222.0-1.el9
CentOS Stream 8: 2.221.0-1.module_el8+653+feef7bfe
CentOS 7: 2.119.2-1.911c772.el7_8 (last updated 2020-06-02)

I'm assuming the change will be backported to CentOS 9 and 8 and maybe Fedora (or will appear in the next Fedora).

Not sure what to do with CentOS 7 though. Disable runc-dmz using a build flag?

@rhatdan @cyphar what do you think?

@kolyshkin
Copy link
Contributor Author

@kolyshkin Could you submit that patch as a PR?

We are still figuring out how to fix this the best way.

@rhatdan
Copy link
Contributor

rhatdan commented Oct 12, 2023

Why are you still supporting RHEL7/Centos7. It no longer gets updates. Only security fixes are backported.

@AkihiroSuda
Copy link
Member

AkihiroSuda commented Oct 12, 2023

Why are you still supporting RHEL7/Centos7. It no longer gets updates. Only security fixes are backported.

Users may still install the latest runc from a third party package (until EOL)

@kolyshkin
Copy link
Contributor Author

Why are you still supporting RHEL7/Centos7. It no longer gets updates. Only security fixes are backported.

CentOS Linux 7 will reach end of life (EOL) on June 30, 2024. We hope we will release runc 1.2 before that date.

We can certainly say "if you're using runc on CentOS 7 with SELinux enabled, compile it with runc_nodmz flag", and be done with it. Perhaps this is indeed the way to go (and we can modify our CI to do that, too).

Otherwise it's too much hassle.

@AkihiroSuda
Copy link
Member

We can certainly say "if you're using runc on CentOS 7 with SELinux enabled, compile it with runc_nodmz flag", and be done with it.

Downstreams often want to provide a single “golden” binary that works on every distro.

@kolyshkin
Copy link
Contributor Author

Downstreams often want to provide a single “golden” binary that works on every distro.

What do you propose? Something like this?

Maybe this can be done, but together with a way to avoid this behavior ("selinux works with runc-dmz") which a distro vendor can set.

@rhatdan
Copy link
Contributor

rhatdan commented Oct 13, 2023

SGTM

@AkihiroSuda
Copy link
Member

Something like #4057 (comment)?

Yes. The distro version (kernel version?) can be checked too.

@kolyshkin
Copy link
Contributor Author

The distro version (kernel version?) can be checked too.

This code path is also used during runc exec, I really don't want any complicated checks there.

@kolyshkin
Copy link
Contributor Author

(status update: still working on this, got distracted by CI failures and other stuff)

@kolyshkin
Copy link
Contributor Author

And RHEL (and similar ones) has already switched to crun.

Not entirely; runc is still widely used.

@kolyshkin Could you submit that patch as a PR?

@AkihiroSuda see #4053.

@AkihiroSuda
Copy link
Member

Fixed in #4053

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants