Fix SELinux process labeling and label sockets correctly #648

Merged: 3 commits, Mar 23, 2019

adrianreber
Member

This change is motivated by containers/podman#2334

The main problem when trying to checkpoint a container on an SELinux-enabled system is that the connection from the parasite daemon to the main CRIU process is blocked by SELinux. The error looks something like:

type=AVC msg=audit(1550142323.799:1167): avc:  denied  { connectto } for  pid=23569 comm="top" path=002F6372746F6F6C732D70722D3233363139 scontext=system_u:system_r:container_t:s0:c245,c463 tcontext=unconfined_u:system_r:container_runtime_t:s0-s0:c0.c1023 tclass=unix_stream_socket permissive=0

That SELinux blocks a connect() from the container process to the outside of the container is understandable.
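
The `path=` value in the denial above is hex-encoded because it starts with a NUL byte, which is how the audit log prints an abstract UNIX socket name. The following is a minimal sketch for decoding such a value. The function names `hex_val` and `decode_audit_path` are made up for illustration and are not part of CRIU or the audit tooling. Printing the leading NUL as `@` follows the convention of `/proc/net/unix`.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_val(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	c = (char)toupper((unsigned char)c);
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Hypothetical helper: decode a hex-encoded audit "path=" value into out
 * (NUL-terminated). A leading 0x00 byte (abstract UNIX socket) is written
 * as '@'. Returns 0 on success, -1 on malformed input, on a NUL byte
 * anywhere but the first position, or if out is too small.
 */
int decode_audit_path(const char *hex, char *out, size_t out_len)
{
	size_t n = strlen(hex);
	size_t i, o = 0;

	if (n % 2 != 0 || n / 2 + 1 > out_len)
		return -1;

	for (i = 0; i < n; i += 2) {
		int hi = hex_val(hex[i]);
		int lo = hex_val(hex[i + 1]);
		unsigned char byte;

		if (hi < 0 || lo < 0)
			return -1;
		byte = (unsigned char)(hi * 16 + lo);
		if (byte == 0) {
			/* This sketch only accepts NUL as the first byte. */
			if (i != 0)
				return -1;
			byte = '@';
		}
		out[o++] = (char)byte;
	}
	out[o] = '\0';
	return 0;
}
```

Applied to the value from the denial, this yields `@/crtools-pr-23619`, an abstract socket name ending in a PID-like number.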

The solution discussed and implemented in this PR is that, on an SELinux-enabled system, CRIU now labels the socket with the same SELinux context as the parasite daemon running in the container. With the socket correctly labeled, Podman can checkpoint a container on an SELinux-enabled system without denials, because the parasite daemon can access the main CRIU process.
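
As a rough illustration of the labeling step, the sketch below copies a label from one procfs attribute file to another using plain file I/O. On a real system the source would be something like `/proc/<pid>/attr/current` of the container's root process and the destination `/proc/thread-self/attr/sockcreate`, which sets the label for the next socket the calling thread creates. The function name `copy_label` is hypothetical and this is not the code from the PR. libselinux provides `getpidcon()` and `setsockcreatecon()` for the same purpose.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read the label stored in src_path and write it to
 * dst_path. Returns 0 on success, a negative errno value on failure.
 */
int copy_label(const char *src_path, const char *dst_path)
{
	char label[4096];
	ssize_t len, written;
	int fd, err;

	fd = open(src_path, O_RDONLY);
	if (fd < 0)
		return -errno;
	len = read(fd, label, sizeof(label));
	err = errno;
	close(fd);
	if (len < 0)
		return -err;

	/* The kernel may append a NUL or newline; do not pass it on. */
	while (len > 0 && (label[len - 1] == '\0' || label[len - 1] == '\n'))
		len--;
	if (len == 0)
		return -EINVAL;

	/* The destination must already exist, as procfs files do. */
	fd = open(dst_path, O_WRONLY);
	if (fd < 0)
		return -errno;
	written = write(fd, label, (size_t)len);
	err = errno;
	close(fd);
	if (written < 0)
		return -err;
	return written == len ? 0 : -EIO;
}
```

Whether the write to `sockcreate` succeeds depends on the loaded policy, which is why the description stresses that the correct SELinux policies must be installed.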

This PR also removes CRIU's limitation of working only with the SELinux context 'unconfined_t'. According to @tych0, this limitation was introduced because, at that time, no one was sure whether the SELinux implementation actually worked. With the correct SELinux policies it works correctly. The biggest problem is that a process is usually not expected to change its own process context. CRIU, however, has to do it. With this change a restored container has the same process context as during checkpointing.

For the container use case the Fedora package container-selinux has introduced a policy to allow CRIU to transition from one process context to another (containers/container-selinux@891a85f containers/container-selinux@a2fc030).

Processes started from the shell usually run as 'unconfined_t' if no special policy exists. Since CRIU will then also be running as 'unconfined_t', it should just work as before. For any process running in another process context, additional SELinux policies will have to be defined in the future.

@tych0
Contributor

tych0 commented Mar 12, 2019

Cool! Is it possible for the container itself to have transitioned to some policy inside? What happens in that case here, does that person have to modify their policy inside the container to support checkpoint restore?

@adrianreber
Member Author

Cool! Is it possible for the container itself to have transitioned to some policy inside? What happens in that case here, does that person have to modify their policy inside the container to support checkpoint restore?

Can you describe in more detail what you mean? I do not think I understand what you are asking.

@tych0
Contributor

tych0 commented Mar 12, 2019 via email

@adrianreber
Member Author

On Tue, Mar 12, 2019 at 03:18:37PM -0700, Adrian Reber wrote:
Cool! Is it possible for the container itself to have transitioned to some policy inside? What happens in that case here, does that person have to modify their policy inside the container to support checkpoint restore? Can you describe in more detail what you mean? I do not think I understand what you are asking.

Maybe what I'm asking isn't possible without SELinux namespacing, now that I think about it. What I was wondering is: if, inside the container, an nginx process installs its own SELinux policy and switches to the nginx_t context, will CRIU have a problem restoring that, i.e. will the external policy have to be modified to talk about this policy inside the container?

Thanks for the explanation. Not sure. From what I have seen the processes in the container are always using the policy set from the outside. But I actually do not know.

If a container could bring its own policy the socket labeling should still work. The process labeling on restore would fail and require a special policy, because, as far as I understand it, the container-selinux package only allows a transition from 'container_runtime_t' (this is what CRIU would be running as) to 'container_t' (what the container is running as). If the container is running as anything else CRIU would not be allowed to transition the container process to that policy.
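
To see which transition a policy would have to allow, it helps to pull the type out of the full context strings that appear in the audit log. The following is a small sketch, assuming the usual `user:role:type[:range]` layout. The function name `context_type` is made up for this example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the type field (third colon-separated field)
 * of an SELinux context "user:role:type[:range]" into out.
 * Returns 0 on success, -1 if the context is malformed or out is too small.
 */
int context_type(const char *ctx, char *out, size_t out_len)
{
	const char *start = ctx;
	const char *end;
	size_t len;
	int i;

	/* Skip the user and role fields. */
	for (i = 0; i < 2; i++) {
		start = strchr(start, ':');
		if (!start)
			return -1;
		start++;
	}

	/* The type ends at the next ':' or at the end of the string. */
	end = strchr(start, ':');
	len = end ? (size_t)(end - start) : strlen(start);
	if (len == 0 || len >= out_len)
		return -1;

	memcpy(out, start, len);
	out[len] = '\0';
	return 0;
}
```

For the contexts in the first denial, this gives `container_t` for the source and `container_runtime_t` for the target, which is the pair the container-selinux policy mentioned above covers.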

The whole CRIU SELinux integration heavily relies on having the correct policies available.

@rst0git
Member

rst0git commented Mar 13, 2019

Maybe what I'm asking isn't possible without selinux namespacing now that I think about it.

Based on this, SELinux will always appear to be "disabled" in a container because the SELinux file system is mounted read-only, thus processes inside containers must not be able to set their own separate SELinux policies.

@tych0
Contributor

tych0 commented Mar 13, 2019 via email

@rst0git
Member

rst0git commented Mar 14, 2019

@adrianreber I was wondering if we could create test case(s) to confirm that the SELinux policy is restored correctly?

@adrianreber
Member Author

@adrianreber I was wondering if we could create test case(s) to confirm that the SELinux policy is restored correctly?

In theory it is possible and easy: run a test case, set a context, checkpoint that process, restore that process, compare the process label. But, as far as I know, Travis does not support SELinux, so the test case will not run. The bigger problem is that it is not easy for the test case to switch to a meaningful SELinux context. So the process would be running as 'unconfined_t' before and after restore. But as that is the default context, nothing would really be tested.

Initially I wanted to add a test case, but it seemed so useless, as it cannot test anything without a special test policy. I can still add it if people think it would make sense.

@avagin
Member

avagin commented Mar 19, 2019

In theory it is possible and easy. Run a test case, set a context, checkpoint that process, restore that process, compare the process label.

But, as far as I know, travis does not support SELinux, so the test case will not run.

We have one Fedora host in Jenkins:
https://ci.openvz.org/computer/%20Compulab%20Fitle%20(x86_64)/

The bigger problem is, that it is not easy for the test case to switch to a meaningful SELinux context. So the process would be running as 'unconfined_t' before and after restore. But as that is the default context, nothing would be really tested.

I don't understand this. Is it hard to create a test context? We run tests in Jenkins with root privileges.

Initially I wanted to add a test case, but it seemed so useless, as it cannot test anything without a special test policy. I can still add it if people think it would make sense.

Please add this special test policy together with a test. Our past experience has shown many times that if something isn't tested, it doesn't work and will soon be broken.

@adrianreber
Member Author

Added a test case. Let's see what travis thinks about this. I will fix the other review points later today.

There was support for SELinux process labels in CRIU but because it was
never tested or verified CRIU only supported the 'unconfined_t' process
label. This was basically no SELinux support.

For successful container checkpoint and restore on a SELinux enabled
host it is necessary that the restored container has the same process
context as before checkpointing.

This commit only removes the check if the label is 'unconfined_t' and
now stores any process label to be restored.

For 'normal' processes started from the command line, which usually
run as 'unconfined_t', this just works.

For the container use case this needs additional policies. The latest
container-selinux package on Fedora has the necessary policy to allow
CRIU (running as 'container_runtime_t' when used from Podman) to
transition the restored process to 'container_t'.

Restoring a process running under systemd's control (which means
'unconfined_service_t' without additional policies) will fail because
CRIU will not be allowed to change the context of the restored process.

For each additional CRIU use case on SELinux enabled systems, besides
container processes and command-line/shell processes, additional SELinux
policies are required to allow CRIU to do a 'dyntransition' (change the

Signed-off-by: Adrian Reber <areber@redhat.com>
If running on a system with SELinux enabled the socket for the
communication between parasite daemon and the main CRIU process needs to
be correctly labeled.

Initially this was motivated by Podman's use case: The container is
usually running as something like '...:...:container_t:...:....' and
CRIU started from runc and Podman will run as
'...:...:container_runtime_t:...:...'. The parasite, however, will be
running with the same context as the container process: 'container_t'.

Allowing a container process to connect via socket to the outside
of the container ('container_runtime_t') is not desired and therefore
CRIU needs to label the socket with the context of the
container: 'container_t'.

So this first gets the context of the root container process and tells
SELinux to label the next created socket with the same label as the root
container process. For this to work it is necessary to have the correct
SELinux policies installed. For Fedora based systems this is part of the
container-selinux package.

This assumes that all processes CRIU wants to dump are labeled with the
same SELinux context. If some of the child processes have different
labels this will not work and needs additional SELinux policies. But the
whole SELinux socket labeling relies on the correct SELinux policies
being available.

Signed-off-by: Adrian Reber <areber@redhat.com>
This tests if CRIU can restore a process with the same policy as during
checkpointing.

The test selinux00 is started and if SELinux is available the test
process moves itself to another process context. To make this possible
either a new SELinux policy needs to be available containing:

fedora-selinux/selinux-policy@2d537ca

Or for a short time SELinux is switched to permissive mode.

The correct SELinux setup is done by zdtm/static/selinux00.checkskip and
zdtm/static/selinux00.hook and after the test the previous SELinux
policy state is restored.

After the test case is restored the test case checks if it still has the
same SELinux process context as before. If not, the test case fails.
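
The before/after comparison described here can be sketched as follows. This is not the selinux00 test itself; `read_label` and `label_unchanged` are hypothetical names, and the path is a parameter so the sketch can be tried on an ordinary file. In a real test the path would be `/proc/self/attr/current`.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read a label from path into buf, stripping any
 * trailing NUL or newline. Returns 0 on success, -1 on error or if the
 * file is empty.
 */
int read_label(const char *path, char *buf, size_t buf_len)
{
	FILE *f;
	size_t n;

	if (buf_len == 0)
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	n = fread(buf, 1, buf_len - 1, f);
	fclose(f);

	while (n > 0 && (buf[n - 1] == '\0' || buf[n - 1] == '\n'))
		n--;
	buf[n] = '\0';
	return n > 0 ? 0 : -1;
}

/*
 * Hypothetical helper: return 1 if the label now stored at path equals
 * saved, 0 if it differs, -1 if it cannot be read.
 */
int label_unchanged(const char *path, const char *saved)
{
	char now[4096];

	if (read_label(path, now, sizeof(now)) < 0)
		return -1;
	return strcmp(now, saved) == 0;
}
```

A test would call `read_label()` once before the checkpoint and `label_unchanged()` after the restore, and would fail on any result other than 1.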

Signed-off-by: Adrian Reber <areber@redhat.com>
@avagin avagin merged commit 26e165e into checkpoint-restore:criu-dev Mar 23, 2019
@adrianreber
Member Author

I just discovered that this does not work for multi-threaded processes:

		/* Only allow single threaded processes to change context */
		error = -EPERM;
		if (!current_is_single_threaded()) {
			error = security_bounded_transition(&selinux_state,
							    tsec->sid, sid);
			if (error)
				goto abort_change;
		}

https://github.com/torvalds/linux/blob/master/security/selinux/hooks.c#L6268

If CRIU tries to restore the process context of each thread via attr/current, I get the following errors in CRIU's log:

(00.110868) pie: 13: Error (criu/pie/restorer.c:181): can't write lsm profile -13
(00.110871) pie: 12: Error (criu/pie/restorer.c:181): can't write lsm profile -13
(00.111946) pie: 1: Error (criu/pie/restorer.c:181): can't write lsm profile -1
(00.112044) pie: 14: Error (criu/pie/restorer.c:181): can't write lsm profile -13

And the following errors in the audit.log:

type=SELINUX_ERR msg=audit(1553499614.910:984): op=security_bounded_transition seresult=denied oldcontext=unconfined_u:system_r:container_runtime_t:s0 newcontext=system_u:system_r:container_t:s0:c11,c863
type=SELINUX_ERR msg=audit(1553499614.910:985): op=security_bounded_transition seresult=denied oldcontext=unconfined_u:system_r:container_runtime_t:s0 newcontext=system_u:system_r:container_t:s0:c11,c863
type=SELINUX_ERR msg=audit(1553499614.911:986): op=security_bounded_transition seresult=denied oldcontext=unconfined_u:system_r:container_runtime_t:s0 newcontext=system_u:system_r:container_t:s0:c11,c863
type=SELINUX_ERR msg=audit(1553499614.912:987): op=security_bounded_transition seresult=denied oldcontext=unconfined_u:system_r:container_runtime_t:s0 newcontext=system_u:system_r:container_t:s0:c11,c863

@tych0 do you have any ideas how to solve this? Or who could have an idea how to solve this?

@rst0git
Member

rst0git commented Mar 25, 2019

From the man page of setcon(3):

A multi-threaded application can perform a setcon() prior to creating any child threads, in which case all of the child threads will inherit the new context. However, setcon() will fail if there are any other threads running in the same process.
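
The kernel decides this with `current_is_single_threaded()`. From user space, a process can approximate the same check before attempting a context change by counting the entries in its task directory. The following is a sketch under that assumption; `count_threads` is a hypothetical name and the directory is a parameter, normally `/proc/self/task`.

```c
#include <dirent.h>
#include <stddef.h>

/*
 * Hypothetical helper: count the entries in a task directory such as
 * /proc/self/task, where each entry is one thread of the process.
 * Returns the number of threads, or -1 if the directory cannot be opened.
 */
int count_threads(const char *task_dir)
{
	DIR *d = opendir(task_dir);
	struct dirent *de;
	int n = 0;

	if (!d)
		return -1;
	while ((de = readdir(d)) != NULL) {
		/* Skip "." and ".."; thread IDs never start with a dot. */
		if (de->d_name[0] == '.')
			continue;
		n++;
	}
	closedir(d);
	return n;
}
```

A result of 1 means the context change is being attempted while the process is still single-threaded, which is the case the man page describes as allowed.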

@adrianreber
Member Author

@rst0git Yes, that is my fallback plan. The problem is that this does not match CRIU's approach of setting the credentials as late as possible. I guess I have to try out whether this works. If the process context is changed much earlier, other things CRIU has to do might no longer work: all of a sudden, most of the restore steps would be performed in the target context. But this can be solved by additional policies.

Currently the only policy needed is one that allows the process to change its process context. If the process context is set before thread creation, we might need many more policies. Thanks for the pointer, I need to try it out.

@adrianreber
Member Author

@rst0git Setting the process context before forking works. The necessary code changes are also not too bad, but I get something like 50 new SELinux denials.

The restored process and all its threads are now running in the process context of the container to be restored (which is good). But to restore the process, CRIU performs a lot of steps (including various mounts) that are necessary for CRIU but do not sound like something a container process should be allowed to do.

For CRIU it would be much better to be able to change the process context as late as possible.

Mar 25 14:26:24 fedora29 audit[18155]: AVC avc:  denied  { setcurrent } for  pid=18155 comm="criu" scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=system_u:system_r:container_t:s0:c596,c1015 tclass=process permissive=1
Mar 25 14:26:24 fedora29 audit[18155]: AVC avc:  denied  { dyntransition } for  pid=18155 comm="criu" scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=unconfined_u:system_r:container_runtime_t:s0 tclass=process permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { setopt } for  pid=18156 comm="criu" path=002F637269752D666473746F72652D3564643864 scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=unconfined_u:system_r:container_runtime_t:s0 tclass=unix_dgram_socket permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { read } for  pid=18156 comm="criu" path=002F637269752D666473746F72652D3564643864 scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=unconfined_u:system_r:container_runtime_t:s0 tclass=unix_dgram_socket permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { read } for  pid=18156 comm="criu" path="/run/netns/cni-e739390c-5e93-8c02-95dc-134e8afae3d7" dev="nsfs" ino=4026532476 scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=unconfined_u:system_r:container_runtime_t:s0-s0:c0.c1023 tclass=file permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { append } for  pid=18156 comm="criu" path="/var/lib/containers/storage/overlay-containers/6f73c402c9b97f3cef16ed1758593093f459033da3e2ee8c9f55ea5129a3582a/userdata/restore.log" dev="sda1" ino=12106 scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=unconfined_u:object_r:container_var_lib_t:s0 tclass=file permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { write } for  pid=18156 comm="criu" name="tasks" dev="cgroup" ino=3780 scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=system_u:object_r:cgroup_t:s0 tclass=file permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { create } for  pid=18156 comm="criu" name="crtools-proc.RLys16" scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=system_u:object_r:container_var_lib_t:s0 tclass=dir permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { mounton } for  pid=18156 comm="criu" path="/var/lib/containers/storage/overlay-containers/6f73c402c9b97f3cef16ed1758593093f459033da3e2ee8c9f55ea5129a3582a/userdata/crtools-proc.RLys16" dev="sda1" ino=12139 scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=system_u:object_r:container_var_lib_t:s0 tclass=dir permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { mount } for  pid=18156 comm="criu" name="/" dev="proc" ino=1 scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=system_u:object_r:proc_t:s0 tclass=filesystem permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { unmount } for  pid=18156 comm="criu" scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=system_u:object_r:proc_t:s0 tclass=filesystem permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { rmdir } for  pid=18156 comm="criu" name="crtools-proc.RLys16" dev="sda1" ino=12139 scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=system_u:object_r:container_var_lib_t:s0 tclass=dir permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { read } for  pid=18156 comm="criu" name="tty-info.img" dev="sda1" ino=12113 scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=unconfined_u:object_r:container_var_lib_t:s0 tclass=file permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { open } for  pid=18156 comm="criu" path="/var/lib/containers/storage/overlay-containers/6f73c402c9b97f3cef16ed1758593093f459033da3e2ee8c9f55ea5129a3582a/userdata/checkpoint/tty-info.img" dev="sda1" ino=12113 scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=unconfined_u:object_r:container_var_lib_t:s0 tclass=file permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { write } for  pid=18156 comm="criu" name="hostname" dev="proc" ino=14857 scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=system_u:object_r:sysctl_kernel_t:s0 tclass=file permissive=1
Mar 25 14:26:24 fedora29 audit[18159]: AVC avc:  denied  { read } for  pid=18159 comm="criu" name="xtables.lock" dev="tmpfs" ino=31034 scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=unconfined_u:object_r:iptables_var_run_t:s0 tclass=file permissive=1
Mar 25 14:26:24 fedora29 audit[18159]: AVC avc:  denied  { open } for  pid=18159 comm="criu" path="/run/xtables.lock" dev="tmpfs" ino=31034 scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=unconfined_u:object_r:iptables_var_run_t:s0 tclass=file permissive=1
Mar 25 14:26:24 fedora29 audit[18159]: AVC avc:  denied  { write } for  pid=18159 comm="criu" path=002F637269752D666473746F72652D3564643864 scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=unconfined_u:system_r:container_runtime_t:s0 tclass=unix_dgram_socket permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { mounton } for  pid=18156 comm="criu" path="/tmp/.criu.mntns.IlcUPh" dev="sda1" ino=12137 scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=unconfined_u:object_r:container_runtime_tmp_t:s0 tclass=dir permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { mount } for  pid=18156 comm="criu" name="/" dev="tmpfs" ino=385612 scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=system_u:object_r:tmpfs_t:s0 tclass=filesystem permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { write } for  pid=18156 comm="criu" name="tmp" dev="sda1" ino=56 scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=system_u:object_r:tmp_t:s0 tclass=dir permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { add_name } for  pid=18156 comm="criu" name="cr-tmpfs.gh52eW" scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=system_u:object_r:tmp_t:s0 tclass=dir permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { create } for  pid=18156 comm="criu" name="cr-tmpfs.gh52eW" scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=system_u:object_r:tmp_t:s0 tclass=dir permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { mounton } for  pid=18156 comm="criu" path="/tmp/cr-tmpfs.gh52eW" dev="sda1" ino=12139 scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=system_u:object_r:tmp_t:s0 tclass=dir permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { remount } for  pid=18156 comm="criu" scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=system_u:object_r:tmpfs_t:s0 tclass=filesystem permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { mount } for  pid=18156 comm="criu" name="/" dev="sysfs" ino=1 scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=system_u:object_r:sysfs_t:s0 tclass=filesystem permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { relabelfrom } for  pid=18156 comm="criu" scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=system_u:object_r:tmpfs_t:s0 tclass=filesystem permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { relabelto } for  pid=18156 comm="criu" scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=system_u:object_r:container_file_t:s0:c596,c1015 tclass=filesystem permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { relabelfrom } for  pid=18156 comm="criu" scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=system_u:object_r:container_file_t:s0:c596,c1015 tclass=filesystem permissive=1
Mar 25 14:26:24 fedora29 audit[18160]: AVC avc:  denied  { ioctl } for  pid=18160 comm="tar" path="/var/lib/containers/storage/overlay-containers/6f73c402c9b97f3cef16ed1758593093f459033da3e2ee8c9f55ea5129a3582a/userdata/checkpoint/tmpfs-dev-100.tar.gz.img" dev="sda1" ino=12129 ioctlcmd=0x5401 scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=unconfined_u:object_r:container_var_lib_t:s0 tclass=file permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { remount } for  pid=18156 comm="criu" scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=system_u:object_r:cgroup_t:s0 tclass=filesystem permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { mounton } for  pid=18156 comm="criu" path="/tmp/.criu.mntns.IlcUPh/12-0000000000/proc" dev="proc" ino=1 scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=system_u:object_r:proc_t:s0 tclass=dir permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { remount } for  pid=18156 comm="criu" scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=system_u:object_r:proc_t:s0 tclass=filesystem permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { mounton } for  pid=18156 comm="criu" path="/tmp/.criu.mntns.IlcUPh/12-0000000000/proc/irq" dev="proc" ino=4026531861 scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=system_u:object_r:sysctl_irq_t:s0 tclass=dir permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { mounton } for  pid=18156 comm="criu" path="/tmp/.criu.mntns.IlcUPh/12-0000000000/proc/sys" dev="proc" ino=4026531854 scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=system_u:object_r:sysctl_t:s0 tclass=dir permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { mounton } for  pid=18156 comm="criu" path="/tmp/.criu.mntns.IlcUPh/12-0000000000/proc/sysrq-trigger" dev="proc" ino=4026532098 scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=system_u:object_r:sysctl_t:s0 tclass=file permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { mounton } for  pid=18156 comm="criu" path="/tmp/.criu.mntns.IlcUPh/12-0000000000/proc/kcore" dev="proc" ino=4026532030 scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=system_u:object_r:proc_kcore_t:s0 tclass=file permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { remount } for  pid=18156 comm="criu" scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=system_u:object_r:device_t:s0 tclass=filesystem permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { mounton } for  pid=18156 comm="criu" path="/tmp/.criu.mntns.IlcUPh/12-0000000000/proc/kcore" dev="devtmpfs" ino=1036 scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=system_u:object_r:null_device_t:s0 tclass=chr_file permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { mounton } for  pid=18156 comm="criu" path="/tmp/.criu.mntns.IlcUPh/12-0000000000/proc/keys" dev="proc" ino=4026532080 scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=system_u:object_r:proc_t:s0 tclass=file permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { relabelfrom } for  pid=18156 comm="criu" scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=system_u:object_r:devpts_t:s0 tclass=filesystem permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { write } for  pid=18156 comm="criu" path=002F637269752D666473746F72652D3564643864 scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=unconfined_u:system_r:container_runtime_t:s0 tclass=unix_dgram_socket permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { remount } for  pid=18156 comm="criu" scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=system_u:object_r:fs_t:s0 tclass=filesystem permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { remove_name } for  pid=18156 comm="criu" name="cr-tmpfs.gh52eW" dev="sda1" ino=12139 scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=system_u:object_r:tmp_t:s0 tclass=dir permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { rmdir } for  pid=18156 comm="criu" name="cr-tmpfs.gh52eW" dev="sda1" ino=12139 scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=system_u:object_r:tmp_t:s0 tclass=dir permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { mounton } for  pid=18156 comm="criu" path="/tmp" dev="sda1" ino=2 scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=system_u:object_r:root_t:s0 tclass=dir permissive=1
Mar 25 14:26:24 fedora29 audit[18175]: AVC avc:  denied  { append } for  pid=18175 comm="iptables-restor" path="/var/lib/containers/storage/overlay-containers/6f73c402c9b97f3cef16ed1758593093f459033da3e2ee8c9f55ea5129a3582a/userdata/restore.log" dev="sda1" ino=12106 scontext=unconfined_u:system_r:iptables_t:s0 tcontext=unconfined_u:object_r:container_var_lib_t:s0 tclass=file permissive=1
Mar 25 14:26:24 fedora29 audit[18175]: AVC avc:  denied  { read write } for  pid=18175 comm="iptables-restor" path="socket:[384390]" dev="sockfs" ino=384390 scontext=unconfined_u:system_r:iptables_t:s0 tcontext=unconfined_u:system_r:container_runtime_t:s0 tclass=unix_stream_socket permissive=1
Mar 25 14:26:24 fedora29 audit[18175]: AVC avc:  denied  { read } for  pid=18175 comm="iptables-restor" path="/var/lib/containers/storage/overlay-containers/6f73c402c9b97f3cef16ed1758593093f459033da3e2ee8c9f55ea5129a3582a/userdata/.criu.cgyard.bJlNEs" dev="tmpfs" ino=384398 scontext=unconfined_u:system_r:iptables_t:s0 tcontext=unconfined_u:object_r:container_runtime_tmpfs_t:s0 tclass=dir permissive=1
Mar 25 14:26:24 fedora29 audit[18175]: AVC avc:  denied  { read } for  pid=18175 comm="iptables-restor" path="/var/lib/containers/storage/overlay-containers/6f73c402c9b97f3cef16ed1758593093f459033da3e2ee8c9f55ea5129a3582a/userdata/checkpoint" dev="sda1" ino=12108 scontext=unconfined_u:system_r:iptables_t:s0 tcontext=unconfined_u:object_r:container_var_lib_t:s0 tclass=dir permissive=1
Mar 25 14:26:24 fedora29 audit[18175]: AVC avc:  denied  { read write } for  pid=18175 comm="iptables-restor" path="socket:[384397]" dev="sockfs" ino=384397 scontext=unconfined_u:system_r:iptables_t:s0 tcontext=unconfined_u:system_r:container_runtime_t:s0 tclass=unix_dgram_socket permissive=1
Mar 25 14:26:24 fedora29 audit[18175]: AVC avc:  denied  { ioctl } for  pid=18175 comm="iptables-restor" path="/proc/18156" dev="proc" ino=385633 scontext=unconfined_u:system_r:iptables_t:s0 tcontext=system_u:system_r:container_t:s0:c596,c1015 tclass=dir permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { setopt } for  pid=18156 comm="criu" path=002F637269752D666473746F72652D3564643864 scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=unconfined_u:system_r:container_runtime_t:s0 tclass=unix_dgram_socket permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { read } for  pid=18156 comm="criu" path=002F637269752D666473746F72652D3564643864 scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=unconfined_u:system_r:container_runtime_t:s0 tclass=unix_dgram_socket permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { getattr } for  pid=18156 comm="criu" path="/var/lib/containers/storage/overlay-containers/6f73c402c9b97f3cef16ed1758593093f459033da3e2ee8c9f55ea5129a3582a/userdata/checkpoint/pagemap-1.img" dev="sda1" ino=12116 scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=unconfined_u:object_r:container_var_lib_t:s0 tclass=file permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { write } for  pid=18156 comm="criu" scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=unconfined_u:system_r:container_runtime_t:s0 tclass=unix_stream_socket permissive=1
Mar 25 14:26:24 fedora29 audit[18156]: AVC avc:  denied  { read } for  pid=18156 comm="criu" scontext=system_u:system_r:container_t:s0:c596,c1015 tcontext=unconfined_u:system_r:container_runtime_t:s0 tclass=unix_stream_socket permissive=1
Mar 25 14:26:24 fedora29 audit[18183]: AVC avc:  denied  { ioctl } for  pid=18183 comm="iptables-restor" path="/proc/18156" dev="proc" ino=385633 scontext=unconfined_u:system_r:iptables_t:s0 tcontext=system_u:system_r:container_t:s0:c596,c1015 tclass=dir permissive=1
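
To get an overview of a list like this, for example to group the denials by `tclass` or `tcontext`, one can pull individual `key=value` fields out of each record. The following is a minimal sketch; `audit_field` is a made-up name, and the parser is deliberately simple: values end at the first space, so quoted values containing spaces are not handled.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the value of "key=" from one audit record
 * into out. Surrounding double quotes are stripped. Returns 0 on success,
 * -1 if the key is missing or out is too small.
 */
int audit_field(const char *line, const char *key, char *out, size_t out_len)
{
	size_t klen = strlen(key);
	const char *p = line;

	if (klen == 0)
		return -1;

	while ((p = strstr(p, key)) != NULL) {
		/* Match whole keys only: "context" must not match "scontext". */
		if ((p == line || p[-1] == ' ') && p[klen] == '=') {
			const char *v = p + klen + 1;
			size_t len = strcspn(v, " ");

			if (len >= 2 && v[0] == '"' && v[len - 1] == '"') {
				v++;
				len -= 2;
			}
			if (len >= out_len)
				return -1;
			memcpy(out, v, len);
			out[len] = '\0';
			return 0;
		}
		p++;
	}
	return -1;
}
```

Calling it with `"tclass"` on each line, for instance, separates the `filesystem` denials (mount, remount, relabel) from the `file`, `dir` and socket ones.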

@rst0git
Member

rst0git commented Mar 26, 2019

Setting the process context before forking works.

The reason for this restriction on dynamic context transitions is explained here:
https://lwn.net/Articles/113916/

Multi-threaded processes are not allowed to use this operation, as it will yield an
inconsistency among the security contexts of the threads sharing the same mm.

The restored process and all its threads are now running in the process context of the to be restored container (which is good) but to restore the process CRIU does a lot of steps (including different mounts) which does not sound like something which a container process should be allowed to do, but which is necessary for CRIU.
For CRIU it would be much better to be able to change the process context as late as possible.

Another option could be (from man page of setcon(3)):

Since Linux 2.6.28, setcon() is permitted for threads within a multi-threaded process
if the new security context is bounded by the old security context, where the bounded
relation is defined through typebounds statements in the policy and guarantees that the
new security context has a subset of the permissions of the old security context. 

https://selinuxproject.org/page/Bounds_Rules
https://selinuxproject.org/page/NB_Apache#Bounds_Overview

@adrianreber
Member Author

Just a status update: I found a better place for setting the context, namely in the restorer just before the threads are created. This reduces the necessary SELinux policy changes to a single policy that allows writing the log file. That sounds doable.
