
Cgroup (namespace) Delegation #948

Open
sargun opened this issue Jan 27, 2018 · 6 comments

Comments

sargun commented Jan 27, 2018

Now that cgroups can be delegated, it would be nice to have the ability to specify that in the config: http://lkml.iu.edu/hypermail/linux/kernel/1801.1/01070.html

It'd be nice to state which cgroups should be delegated and which shouldn't. Right now, all of /sys/fs/cgroup gets bind-mounted.
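
For illustration, a minimal sketch (using the runtime-spec's Go bindings) of what requesting a per-container cgroup setup might look like: a cgroup namespace plus a cgroup2 mount in place of the host bind mount. The "cgroup2" mount type and its options here are assumptions for the sketch, not fields the spec defines for delegation.

```go
package main

import (
	"encoding/json"
	"fmt"

	specs "github.com/opencontainers/runtime-spec/specs-go"
)

func main() {
	// Sketch only: request a cgroup namespace and a real cgroup2 mount
	// instead of bind-mounting the host's /sys/fs/cgroup. The "cgroup2"
	// mount type is an assumption for illustration; delegation itself
	// still depends on how the host mounted the root hierarchy.
	spec := specs.Spec{
		Version: specs.Version,
		Mounts: []specs.Mount{
			{
				Destination: "/sys/fs/cgroup",
				Type:        "cgroup2",
				Source:      "cgroup2",
				Options:     []string{"nosuid", "noexec", "nodev"},
			},
		},
		Linux: &specs.Linux{
			Namespaces: []specs.LinuxNamespace{
				{Type: specs.CgroupNamespace},
			},
		},
	}

	out, _ := json.MarshalIndent(spec, "", "  ")
	fmt.Println(string(out))
}
```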

cyphar (Member) commented Jan 28, 2018

nsdelegate is a bit of a weird design that doesn't really fit the "container config" purpose of the runtime-spec. It only applies to cgroupv2, and it applies to all cgroups in the v2 hierarchy.

In addition, the way you set up delegation is by setting the nsdelegate option on the root cgroup2 mountpoint -- which is usually done by the init system (systemd in most cases). So from the perspective of a container runtime, nsdelegate is part of the host's pre-configured setup.
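
As a rough Go sketch of what that means for a runtime (assuming the standard /proc/self/mountinfo format), about all a runtime can do is detect whether the host mounted cgroup2 with nsdelegate; it can't set the option per container:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// cgroup2HasNsdelegate checks whether the host's cgroup2 hierarchy was
// mounted with the nsdelegate super-block option by parsing
// /proc/self/mountinfo. This is a read-only probe of host configuration.
func cgroup2HasNsdelegate() (bool, error) {
	f, err := os.Open("/proc/self/mountinfo")
	if err != nil {
		return false, err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		// mountinfo lines end with: " - <fstype> <source> <super options>"
		parts := strings.SplitN(scanner.Text(), " - ", 2)
		if len(parts) != 2 {
			continue
		}
		fields := strings.Fields(parts[1])
		if len(fields) >= 3 && fields[0] == "cgroup2" {
			for _, opt := range strings.Split(fields[2], ",") {
				if opt == "nsdelegate" {
					return true, nil
				}
			}
		}
	}
	return false, scanner.Err()
}

func main() {
	ok, err := cgroup2HasNsdelegate()
	fmt.Println(ok, err)
}
```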

However, I do agree that we should have cgroup namespace support; once we have that, we can do a "real" cgroup2 mount. But as has been discussed in other cgroupv2 threads, the key controllers we need for containers (devices and freezer) are not available in cgroupv2, so we cannot make the wholesale switch -- which would lead to a lot of issues with managing both trees.

Christian Brauner of the LXC project gave a very good talk on what the precise complications are with a hybrid cgroup setup for container runtimes and why this is quite complicated to do.

sargun (Author) commented Jan 28, 2018

@cyphar devices was merged into master, based on BPF filters. So that's done. I've talked to Tejun, and it sounds like freezer is underway, although it won't take the same approach. It seems like the only critical thing we really use freezer for is to terminate all processes within the container, no? That could just as easily be done by setting pids.max to 0 and walking the hierarchy, or by terminating pid 1 of the pid namespace.
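
For illustration, a rough Go sketch of that pids.max-plus-walk idea, assuming a cgroupv2-style layout and a hypothetical container cgroup path (error handling and races with exiting processes are glossed over):

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strconv"
	"strings"
	"syscall"
)

// killCgroup clamps pids.max to 0 so nothing in the subtree can fork,
// then walks the hierarchy and SIGKILLs every pid listed in cgroup.procs.
func killCgroup(cgPath string) error {
	// Forbid new processes in this subtree first.
	if err := os.WriteFile(filepath.Join(cgPath, "pids.max"), []byte("0"), 0o644); err != nil {
		return err
	}
	return filepath.WalkDir(cgPath, func(p string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() || d.Name() != "cgroup.procs" {
			return err
		}
		data, err := os.ReadFile(p)
		if err != nil {
			return err
		}
		for _, field := range strings.Fields(string(data)) {
			pid, err := strconv.Atoi(field)
			if err != nil {
				continue
			}
			_ = syscall.Kill(pid, syscall.SIGKILL) // best effort
		}
		return nil
	})
}

func main() {
	// Hypothetical container cgroup path, for illustration only.
	fmt.Println(killCgroup("/sys/fs/cgroup/mycontainer"))
}
```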

I agree that mixing them is an absolute mess, and I wouldn't push that strategy at all.

wking (Contributor) commented Jan 28, 2018 via email

sargun (Author) commented Jan 29, 2018

@wking That should come in opencontainers/runc#1184

cyphar (Member) commented Feb 2, 2018

> It seems like the only critical thing we really use freezer for is to terminate all processes within the container, no? That could just as easily be done by setting pids.max to 0 and walking the hierarchy, or by terminating pid 1 of the pid namespace.

We also expose freezer with runc pause. This is only critical if the plan is for users to actually run cgroupv2-only right now (underneath Docker, for instance -- which would break). We could also do the pids.max thing (though I'm unclear on whether you'd want that as a temporary workaround or as the permanent approach). The reason we don't do that for cgroupv1 is that pids was added more recently than freezer -- when I got the pids controller merged, we already had the freezer implementation in runc.

I'm still not sure how cgroupv2 will work in practice with subtree_control (should runc write to cgroup.subtree_control all the way up the cgroup tree, or only in the cgroups we've created? And does that mean we should fail if a controller wasn't enabled by the admin?). There are several administrative questions like this that feel more complicated with cgroupv2 and should be considered when implementing it.
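
To make the first question concrete, here is a rough Go sketch of the "write subtree_control all the way up" option. The paths and controller name are hypothetical, and whether a runtime should do this at all (rather than fail when the admin hasn't enabled the controller) is exactly the open question:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// enableControllerUpTree enables a controller in cgroup.subtree_control at
// every level from the cgroup2 mountpoint down to the parent of the target
// cgroup, so the controller becomes available inside the target.
func enableControllerUpTree(root, target, controller string) error {
	rel, err := filepath.Rel(root, filepath.Dir(target))
	if err != nil {
		return err
	}
	// Levels to touch: the mountpoint itself, then each ancestor of target.
	levels := []string{root}
	if rel != "." {
		cur := root
		for _, part := range strings.Split(rel, string(os.PathSeparator)) {
			cur = filepath.Join(cur, part)
			levels = append(levels, cur)
		}
	}
	for _, dir := range levels {
		ctl := filepath.Join(dir, "cgroup.subtree_control")
		if err := os.WriteFile(ctl, []byte("+"+controller), 0o644); err != nil {
			return fmt.Errorf("enabling %s in %s: %w", controller, dir, err)
		}
	}
	return nil
}

func main() {
	// Hypothetical paths, for illustration only.
	err := enableControllerUpTree("/sys/fs/cgroup", "/sys/fs/cgroup/machine/ctr1", "pids")
	fmt.Println(err)
}
```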

vbatts mentioned this issue Mar 5, 2019
tianon (Member) commented Mar 2, 2022

This was fixed via #1123, right? 🙏
