Support teardown of threaded cgroups #3894
Closed
This is a patch introduced by @amurzeau over in #3821. It is intended less as a fix ready for merge and more as a starting point for discussion with the `runc` developers about how I could go about addressing this issue.

For some context, cgroupv2 introduced a unified hierarchy and, with it, a threading mode for managing control groups on a per-thread basis instead of solely per-process. If you attempt to take advantage of that feature today, though, you inadvertently create a container that can never be torn down by `runc`. When this happens the controlling pod cannot be cleaned up by `kubelet`, and the node's resources remain in use until the host is reset to purge the wedged container and its associated cgroup(s).

This error occurs because when a cgroup is moved out of "domain" mode (read: process mode, effectively behaving as you'd expect coming from cgroupv1) and into "threaded" mode (as indicated by the content of `cgroup.type`), the file `cgroup.procs` becomes unreadable in favor of `cgroup.threads`.
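To make the failure mode concrete, here is a minimal standalone sketch (my own, not part of the patch) that reproduces the kernel behavior described above. The `/sys/fs/cgroup/demo` path is hypothetical, and the program assumes a writable cgroupv2 hierarchy:

```go
// Reproduction sketch (not from the patch): flips a fresh cgroup into
// threaded mode and shows that cgroup.procs then refuses reads while
// cgroup.threads still works. Linux-only; run as root or inside a
// delegated cgroupv2 subtree.
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

func main() {
	cg := "/sys/fs/cgroup/demo" // hypothetical cgroup path
	if err := os.Mkdir(cg, 0o755); err != nil && !os.IsExist(err) {
		panic(err)
	}
	// Switch the cgroup from "domain" to "threaded" mode (a one-way operation).
	if err := os.WriteFile(filepath.Join(cg, "cgroup.type"), []byte("threaded"), 0o644); err != nil {
		panic(err)
	}
	// The kernel now rejects reads of cgroup.procs with EOPNOTSUPP
	// (== ENOTSUP on Linux)...
	_, err := os.ReadFile(filepath.Join(cg, "cgroup.procs"))
	fmt.Println("cgroup.procs read error:", err,
		"is ENOTSUP:", errors.Is(err, syscall.ENOTSUP))
	// ...but cgroup.threads remains readable.
	_, err = os.ReadFile(filepath.Join(cg, "cgroup.threads"))
	fmt.Println("cgroup.threads read error:", err)
}
```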
The patch included here addresses that by checking for the error raised in that case (`ENOTSUP`) and swapping `cgroup.threads` in for `cgroup.procs`. This works, but is likely not what the rest of `runc`'s call stack expects.
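For discussion purposes, here is a minimal sketch of that fallback as a standalone helper rather than a hook into `runc`'s actual cgroup file helpers (the name `readPidsFile` and the demo path are mine):

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// readPidsFile returns the raw contents of dir/cgroup.procs, falling back
// to dir/cgroup.threads when the kernel reports the cgroup is in threaded
// mode. Illustrative shape only; the real patch wires this check into
// runc's existing helpers.
func readPidsFile(dir string) ([]byte, error) {
	data, err := os.ReadFile(filepath.Join(dir, "cgroup.procs"))
	if errors.Is(err, syscall.ENOTSUP) {
		// Threaded cgroups list their member tasks (as TIDs) in
		// cgroup.threads instead of cgroup.procs.
		return os.ReadFile(filepath.Join(dir, "cgroup.threads"))
	}
	return data, err
}

func main() {
	// Hypothetical path; point this at any cgroupv2 directory to try it.
	data, err := readPidsFile("/sys/fs/cgroup/demo")
	fmt.Println(string(data), err)
}
```

One wrinkle worth discussing: `cgroup.procs` yields PIDs while `cgroup.threads` yields TIDs, so callers further up the stack that treat the returned values as PIDs presumably need to be aware of the swap.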
I'd welcome guidance on how to address this change in behavior, and on any other changes the maintainers would like to see before accepting a fix for this issue upstream. Thanks in advance!