-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix runc kill
and runc delete
for containers with no init and no private PID namespace
#4102
Commits on Nov 27, 2023
-
libct: replace runType with hasInit
The semantics of runType is slightly complicated, and the only place where we need to distinguish between Created and Running is refreshState. Replace runType with simpler hasInit, simplifying its users (except the refreshState, which now figures out on its own whether the container is Created or Running). Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Configuration menu - View commit details
-
Copy full SHA for d9f2a24 - Browse repository at this point
Copy the full SHA d9f2a24View commit details -
libct: Signal: slight refactor
Let's use c.hasInit and c.isPaused where needed instead of c.curentStatus for simplicity. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Configuration menu - View commit details
-
Copy full SHA for 542cce0 - Browse repository at this point
Copy the full SHA 542cce0View commit details -
runc kill: fix sending KILL to non-pidns container
Commit f8ad20f made it impossible to kill leftover processes in a stopped container that does not have its own PID namespace. In other words, if a container init is gone, it is no longer possible to use `runc kill` to kill the leftover processes. Fix this by moving the check if container init exists to after the special case of handling the container without own PID namespace. While at it, fix the minor issue introduced by commit 9583b3d: if signalAllProcesses is used, there is no need to thaw the container (as freeze/thaw is either done in signalAllProcesses already, or not needed at all). Also, make signalAllProcesses return an error early if the container cgroup does not exist (as it relies on it to do its job). This way, the error message returned is more generic and easier to understand ("container not running" instead of "can't open file"). Finally, add a test case. Fixes: f8ad20f Fixes: 9583b3d Co-authored-by: lifubang <lifubang@acmcoder.com> Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Configuration menu - View commit details
-
Copy full SHA for dcf1b73 - Browse repository at this point
Copy the full SHA dcf1b73View commit details -
runc delete -f: fix for no pidns + no init case
Commit f8ad20f moved the kill logic from container destroy to container kill (which is the right thing to do). Alas, it broke the use case of doing "runc delete -f" for a container which does not have its own private PID namespace, when its init process is gone. In this case, some processes may still be running, and runc delete -f should kill them (the same way as "runc kill" does). It does not do that because the container status is "stopped" (as runc considers the container with no init process as stopped), and so we only call "destroy" (which was doing the killing before). The fix is easy: if --force is set, call killContainer no matter what. Add a test case, similar to the one in the previous commit. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Configuration menu - View commit details
-
Copy full SHA for 29283bb - Browse repository at this point
Copy the full SHA 29283bbView commit details -
libct/cg: improve cgroup removal logic
The current code is only doing retries in RemovePaths, which is only used for cgroup v1 (cgroup v2 uses RemovePath, which makes no retries). Let's remove all retry logic and logging from RemovePaths, together with: - os.Stat check from RemovePaths (its usage probably made sense before commit 19be8e5 but not after); - error/warning logging from RemovePaths (this was added by commit 19be8e5 in 2020 and so far we've seen no errors other than EBUSY, so reporting the actual error proved to be useless). Add the retry logic to rmdir, and the second retry bool argument. Decrease the initial delay and increase the number of retries from the old implementation so it can take up to ~1 sec before returning EBUSY (was about 0.3 sec). Hopefully, as a result, we'll have less "failed to remove cgroup paths" errors. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Configuration menu - View commit details
-
Copy full SHA for d3d7f7d - Browse repository at this point
Copy the full SHA d3d7f7dView commit details -
runc delete: do not ignore error from destroy
If container.Destroy() has failed, runc destroy still return 0, which is wrong and can result in other issues down the line. Let's always return error from destroy in runc delete. For runc checkpoint and runc run, we still treat it as a warning. Co-authored-by: Zhang Tianyang <burning9699@gmail.com> Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Configuration menu - View commit details
-
Copy full SHA for 7396ca9 - Browse repository at this point
Copy the full SHA 7396ca9View commit details -
runc delete, container.Destroy: kill all processes
(For a container with no private PID namespace, that is). When runc delete (or container.Destroy) is called on a stopped container without private PID namespace and there are processes in its cgroup, kill those. Add a test case. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Configuration menu - View commit details
-
Copy full SHA for ab3cd8d - Browse repository at this point
Copy the full SHA ab3cd8dView commit details -
libct: Destroy: don't proceed in case of errors
For some reason, container destroy operation removes container's state directory even if cgroup removal fails (and then still returns an error). It has been that way since commit 5c246d0, which added cgroup removal. This is problematic because once the container state dir is removed, we no longer know container's cgroup and thus can't remove it. Let's return the error early and fail if cgroup can't be removed. Same for other operations: do not proceed if we fail. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Configuration menu - View commit details
-
Copy full SHA for a6f4081 - Browse repository at this point
Copy the full SHA a6f4081View commit details