Sandbox pause container not deleted #113073
/sig node
Is this dockershim? It's worth noting that dockershim was removed from kubelet in 1.24, and 1.22 is end of life in just 11 days (October 28th), with 1.25 being current and 1.26 on the way: https://kubernetes.io/releases/ (It's possible this bug may need patching in 1.23, which is EOL February 23rd, 2023, but few people work on dockershim at this point.)
Yes, this is dockershim, and I believe this exists in 1.23, though I have not been able to reproduce. I was hoping the logs could point to whether the issue has anything to do with dockershim or not.
IIRC: in CRI, kubelet asks for the pod sandbox to be removed (which includes the pause container); the pause container is no longer a detail of kubelet but a detail of the CRI implementation.
Yeah, so what I was unable to tell from the logs is whether kubelet was requesting the correct pod sandbox to be removed, or requesting the wrong one, leading to the container persisting.
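One way to check this from a log bundle is to grep for sandbox stop/remove events against both container IDs (truncated here for readability). The log format below is a simulated, kubelet-style excerpt so the sketch is self-contained; on a real node you would grep the actual kubelet log (e.g. `journalctl -u kubelet`) instead:

```shell
# Simulated kubelet-style log lines for illustration only; the message
# format is an assumption, not the exact kubelet output.
cat > kubelet.log <<'EOF'
I1013 03:44:12 RunPodSandbox id="c0fe93feba26"
I1013 03:46:05 RunPodSandbox id="49b6b37d511b"
I1013 03:50:00 StopPodSandbox id="c0fe93feba26"
EOF

GOOD=49b6b37d511b    # sandbox ID that actually started
STALE=c0fe93feba26   # older ID kubelet appears to be tracking

# Count stop events per ID; the bug shows up as zero for $GOOD.
grep -c "StopPodSandbox.*$GOOD" kubelet.log || true
grep -c "StopPodSandbox.*$STALE" kubelet.log
```

In the failure scenario described above, the count for the successful ID stays at zero while the stale ID gets the stop.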
/triage accepted
Even though triage is accepted, we may have a hard time finding somebody to look into dockershim issues. Is it possible to switch to containerd as a runtime?
As I understand it, that is the long-term plan for EKS, but only for 1.24+.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
We are facing the same issue with containerd in K8s v1.23.3.
/close
dockershim is out of support in OSS Kubernetes.
@SergeyKanzhelev: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
What happened?
A pod deletion did not result in the correct sandbox pause container being deleted. This was found during internal EKS testing on 1.22, where the test case was as follows:
What is unique about this test is that each pod sandbox fails to get an IP address assigned on average ~100 times before succeeding, so kubelet keeps trying to set up the pod sandbox with a new container ID each time. In the failure scenario, when kubelet finally sees the CNI succeed and the pod starts, kubelet appears to be tracking an old container ID for the pod sandbox.
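A leaked sandbox can be spotted on the node itself, since dockershim names sandbox (pause) containers `k8s_POD_<pod>_<namespace>_<uid>_<attempt>`. The listing below is a simulated `docker ps --format '{{.ID}} {{.Names}}'` output (the UID and attempt numbers are made up) so the commands run self-contained; on a real node you would pipe the actual `docker ps` output instead:

```shell
# Simulated `docker ps` output; IDs are truncated and the pod UID/attempt
# suffixes are invented for illustration.
cat > docker-ps.txt <<'EOF'
49b6b37d511b k8s_POD_deployment-test-6655f9df8c-jc8gv_default_7d3a_104
c0fe93feba26 k8s_POD_deployment-test-6655f9df8c-jc8gv_default_7d3a_37
EOF

# More than one k8s_POD_ container for the same pod suggests a leaked
# sandbox from an earlier failed attempt.
grep -c 'k8s_POD_deployment-test-6655f9df8c-jc8gv' docker-ps.txt
```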
In the attached logs, the pod in question is `deployment-test-6655f9df8c-jc8gv`. The sandbox create succeeds with container ID `49b6b37d511bcf993e0757afe45f55ee9d1ed2246834e055ba7f32fe7b20878e`, but kubelet appears to be tracking an old container ID, `c0fe93feba2615c06cac681f00b81d5e3a45ddab47714fbdb8bfc0d38253b720`. When the pod is deleted, we never see a delete for the successful container ID, only the old one, and so the pod is cleaned up while the container remains running.
eks_i-04f3b29e38d6f7a6e_2022-10-13_0346-UTC_0.7.1.tar.gz
What did you expect to happen?
I expect pod deletion to delete all of the pod's sandbox containers. In this case, I expected a delete for `49b6b37d511bcf993e0757afe45f55ee9d1ed2246834e055ba7f32fe7b20878e` to be issued.
How can we reproduce it (as minimally and precisely as possible)?
This is the challenging part. The steps we used to reproduce this with a 1/3 success rate were:
Anything else we need to know?
So far, we have only been able to reproduce this on 1.22. We were able to reproduce it with kubelet logging verbosity set to 10, but enabling any containerd debug logging made the issue not reproducible. From the resulting logs, we were unable to confirm whether this is a kubelet or a containerd issue.
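For reference, one common way to set kubelet verbosity to 10 on a systemd-managed node is a drop-in unit file; the path and the `KUBELET_EXTRA_ARGS` variable below are the kubeadm-style defaults and may differ on EKS nodes (this is a sketch, not the exact configuration used in our testing):

```ini
# /etc/systemd/system/kubelet.service.d/20-verbosity.conf (assumed path)
[Service]
Environment="KUBELET_EXTRA_ARGS=--v=10"
```

After adding the drop-in, reload systemd and restart kubelet for it to take effect.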
Also, this issue does seem similar to #110181.
Kubernetes version
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)