
aws-cni pod can get stuck in Pending state #2389

Closed

mattburgess opened this issue May 22, 2023 · 3 comments

Comments

@mattburgess

What happened:

Environment:

  • Kubernetes version (use kubectl version): 1.22.17
  • CNI Version: 1.12.6
  • OS (e.g: cat /etc/os-release): Ubuntu-18.04 and Ubuntu-22.04
  • Kernel (e.g. uname -a): 5.19.0-1022-aws

We've encountered an issue whereby pods and nodes can become unhealthy during an aws-node daemonset rollout. We use the MostAllocated scheduler strategy to pack pods as tightly as possible, which means that total CPU requests on some nodes sit at around 98%-99% of allocatable. During an aws-node daemonset rollout, the old aws-node pod is deleted, but the scheduler then can't bring the new one up because there is no CPU left on the node, and the pods that would be evicted to make room can't be torn down because their network cleanup calls into the aws-cni pod, which is no longer running, e.g.:

May 22 14:07:40 ip-172-x-x-x kubelet[4173]: I0522 14:07:40.557892    4173 kubelet.go:2120] "SyncLoop DELETE" source="api" pods=[kube-system/aws-cni-cmvm4]
May 22 14:07:40 ip-172-x-x-x kubelet[4173]: I0522 14:07:40.563143    4173 kubelet.go:2114] "SyncLoop REMOVE" source="api" pods=[kube-system/aws-cni-cmvm4]
...
May 22 14:07:40 ip-172-x-x-x kubelet[4173]: I0522 14:07:40.602558    4173 kubelet.go:2120] "SyncLoop DELETE" source="api" pods=[kube-system/atest]
May 22 14:07:41 ip-172-x-x-x kubelet[4173]: E0522 14:07:41.095792    4173 cni.go:380] "Error deleting pod from network" err="del cmd: error received from DelNetwork gRPC call: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:50051: connect: connection refused\"" pod="kube-sys>
May 22 14:07:41 ip-172-x-x-x kubelet[4173]: E0522 14:07:41.137866    4173 remote_runtime.go:116] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = [failed to set up sandbox container \"c199e0a128af420f2a4acd72ea5c58567f6e642cbf44a9477192f97fb753cc7c\" network for pod \"atest\": networkPlugin cni failed to se>
May 22 14:07:41 ip-172-x-x-x kubelet[4173]: E0522 14:07:41.137910    4173 kuberuntime_sandbox.go:70] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = [failed to set up sandbox container \"c199e0a128af420f2a4acd72ea5c58567f6e642cbf44a9477192f97fb753cc7c\" network for pod \"atest\": networkPlugin cni failed to set up >
May 22 14:07:41 ip-172-x-x-x kubelet[4173]: I0522 14:07:41.262756    4173 docker_sandbox.go:401] "Failed to read pod IP from plugin/docker" err="networkPlugin cni failed on the status hook for pod \"atest_kube-system\": CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container \"c199e0a128af4>
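
For context, the bin-packing behaviour above comes from running the kube-scheduler NodeResourcesFit plugin with the MostAllocated scoring strategy. A rough sketch of the relevant KubeSchedulerConfiguration (weights are illustrative, and the apiVersion varies by Kubernetes release):

  apiVersion: kubescheduler.config.k8s.io/v1beta2
  kind: KubeSchedulerConfiguration
  profiles:
    - schedulerName: default-scheduler
      pluginConfig:
        - name: NodeResourcesFit
          args:
            scoringStrategy:
              type: MostAllocated
              resources:
                - name: cpu
                  weight: 1
                - name: memory
                  weight: 1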

We think we have pod priorities set correctly, as per this portion of our daemonset spec:

  preemptionPolicy: PreemptLowerPriority
  priority: 2000001000
  priorityClassName: system-node-critical
  schedulerName: default-scheduler

It feels like we must have misconfigured something else, though, as surely it should be possible to avoid this scenario?
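
For completeness, this is roughly how we double-checked the resolved priority (assuming the k8s-app=aws-node label that the VPC CNI daemonset ships with):

  # Show the priority class and numeric priority resolved onto the running aws-node pods
  kubectl -n kube-system get pods -l k8s-app=aws-node \
    -o custom-columns=NAME:.metadata.name,CLASS:.spec.priorityClassName,PRIORITY:.spec.priority

  # Inspect the built-in priority class itself
  kubectl get priorityclass system-node-critical -o yaml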

@jdn5126 (Contributor) commented May 22, 2023

@mattburgess this sounds like #2331, which was fixed by #2350 and will ship in the next VPC CNI release, which is planned for the end of this month, give or take a week.

The TL;DR is that since aws-node is system-node-critical, other pods will be evicted to make room for it, but pods cannot be evicted unless IPAMD is running, and IPAMD runs in the aws-node pod. So there is a chicken-and-egg problem that we had to resolve. The workaround is to not specify any resource requests for the aws-node pod; it will then be scheduled regardless of how much CPU or memory is available on the node.
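
As a sketch of that workaround (assuming aws-node is the first container in the daemonset's pod template and that a requests block is currently set), something along these lines removes the requests in place; if the daemonset is managed by Helm or the EKS addon, make the equivalent change there instead so it isn't overwritten:

  kubectl -n kube-system patch daemonset aws-node --type=json \
    -p='[{"op": "remove", "path": "/spec/template/spec/containers/0/resources/requests"}]'

This is only a stopgap until the fix from #2350 ships in the next release.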

@mattburgess (Author)

@jdn5126 thanks for the ridiculously quick response, and apologies for the delay in getting back to you. That makes perfect sense to me; I'm really pleased there's a fix already in the works. Happy to close this as a dupe of #2331.

@github-actions

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue, feel free to do so.
