Environment:
Kubernetes version (use kubectl version): 1.22.17
OS (e.g. cat /etc/os-release): Ubuntu 18.04 and Ubuntu 22.04
Kernel (e.g. uname -a): 5.19.0-1022-aws
What happened:
We've encountered an issue whereby pods and nodes can become unhealthy during an aws-node daemonset rollout. We use the MostAllocated scheduler strategy to pack pods as tightly as possible, which means that some nodes can see CPU requests at around 98%-99% of allocatable. During an aws-node daemonset rollout, what appears to happen is that the old pod is deleted, but the scheduler then can't bring a new pod up because there is no CPU left to request, and the pods that are being evicted can't actually be removed because they can't call into the aws-cni pod, e.g.:
May 22 14:07:40 ip-172-x-x-x kubelet[4173]: I0522 14:07:40.557892 4173 kubelet.go:2120] "SyncLoop DELETE" source="api" pods=[kube-system/aws-cni-cmvm4]
May 22 14:07:40 ip-172-x-x-x kubelet[4173]: I0522 14:07:40.563143 4173 kubelet.go:2114] "SyncLoop REMOVE" source="api" pods=[kube-system/aws-cni-cmvm4]
...
May 22 14:07:40 ip-172-x-x-x kubelet[4173]: I0522 14:07:40.602558 4173 kubelet.go:2120] "SyncLoop DELETE" source="api" pods=[kube-system/atest]
May 22 14:07:41 ip-172-x-x-x kubelet[4173]: E0522 14:07:41.095792 4173 cni.go:380] "Error deleting pod from network" err="del cmd: error received from DelNetwork gRPC call: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:50051: connect: connection refused\"" pod="kube-sys>
May 22 14:07:41 ip-172-x-x-x kubelet[4173]: E0522 14:07:41.137866 4173 remote_runtime.go:116] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = [failed to set up sandbox container \"c199e0a128af420f2a4acd72ea5c58567f6e642cbf44a9477192f97fb753cc7c\" network for pod \"atest\": networkPlugin cni failed to se>
May 22 14:07:41 ip-172-x-x-x kubelet[4173]: E0522 14:07:41.137910 4173 kuberuntime_sandbox.go:70] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = [failed to set up sandbox container \"c199e0a128af420f2a4acd72ea5c58567f6e642cbf44a9477192f97fb753cc7c\" network for pod \"atest\": networkPlugin cni failed to set up >
May 22 14:07:41 ip-172-x-x-x kubelet[4173]: I0522 14:07:41.262756 4173 docker_sandbox.go:401] "Failed to read pod IP from plugin/docker" err="networkPlugin cni failed on the status hook for pod \"atest_kube-system\": CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container \"c199e0a128af4>
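For context, the MostAllocated bin-packing behaviour described above is enabled through the kube-scheduler's NodeResourcesFit scoring strategy. The reporter's actual scheduler configuration isn't shown in the issue, so the following is only a minimal sketch of what that setup typically looks like on a 1.22 cluster:

# Sketch of a KubeSchedulerConfiguration that scores nodes by how much of
# their resources are already requested, packing pods as tightly as possible.
apiVersion: kubescheduler.config.k8s.io/v1beta2   # API version served by Kubernetes 1.22
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: MostAllocated        # prefer the fullest node that still fits the pod
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1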
We think we have pod priorities set correctly, as per the relevant portion of our daemonset spec. It feels like we must have misconfigured something else, though, as it surely should be possible to avoid this scenario?
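For reference, the priority configuration being described corresponds to the stock aws-node DaemonSet, which runs at system-node-critical priority and by default sets a small CPU request on the CNI container. The excerpt below is illustrative only, not the reporter's exact spec, and the default request value varies between VPC CNI releases:

# Illustrative excerpt of the aws-node DaemonSet pod template (not the
# reporter's manifest).
spec:
  template:
    spec:
      priorityClassName: system-node-critical
      containers:
        - name: aws-node
          resources:
            requests:
              cpu: 25m   # example value; check the manifest or Helm chart you actually deploy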
@mattburgess this sounds like #2331, which was fixed by #2350 and will ship in the next VPC CNI release, which is planned for the end of this month, give or take a week.
The TL;DR is that since aws-node is system-node-critical, other pods will be evicted to make room for it, but pods cannot be evicted unless IPAMD is running, and it runs in the aws-node pod. So there is a chicken-and-egg problem that we had to resolve. The workaround is to not specify any requests for the aws-node pod, as then it will get scheduled regardless of how much CPU or MEM is available on the node.
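In terms of the pod template sketched above, the workaround amounts to leaving the resources block off the aws-node container so that scheduling no longer depends on free allocatable CPU or memory. A minimal sketch, assuming you manage the DaemonSet manifest yourself rather than through a managed add-on:

# Workaround sketch: with no resources.requests, the replacement aws-node pod
# can be admitted even on a node whose allocatable CPU is ~99% requested.
spec:
  template:
    spec:
      priorityClassName: system-node-critical
      containers:
        - name: aws-node
          # resources.requests and resources.limits intentionally omitted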
@jdn5126 thanks for the ridiculously quick response, and apologies for the delay in getting back to you. That makes perfect sense to me; really pleased there's a fix already in the works. Happy to close this as a dupe of #2331.
Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue, feel free to do so.