Drain not being performed for KCP machines with K8s v1.31.x #11138
Labels: area/provider/control-plane-kubeadm, kind/bug, priority/critical-urgent, triage/accepted
What steps did you take and what happened?
This issue was detected while triaging E2E failures on #11127
What did you expect to happen?
When KCP deletes a machine (due to remediation or scale down), this is what happens:
However, special consideration applies to KCP machines with K8s v1.31.x, where the kubelet talks to the local API server pod (for context, see below in the issue).
When the kubelet is talking to the local API server pod, right after step 2 of the sequence above the entire local control plane on the machine starts failing, and thus the kubelet starts to fail as well (it cannot react to new data from the local API server, because the API server is down).
This prevents the drain at step 4 from completing properly, because the kubelet on the Node doesn't see the deletionTimestamps added to the Pods.
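To make the failure mode concrete, here is a minimal sketch of a typical eviction-based drain loop (illustrative only, not the actual Cluster API drain code; the helper name and timeouts are made up). Evictions only set a deletionTimestamp on the Pods; the drain then waits for the Pods to actually disappear, which requires the kubelet to observe the deletion and confirm termination. With the local API server down, that second phase never completes.

```go
// Package drainsketch is an illustrative sketch of an eviction-based drain,
// not the actual Cluster API implementation.
package drainsketch

import (
	"context"
	"fmt"
	"time"

	policyv1 "k8s.io/api/policy/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// drainNode evicts all Pods on the given Node and waits for them to terminate.
// A real drain also skips DaemonSet and mirror Pods; this is omitted for brevity.
func drainNode(ctx context.Context, cs kubernetes.Interface, nodeName string) error {
	pods, err := cs.CoreV1().Pods(metav1.NamespaceAll).List(ctx, metav1.ListOptions{
		FieldSelector: "spec.nodeName=" + nodeName,
	})
	if err != nil {
		return err
	}

	// Phase 1: issue evictions. This only sets a deletionTimestamp on each Pod.
	for i := range pods.Items {
		pod := &pods.Items[i]
		eviction := &policyv1.Eviction{
			ObjectMeta: metav1.ObjectMeta{Name: pod.Name, Namespace: pod.Namespace},
		}
		if err := cs.CoreV1().Pods(pod.Namespace).EvictV1(ctx, eviction); err != nil && !apierrors.IsNotFound(err) {
			return fmt.Errorf("evicting pod %s/%s: %w", pod.Namespace, pod.Name, err)
		}
	}

	// Phase 2: wait for the Pods to actually go away. The kubelet has to observe
	// the deletionTimestamp, stop the containers, and confirm termination; if the
	// kubelet cannot reach its (local) API server, the Pods never disappear and
	// this wait times out.
	return wait.PollUntilContextTimeout(ctx, 5*time.Second, 5*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			remaining, err := cs.CoreV1().Pods(metav1.NamespaceAll).List(ctx, metav1.ListOptions{
				FieldSelector: "spec.nodeName=" + nodeName,
			})
			if err != nil {
				return false, err
			}
			return len(remaining.Items) == 0, nil
		})
}
```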
Why did we not catch this before?
Even though draining was not working properly, machine deletion would ultimately complete, and thus our tests were passing.
This is because after some time K8s would start to consider the node unreachable; when this happens the node.kubernetes.io/unreachable:NoExecute taint is applied, and one of the side effects of this taint is that pods are deleted immediately (see https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/#concepts).
Also, at a certain point the Machine controller would detect that the node is unreachable and go through a simplified deletion workflow.
So, the node-unreachable taint and the simplified deletion workflow were hiding the issue in our CI.
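For reference, a small illustrative helper (not Cluster API code; the function name is made up) showing the taint in question: once the node-lifecycle controller considers the Node unreachable, it adds node.kubernetes.io/unreachable with effect NoExecute, and Pods without a matching toleration are then removed without waiting for the kubelet.

```go
// Package taintsketch contains an illustrative helper, not Cluster API code.
package taintsketch

import (
	corev1 "k8s.io/api/core/v1"
)

// hasUnreachableNoExecuteTaint reports whether the Node carries the
// node.kubernetes.io/unreachable:NoExecute taint that triggers taint-based
// eviction of Pods without a matching toleration, masking a broken drain.
func hasUnreachableNoExecuteTaint(node *corev1.Node) bool {
	for _, taint := range node.Spec.Taints {
		if taint.Key == corev1.TaintNodeUnreachable && taint.Effect == corev1.TaintEffectNoExecute {
			return true
		}
	}
	return false
}
```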
Why did we detect this now?
#11127 introduced a sophisticated drain test that surfaced this issue.
More specifically, while checking that a PDB blocks drain (as expected), we identified that Machine deletion was going through without actually draining the pods.
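For context, a drain-blocking PDB works roughly like the sketch below (illustrative only, not the actual E2E fixture from #11127; names and labels are made up): with minAvailable equal to the number of matching replicas, the eviction API keeps answering 429 TooManyRequests, so a correct drain has to keep retrying instead of letting Machine deletion proceed.

```go
// Package pdbsketch contains an illustrative example, not the actual E2E fixture.
package pdbsketch

import (
	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// blockingPDB returns a PodDisruptionBudget that allows no voluntary disruptions
// for Pods matching the "app" label: evictions get 429 TooManyRequests and drain
// is blocked until the PDB is removed or relaxed.
func blockingPDB(namespace, appLabel string, replicas int32) *policyv1.PodDisruptionBudget {
	minAvailable := intstr.FromInt32(replicas)
	return &policyv1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "block-drain", // hypothetical name
			Namespace: namespace,
		},
		Spec: policyv1.PodDisruptionBudgetSpec{
			MinAvailable: &minAvailable,
			Selector:     &metav1.LabelSelector{MatchLabels: map[string]string{"app": appLabel}},
		},
	}
}
```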
Cluster API version
>= 1.8.0
Kubernetes version
>= 1.31.0
Anything else you would like to add?
When creating CP machines with K8s v1.31.x, KCP forces kubeadm to use the ControlPlaneKubeletLocalMode feature gate (see #10947, kubernetes/kubernetes#125582).
With this feature gate on, the kubelet on CP nodes talks to the local API server pod instead of to the control plane endpoint (which load-balances traffic to all the API server instances).
Talking to the local API server pod is required to prevent the v1.31.x kubelet from talking to v1.30.x API servers during upgrades: this is against the version skew policy and, even though it worked for a long time, it started failing when the v1.31.x kubelet started using field selectors for spec.clusterIP, which are available only in API server v1.31.x (see #10947 for the full explanation).
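To illustrate the version-skew problem, here is a minimal sketch (not kubelet code; the exact selector string and function name are assumptions) of the kind of request involved: a Service list using a spec.clusterIP field selector. Only v1.31+ API servers understand this selector, so a v1.31 kubelet that reaches a v1.30 API server through the control plane endpoint during an upgrade gets an error, while the local, same-version API server handles it fine.

```go
// Package skewsketch illustrates a request using a spec.clusterIP field selector;
// it is not kubelet code.
package skewsketch

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// listServicesWithClusterIP lists Services using a spec.clusterIP field selector.
// The selector string below is illustrative; the point is that pre-v1.31 API
// servers reject field selectors on spec.clusterIP with an error, which breaks a
// v1.31 kubelet talking to an older API server during an upgrade.
func listServicesWithClusterIP(ctx context.Context, cs kubernetes.Interface) error {
	svcs, err := cs.CoreV1().Services(metav1.NamespaceAll).List(ctx, metav1.ListOptions{
		FieldSelector: "spec.clusterIP!=None", // assumed selector, supported only by v1.31+ API servers
	})
	if err != nil {
		return fmt.Errorf("listing Services with spec.clusterIP field selector: %w", err)
	}
	fmt.Printf("found %d Services with a cluster IP\n", len(svcs.Items))
	return nil
}
```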
Label(s) to be applied
/kind bug