Delete daemonset(s) as part of cluster deletion #4214
Comments
I managed to immediately reproduce this SG deletion issue:
Looking at the logs for
Inspecting the existing ENIs there is one relating to the NG left around that caused the SG deletion to fail:
with the following tags:
Definitely created by the vpc-cni (aws-node pod) aws/amazon-vpc-cni-k8s#1447 Deleting |
I cannot reproduce this consistently, which makes me think it must be a race condition with the shutdown of the vpc-cni. As the issue points out, I think draining should allow it to shut down gracefully, which would hopefully solve the issue. |
It looks like draining daemonsets isn't possible in the traditional "drain" sense. Draining marks a Nodegroup as unschedulable for pods, but DaemonSets are configured to ignore those taints, as they are supposed to be the exception to the rule: "Pods that are part of a DaemonSet tolerate being run on an unschedulable Node. DaemonSets typically provide node-local services that should run on the Node even if it is being drained of workload applications". The idea being that things like
You can manually invoke a drain by either:
Why we want to do this in the first place

Before we go down either of those two routes it is worth taking a step back and remembering why we want to drain the DaemonSets. We had multiple issues with the

Why aren't nodegroups gracefully shutting down DaemonSets?

Looking into related issues in https://github.com/kubernetes/kubernetes/, it seems that this is an issue a lot of folks have run into: kubernetes/kubernetes#75482. In November 2020 a PR related to this issue was merged that improved how nodegroups shut down, allowing the running pods to be killed gracefully: kubernetes/kubernetes#96129. This got shipped in

Proposal:

We delete the
Going forward we look into enabling the |
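To illustrate the point above about a drain leaving DaemonSet pods in place, here is a minimal sketch. It assumes pod manifests are available as plain dicts (for example parsed from `kubectl get pods -o json`); `is_daemonset_pod` and `split_for_drain` are hypothetical helper names and are not part of eksctl or kubectl. It only models the "ignore DaemonSet-managed pods" behaviour, not a real eviction:

```python
from typing import Any, Dict, List, Tuple

Pod = Dict[str, Any]


def is_daemonset_pod(pod: Pod) -> bool:
    """Return True if the pod's controlling owner is a DaemonSet."""
    owners = pod.get("metadata", {}).get("ownerReferences", [])
    return any(
        owner.get("kind") == "DaemonSet" and bool(owner.get("controller"))
        for owner in owners
    )


def split_for_drain(pods: List[Pod]) -> Tuple[List[str], List[str]]:
    """Partition pod names into (to_evict, skipped).

    Hypothetical helper: DaemonSet-managed pods are skipped, which is why
    something like the aws-node pod keeps running on a drained node.
    """
    to_evict: List[str] = []
    skipped: List[str] = []
    for pod in pods:
        name = pod["metadata"]["name"]
        if is_daemonset_pod(pod):
            skipped.append(name)
        else:
            to_evict.append(name)
    return to_evict, skipped


# Illustrative data only; the pod names are made up.
pods = [
    {
        "metadata": {
            "name": "aws-node-abcde",
            "ownerReferences": [
                {"kind": "DaemonSet", "name": "aws-node", "controller": True}
            ],
        }
    },
    {
        "metadata": {
            "name": "web-1",
            "ownerReferences": [
                {"kind": "ReplicaSet", "name": "web", "controller": True}
            ],
        }
    },
]

to_evict, skipped = split_for_drain(pods)
print("evict:", to_evict)
print("left running:", skipped)
```

With this filtering in place, the only ways to get the DaemonSet pods off a node are to stop skipping them or to remove the DaemonSet itself.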
This is excellent! This matches perfectly with the order of things mentioned in #536! 🎉 |
Related to #536
Networking-related bits are blocking cluster deletion (see issues labeled area/deletions), so we need to ensure that we're deleting the vpc-cni and other Daemonsets. These may be scheduling pods onto nodes during cluster deletion.

Both Daemonsets and their pods need to be addressed:
- Daemonset pods (e.g. vpc-cni) aren't being evicted from the nodes they're running on with the current setup of the Drain func. It may be as easy as changing the configuration of that func to not ignore Daemonset pods.
- If Drain isn't able to evict daemonset pods from nodes, maybe the daemonsets and their pods need to be deleted before the nodegroups.
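The ordering suggested in the last point (daemonsets first, nodegroups afterwards) could look roughly like the sketch below. This is not eksctl's implementation: `delete_in_order` and all of its callback parameters are hypothetical, and the callbacks stand in for whatever Kubernetes and AWS calls a real deletion would make.

```python
import time
from typing import Callable, Iterable, List


def delete_in_order(
    daemonsets: Iterable[str],
    nodegroups: Iterable[str],
    delete_daemonset: Callable[[str], None],
    daemonset_pods_left: Callable[[], int],
    delete_nodegroup: Callable[[str], None],
    poll_seconds: float = 0.0,
    max_polls: int = 30,
) -> None:
    """Hypothetical ordering: daemonsets, wait for their pods, then nodegroups."""
    # 1. Delete the DaemonSets first so no new pods get scheduled onto nodes.
    for name in daemonsets:
        delete_daemonset(name)

    # 2. Wait for the DaemonSet pods to terminate, giving them a chance to
    #    clean up (e.g. release network interfaces) before nodes disappear.
    for _ in range(max_polls):
        if daemonset_pods_left() == 0:
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError("DaemonSet pods still running; nodegroups not deleted")

    # 3. Only now delete the nodegroups.
    for name in nodegroups:
        delete_nodegroup(name)


# Usage with in-memory fakes; "ng-1" is a made-up nodegroup name.
log: List[str] = []
remaining = [2, 1, 0]  # simulated pod counts returned by successive polls

delete_in_order(
    daemonsets=["aws-node"],
    nodegroups=["ng-1"],
    delete_daemonset=lambda name: log.append(f"delete daemonset {name}"),
    daemonset_pods_left=lambda: remaining.pop(0),
    delete_nodegroup=lambda name: log.append(f"delete nodegroup {name}"),
)
print(log)
```

Raising on timeout rather than continuing is a design choice for this sketch: it avoids deleting nodegroups while DaemonSet pods may still hold resources.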