
Delete daemonset(s) as part of cluster deletion #4214

Closed
nikimanoledaki opened this issue Sep 15, 2021 · 4 comments · Fixed by #4244

Comments

@nikimanoledaki
Contributor

nikimanoledaki commented Sep 15, 2021

Related to #536

Networking-related resources are blocking cluster deletion (see issues labeled area/deletions), so we need to ensure that we delete the vpc-cni and other DaemonSets, which may still be scheduling pods onto nodes during cluster deletion.

Both DaemonSets and their pods need to be addressed:

  1. DaemonSet pods (such as those of vpc-cni) aren't being evicted from the nodes they're running on with the current setup of the Drain func. It may be as easy as changing that func's configuration so that it does not ignore DaemonSet pods (see the sketch after this list).
  2. It would then be good to verify if/how/when DaemonSets are being deleted as part of cluster deletion. If Drain isn't able to evict DaemonSet pods from nodes, the DaemonSets and their pods may need to be deleted before the nodegroups.
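For context, a minimal sketch of the kind of drain configuration involved, using the upstream k8s.io/kubectl/pkg/drain helper as a stand-in for eksctl's Drain func (the field names below are the upstream ones, not necessarily eksctl's). Note that the upstream helper does not actually evict DaemonSet-managed pods when IgnoreAllDaemonSets is false; it reports them as an error instead, which is consistent with what the later comments in this thread found.

```go
package main

import (
	"context"
	"log"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/kubectl/pkg/drain"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		log.Fatal(err)
	}
	clientset := kubernetes.NewForConfigOrDie(cfg)

	helper := &drain.Helper{
		Ctx:    context.Background(),
		Client: clientset,
		// This is the knob in question. With false, the upstream helper does
		// not evict DaemonSet-managed pods; it returns an error for them.
		IgnoreAllDaemonSets: false,
		DeleteEmptyDirData:  true,
		GracePeriodSeconds:  -1, // use each pod's terminationGracePeriodSeconds
		Timeout:             5 * time.Minute,
		Out:                 os.Stdout,
		ErrOut:              os.Stderr,
	}

	nodes, err := clientset.CoreV1().Nodes().List(context.Background(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for i := range nodes.Items {
		node := &nodes.Items[i]
		// Cordon the node, then evict the pods the filters allow.
		if err := drain.RunCordonOrUncordon(helper, node, true); err != nil {
			log.Fatal(err)
		}
		if err := drain.RunNodeDrain(helper, node.Name); err != nil {
			log.Printf("draining %s: %v", node.Name, err)
		}
	}
}
```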
@aclevername
Contributor

aclevername commented Sep 21, 2021

I managed to immediately reproduce this SG deletion issue:

  1. eksctl create cluster --name jk --managed=false
  2. tail the logs for aws-node (both instances, as my nodegroup is scaled to 2)
  3. eksctl delete nodegroup fails:
2021-09-21 11:55:16 [ℹ]  waiting for CloudFormation stack "eksctl-jk-nodegroup-ng-46b25ad7"
2021-09-21 11:55:32 [ℹ]  waiting for CloudFormation stack "eksctl-jk-nodegroup-ng-46b25ad7"
2021-09-21 11:55:33 [✖]  unexpected status "DELETE_FAILED" while waiting for CloudFormation stack "eksctl-jk-nodegroup-ng-46b25ad7"
2021-09-21 11:55:33 [ℹ]  fetching stack events in attempt to troubleshoot the root cause of the failure
2021-09-21 11:55:33 [✖]  AWS::CloudFormation::Stack/eksctl-jk-nodegroup-ng-46b25ad7: DELETE_FAILED – "The following resource(s) failed to delete: [SG]. "
2021-09-21 11:55:33 [✖]  AWS::EC2::SecurityGroup/SG: DELETE_FAILED – "resource sg-0da449a5f77d414d8 has a dependent object (Service: AmazonEC2; Status Code: 400; Error Code: DependencyViolation; Request ID: 82b044d8-8a28-4c67-935d-38304504be7c; Proxy: null)"
2021-09-21 11:55:33 [ℹ]  1 error(s) occurred while deleting nodegroup(s)
2021-09-21 11:55:33 [✖]  waiting for CloudFormation stack "eksctl-jk-nodegroup-ng-46b25ad7": ResourceNotReady: failed waiting for successful resource state
Error: failed to delete nodegroup(s)

Looking at the logs for aws-node (identical on both instances):

k -n kube-system logs -f aws-node-vltnn
{"level":"info","ts":"2021-09-21T09:52:51.729Z","caller":"entrypoint.sh","msg":"Install CNI binary.."}
{"level":"info","ts":"2021-09-21T09:52:51.817Z","caller":"entrypoint.sh","msg":"Starting IPAM daemon in the background ... "}
{"level":"info","ts":"2021-09-21T09:52:51.818Z","caller":"entrypoint.sh","msg":"Checking for IPAM connectivity ... "}
{"level":"info","ts":"2021-09-21T09:52:53.922Z","caller":"entrypoint.sh","msg":"Copying config file ... "}
{"level":"info","ts":"2021-09-21T09:52:53.926Z","caller":"entrypoint.sh","msg":"Successfully copied CNI plugin binary and config file."}
{"level":"info","ts":"2021-09-21T09:52:53.926Z","caller":"entrypoint.sh","msg":"Foregrounding IPAM daemon ..."}

Inspecting the existing ENIs, there is one relating to the nodegroup left around that caused the SG deletion to fail:



ENI ID: eni-093b78ce6260ec285
Subnet: subnet-0d29e1492d265312d
VPC: vpc-0dd64a0fab2f91b71
Availability zone: eu-central-1a
Security groups: eksctl-jk-cluster-ClusterSharedNodeSecurityGroup-1V4BOXDJ7TIZG, eksctl-jk-nodegroup-ng-46b25ad7-SG-1U8QLB4KXF2M7
Description: aws-K8S-i-056b783a2822c25e4

with the following tags:

node.k8s.amazonaws.com/createdAt	2021-09-21T09:51:34Z
node.k8s.amazonaws.com/instance_id	i-056b783a2822c25e4

This was definitely created by the vpc-cni (aws-node pod); see aws/amazon-vpc-cni-k8s#1447.

Deleting eni-093b78ce6260ec285 manually and then re-attempting the nodegroup stack deletion made it succeed.
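For anyone hitting the same DELETE_FAILED, here is a rough sketch of how the dependent ENIs could be found and removed programmatically, assuming aws-sdk-go and the security group ID from the stack events above; the manual console deletion described above achieves the same thing.

```go
package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	sess := session.Must(session.NewSession(aws.NewConfig().WithRegion("eu-central-1")))
	svc := ec2.New(sess)

	// Find ENIs still referencing the nodegroup security group from the failure above.
	out, err := svc.DescribeNetworkInterfaces(&ec2.DescribeNetworkInterfacesInput{
		Filters: []*ec2.Filter{
			{Name: aws.String("group-id"), Values: []*string{aws.String("sg-0da449a5f77d414d8")}},
		},
	})
	if err != nil {
		log.Fatal(err)
	}

	for _, eni := range out.NetworkInterfaces {
		fmt.Printf("leaked ENI %s (status %s)\n",
			aws.StringValue(eni.NetworkInterfaceId), aws.StringValue(eni.Status))
		// Only detached ("available") ENIs can be deleted directly.
		if aws.StringValue(eni.Status) == ec2.NetworkInterfaceStatusAvailable {
			if _, err := svc.DeleteNetworkInterface(&ec2.DeleteNetworkInterfaceInput{
				NetworkInterfaceId: eni.NetworkInterfaceId,
			}); err != nil {
				log.Fatal(err)
			}
		}
	}
}
```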

@aclevername
Contributor

I cannot consistently reproduce this, which makes me think it must be a race condition in the shutdown of the vpc-cni. As the issue points out, I think draining should allow it to shut down gracefully, hopefully solving the issue.

@aclevername
Contributor

It looks like draining DaemonSets isn't possible in the traditional "drain" sense. Draining marks a node as unschedulable for pods, but DaemonSets are configured to tolerate that taint, as they are supposed to be the exception to the rule: "Pods that are part of a DaemonSet tolerate being run on an unschedulable Node. DaemonSets typically provide node-local services that should run on the Node even if it is being drained of workload applications". The idea is that things like kube-proxy, which run as a DaemonSet, need to keep running on the node while it's draining so that the workloads being evicted can shut down gracefully.

You can manually achieve the equivalent of a drain for DaemonSet pods by either:

  • Adding a taint that the DaemonSet pods do not tolerate, effectively telling them "don't run on this node" (see the sketch after this list).
  • Deleting the DaemonSet manually.
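A minimal client-go sketch of both options, assuming the vpc-cni defaults mentioned in this thread (DaemonSet aws-node in kube-system); the taint key is made up for illustration and is not an existing eksctl convention.

```go
package main

import (
	"context"
	"log"
	"os"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		log.Fatal(err)
	}
	clientset := kubernetes.NewForConfigOrDie(cfg)
	ctx := context.Background()
	nodeName := os.Getenv("NODE_NAME") // the node to evict DaemonSet pods from

	// Option 1: taint the node with a NoExecute effect the DaemonSet pods do
	// not tolerate, so the taint manager evicts them. Hypothetical taint key.
	node, err := clientset.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		log.Fatal(err)
	}
	node.Spec.Taints = append(node.Spec.Taints, corev1.Taint{
		Key:    "eksctl.io/evict-daemonset-pods",
		Value:  "true",
		Effect: corev1.TaintEffectNoExecute,
	})
	if _, err := clientset.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{}); err != nil {
		log.Fatal(err)
	}

	// Option 2: delete the DaemonSet itself; foreground propagation waits for
	// its pods to be cleaned up as well.
	fg := metav1.DeletePropagationForeground
	if err := clientset.AppsV1().DaemonSets("kube-system").Delete(ctx, "aws-node", metav1.DeleteOptions{
		PropagationPolicy: &fg,
	}); err != nil {
		log.Fatal(err)
	}
}
```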

Why we want to do this in the first place

Before we go down either of those two routes it's worth taking a step back and remembering why we want to drain the DaemonSets. We had multiple issues with the vpc-cni (the pods are called aws-node) not deleting all of the ENIs it creates (#1849), the theory being that it doesn't shut down gracefully (reinforced by this issue). This might also be an issue with other DaemonSets the user deploys that we aren't aware of. For example, lots of folks have reported a similar problem when draining nodes that run the Consul DaemonSet. So this issue might not be limited to the vpc-cni.

Why aren't nodegroups gracefully shutting down DaemonSets?

Looking into related issues in https://github.com/kubernetes/kubernetes/ it seems that this is an issue a lot of folks have run into (kubernetes/kubernetes#75482). In November 2020 a PR related to this issue was merged that improved how nodes shut down, allowing running pods to be killed gracefully (kubernetes/kubernetes#96129). This shipped in 1.21, is currently in Beta, and can only be enabled via the GracefulNodeShutdown feature flag. Turning on this feature might resolve the issue we are trying to solve, but it doesn't fix it for older clusters; we currently support clusters as old as 1.17.

Proposal:

We delete the vpc-cni DaemonSet (this might be deployed by the EKS vpc-cni addon, so we might have to delete that in order to delete the DaemonSet) after draining the nodegroup, but before deleting it, in the hope that this lets it shut down gracefully. Given that all of the nodegroups are already drained, there should be no harm.
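A rough sketch of that ordering, assuming aws-sdk-go's EKS client and client-go; the helper name is hypothetical and this is not necessarily what the eventual fix (#4244) implements.

```go
package deletion

import (
	"context"

	"github.com/aws/aws-sdk-go/aws"
	awseks "github.com/aws/aws-sdk-go/service/eks"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deleteVPCCNI is a hypothetical helper invoked after the nodegroups have been
// drained but before their CloudFormation stacks are deleted.
func deleteVPCCNI(ctx context.Context, eksAPI *awseks.EKS, clientset kubernetes.Interface, clusterName string) error {
	// If the vpc-cni is managed as an EKS addon, deleting the addon also
	// removes the DaemonSet it owns.
	if _, err := eksAPI.DeleteAddonWithContext(ctx, &awseks.DeleteAddonInput{
		ClusterName: aws.String(clusterName),
		AddonName:   aws.String("vpc-cni"),
	}); err == nil {
		return nil
	}

	// Otherwise (self-managed vpc-cni, or the addon is not installed), delete
	// the aws-node DaemonSet directly so its pods can terminate gracefully.
	err := clientset.AppsV1().DaemonSets("kube-system").Delete(ctx, "aws-node", metav1.DeleteOptions{})
	if err != nil && !apierrors.IsNotFound(err) {
		return err
	}
	return nil
}
```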

Going forward, we should look into enabling the GracefulNodeShutdown feature flag for new nodegroups in 1.21+ clusters.

@nikimanoledaki
Contributor Author

This is excellent! This matches perfectly with the order of things mentioned in #536! 🎉
