
Delete daemonset(s) as part of cluster deletion #4214

Closed
nikimanoledaki opened this issue Sep 15, 2021 · 4 comments · Fixed by #4244

Comments

@nikimanoledaki
Contributor

nikimanoledaki commented Sep 15, 2021

Related to #536

Networking-related resources are blocking cluster deletion (see issues labeled area/deletions), so we need to ensure that we delete the vpc-cni and other DaemonSets, which may still be scheduling pods onto nodes during cluster deletion.

Both DaemonSets and their pods need to be addressed:

  1. DaemonSet pods (such as those of vpc-cni) aren't being evicted from the nodes they're running on with the current setup of the Drain func. It may be as easy as changing that func's configuration so that it does not ignore DaemonSet pods (see the sketch after this list).
  2. It would then be good to verify if/how/when DaemonSets are being deleted as part of cluster deletion. If Drain isn't able to evict DaemonSet pods from nodes, the DaemonSets and their pods may need to be deleted before the nodegroups.
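For context, a minimal sketch of the kind of drain configuration involved, using the upstream k8s.io/kubectl/pkg/drain helper as a stand-in for eksctl's Drain func (the field names below are the upstream ones, not necessarily eksctl's). Note that the upstream helper does not actually evict DaemonSet-managed pods when IgnoreAllDaemonSets is false; it reports them as an error instead, which is consistent with what the later comments in this thread found.

```go
package main

import (
	"context"
	"log"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/kubectl/pkg/drain"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		log.Fatal(err)
	}
	clientset := kubernetes.NewForConfigOrDie(cfg)

	helper := &drain.Helper{
		Ctx:    context.Background(),
		Client: clientset,
		// This is the knob in question. With false, the upstream helper does
		// not evict DaemonSet-managed pods; it returns an error for them.
		IgnoreAllDaemonSets: false,
		DeleteEmptyDirData:  true,
		GracePeriodSeconds:  -1, // use each pod's terminationGracePeriodSeconds
		Timeout:             5 * time.Minute,
		Out:                 os.Stdout,
		ErrOut:              os.Stderr,
	}

	nodes, err := clientset.CoreV1().Nodes().List(context.Background(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for i := range nodes.Items {
		node := &nodes.Items[i]
		// Cordon the node, then evict the pods the filters allow.
		if err := drain.RunCordonOrUncordon(helper, node, true); err != nil {
			log.Fatal(err)
		}
		if err := drain.RunNodeDrain(helper, node.Name); err != nil {
			log.Printf("draining %s: %v", node.Name, err)
		}
	}
}
```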
@aclevername
Contributor

aclevername commented Sep 21, 2021

I managed to immediately reproduce this SG deletion issue:

  1. eksctl create cluster --name jk --managed=false
  2. tail the logs for aws-node (both instances, as my nodegroup is scaled to 2)
  3. eksctl delete nodegroup fails:
2021-09-21 11:55:16 [ℹ]  waiting for CloudFormation stack "eksctl-jk-nodegroup-ng-46b25ad7"
2021-09-21 11:55:32 [ℹ]  waiting for CloudFormation stack "eksctl-jk-nodegroup-ng-46b25ad7"
2021-09-21 11:55:33 [✖]  unexpected status "DELETE_FAILED" while waiting for CloudFormation stack "eksctl-jk-nodegroup-ng-46b25ad7"
2021-09-21 11:55:33 [ℹ]  fetching stack events in attempt to troubleshoot the root cause of the failure
2021-09-21 11:55:33 [✖]  AWS::CloudFormation::Stack/eksctl-jk-nodegroup-ng-46b25ad7: DELETE_FAILED – "The following resource(s) failed to delete: [SG]. "
2021-09-21 11:55:33 [✖]  AWS::EC2::SecurityGroup/SG: DELETE_FAILED – "resource sg-0da449a5f77d414d8 has a dependent object (Service: AmazonEC2; Status Code: 400; Error Code: DependencyViolation; Request ID: 82b044d8-8a28-4c67-935d-38304504be7c; Proxy: null)"
2021-09-21 11:55:33 [ℹ]  1 error(s) occurred while deleting nodegroup(s)
2021-09-21 11:55:33 [✖]  waiting for CloudFormation stack "eksctl-jk-nodegroup-ng-46b25ad7": ResourceNotReady: failed waiting for successful resource state
Error: failed to delete nodegroup(s)

Looking at the logs for aws-node (identical on both instances):

k -n kube-system logs -f aws-node-vltnn
{"level":"info","ts":"2021-09-21T09:52:51.729Z","caller":"entrypoint.sh","msg":"Install CNI binary.."}
{"level":"info","ts":"2021-09-21T09:52:51.817Z","caller":"entrypoint.sh","msg":"Starting IPAM daemon in the background ... "}
{"level":"info","ts":"2021-09-21T09:52:51.818Z","caller":"entrypoint.sh","msg":"Checking for IPAM connectivity ... "}
{"level":"info","ts":"2021-09-21T09:52:53.922Z","caller":"entrypoint.sh","msg":"Copying config file ... "}
{"level":"info","ts":"2021-09-21T09:52:53.926Z","caller":"entrypoint.sh","msg":"Successfully copied CNI plugin binary and config file."}
{"level":"info","ts":"2021-09-21T09:52:53.926Z","caller":"entrypoint.sh","msg":"Foregrounding IPAM daemon ..."}

Inspecting the existing ENIs, there is one relating to the nodegroup left around that caused the SG deletion to fail:



ENI ID: eni-093b78ce6260ec285
Subnet: subnet-0d29e1492d265312d
VPC: vpc-0dd64a0fab2f91b71
Availability zone: eu-central-1a
Security groups: eksctl-jk-cluster-ClusterSharedNodeSecurityGroup-1V4BOXDJ7TIZG, eksctl-jk-nodegroup-ng-46b25ad7-SG-1U8QLB4KXF2M7
Description: aws-K8S-i-056b783a2822c25e4

with the following tags:

node.k8s.amazonaws.com/createdAt	2021-09-21T09:51:34Z
node.k8s.amazonaws.com/instance_id	i-056b783a2822c25e4

This was definitely created by the vpc-cni (aws-node pod); see aws/amazon-vpc-cni-k8s#1447.

Deleting eni-093b78ce6260ec285 manually and then re-attempting the nodegroup stack deletion made it succeed.
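For anyone hitting the same DELETE_FAILED, here is a rough sketch of how the dependent ENIs could be found and removed programmatically, assuming aws-sdk-go and the security group ID from the stack events above; the manual console deletion described above achieves the same thing.

```go
package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	sess := session.Must(session.NewSession(aws.NewConfig().WithRegion("eu-central-1")))
	svc := ec2.New(sess)

	// Find ENIs still referencing the nodegroup security group from the failure above.
	out, err := svc.DescribeNetworkInterfaces(&ec2.DescribeNetworkInterfacesInput{
		Filters: []*ec2.Filter{
			{Name: aws.String("group-id"), Values: []*string{aws.String("sg-0da449a5f77d414d8")}},
		},
	})
	if err != nil {
		log.Fatal(err)
	}

	for _, eni := range out.NetworkInterfaces {
		fmt.Printf("leaked ENI %s (status %s)\n",
			aws.StringValue(eni.NetworkInterfaceId), aws.StringValue(eni.Status))
		// Only detached ("available") ENIs can be deleted directly.
		if aws.StringValue(eni.Status) == ec2.NetworkInterfaceStatusAvailable {
			if _, err := svc.DeleteNetworkInterface(&ec2.DeleteNetworkInterfaceInput{
				NetworkInterfaceId: eni.NetworkInterfaceId,
			}); err != nil {
				log.Fatal(err)
			}
		}
	}
}
```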

@aclevername
Contributor

I cannot consistently reproduce this, which makes me think it must be a race condition in the shutdown of the vpc-cni. As the issue points out, I think draining should allow it to shut down gracefully, hopefully solving the issue.

@aclevername
Contributor

It looks like draining DaemonSets isn't possible in the traditional "drain" sense. Draining marks a node as unschedulable for pods, but DaemonSets are configured to tolerate that taint, as they are supposed to be the exception to the rule: "Pods that are part of a DaemonSet tolerate being run on an unschedulable Node. DaemonSets typically provide node-local services that should run on the Node even if it is being drained of workload applications". The idea is that things like kube-proxy, which run as a DaemonSet, need to keep running on the node while it's draining so that the workloads being evicted can shut down gracefully.

You can manually achieve the equivalent of a drain for DaemonSet pods by either:

  • Adding a taint that the DaemonSet pods do not tolerate, effectively telling them "don't run on this node" (see the sketch after this list).
  • Deleting the DaemonSet manually.
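A minimal client-go sketch of both options, assuming the vpc-cni defaults mentioned in this thread (DaemonSet aws-node in kube-system); the taint key is made up for illustration and is not an existing eksctl convention.

```go
package main

import (
	"context"
	"log"
	"os"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		log.Fatal(err)
	}
	clientset := kubernetes.NewForConfigOrDie(cfg)
	ctx := context.Background()
	nodeName := os.Getenv("NODE_NAME") // the node to evict DaemonSet pods from

	// Option 1: taint the node with a NoExecute effect the DaemonSet pods do
	// not tolerate, so the taint manager evicts them. Hypothetical taint key.
	node, err := clientset.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		log.Fatal(err)
	}
	node.Spec.Taints = append(node.Spec.Taints, corev1.Taint{
		Key:    "eksctl.io/evict-daemonset-pods",
		Value:  "true",
		Effect: corev1.TaintEffectNoExecute,
	})
	if _, err := clientset.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{}); err != nil {
		log.Fatal(err)
	}

	// Option 2: delete the DaemonSet itself; foreground propagation waits for
	// its pods to be cleaned up as well.
	fg := metav1.DeletePropagationForeground
	if err := clientset.AppsV1().DaemonSets("kube-system").Delete(ctx, "aws-node", metav1.DeleteOptions{
		PropagationPolicy: &fg,
	}); err != nil {
		log.Fatal(err)
	}
}
```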

Why we want to do this in the first place

Before we go down either of those two routes it's worth taking a step back and remembering why we want to drain the DaemonSets. We had multiple issues with the vpc-cni (the pods are called aws-node) not deleting all of the ENIs it creates (#1849), the theory being that it doesn't shut down gracefully (reinforced by this issue). This might also be an issue with other DaemonSets the user deploys that we aren't aware of. For example, lots of folks have reported a similar problem when draining nodes that run the Consul DaemonSet. So this issue might not be limited to the vpc-cni.

Why aren't nodegroups gracefully shutting down DaemonSets?

Looking into related issues in https://github.com/kubernetes/kubernetes/ it seems that this is an issue a lot of folks have run into (kubernetes/kubernetes#75482). In November 2020 a PR related to this issue was merged that improved how nodes shut down, allowing running pods to be killed gracefully (kubernetes/kubernetes#96129). This shipped in 1.21, is currently in Beta, and can only be enabled via the GracefulNodeShutdown feature flag. Turning on this feature might resolve the issue we are trying to solve, but it doesn't fix it for older clusters; we currently support clusters as old as 1.17.

Proposal:

We delete the vpc-cni DaemonSet (this might be deployed by the EKS vpc-cni addon, so we might have to delete that in order to delete the DaemonSet) after draining the nodegroup, but before deleting it, in the hope that this lets it shut down gracefully. Given that all of the nodegroups are already drained, there should be no harm.
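A rough sketch of that ordering, assuming aws-sdk-go's EKS client and client-go; the helper name is hypothetical and this is not necessarily what the eventual fix (#4244) implements.

```go
package deletion

import (
	"context"

	"github.com/aws/aws-sdk-go/aws"
	awseks "github.com/aws/aws-sdk-go/service/eks"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deleteVPCCNI is a hypothetical helper invoked after the nodegroups have been
// drained but before their CloudFormation stacks are deleted.
func deleteVPCCNI(ctx context.Context, eksAPI *awseks.EKS, clientset kubernetes.Interface, clusterName string) error {
	// If the vpc-cni is managed as an EKS addon, deleting the addon also
	// removes the DaemonSet it owns.
	if _, err := eksAPI.DeleteAddonWithContext(ctx, &awseks.DeleteAddonInput{
		ClusterName: aws.String(clusterName),
		AddonName:   aws.String("vpc-cni"),
	}); err == nil {
		return nil
	}

	// Otherwise (self-managed vpc-cni, or the addon is not installed), delete
	// the aws-node DaemonSet directly so its pods can terminate gracefully.
	err := clientset.AppsV1().DaemonSets("kube-system").Delete(ctx, "aws-node", metav1.DeleteOptions{})
	if err != nil && !apierrors.IsNotFound(err) {
		return err
	}
	return nil
}
```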

Going forward, we should look into enabling the GracefulNodeShutdown feature flag for new nodegroups in 1.21+ clusters.

@nikimanoledaki
Contributor Author

This is excellent! This matches perfectly with the order of things mentioned in #536! 🎉
