[CNI]: Teardown pod network when IPAMD connection fails #2145

jdn5126 · 2022-11-17T23:08:09Z

What type of PR is this?
bug

Which issue does this PR fix:
#2048

What does this PR do / Why do we need it:
Note that this PR replaces #2125

This PR resolves an issue in which the CNI leaked IP rules. When processing a pod deletion, the CNI waited for a response from IPAMD before tearing down pod networking resources. If IPAMD could not be reached, the CNI returned an error and waited for kubelet to retry the delete. If IPAMD was then restarted, its state for this pod was cleared without the CNI ever tearing down the associated networking resources. The trigger for the linked issue was the k8s cluster autoscaler evicting the aws-node daemonset pod before other pods and then later cancelling the pod evictions. kubernetes/autoscaler#5240 was filed to ask for the k8s cluster autoscaler to change its behavior.
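To make the leak sequence concrete, here is a toy Go model of the pre-fix delete ordering described above. It is not the plugin's code: the `node` type, `oldDel`, and `restartIPAMD` are hypothetical names, and IPAMD's state and the programmed IP rules are reduced to two maps.

```go
package main

import "errors"

// errIPAMDUnreachable stands in for a failed connection to IPAMD (hypothetical name).
var errIPAMDUnreachable = errors.New("ipamd unreachable")

// node is a toy model of one worker node.
// ipamdState is IPAMD's in-memory view of pods; ipRules is what is
// actually programmed on the host. Both are illustrative assumptions.
type node struct {
	ipamdUp    bool
	ipamdState map[string]bool // pod -> IPAMD knows about it
	ipRules    map[string]bool // pod -> rules still installed
}

// oldDel models the pre-fix ordering: ask IPAMD first and tear down
// networking only if IPAMD answers and still knows the pod.
func (n *node) oldDel(pod string) error {
	if !n.ipamdUp {
		// Error goes back to kubelet for a retry; rules are untouched.
		return errIPAMDUnreachable
	}
	if !n.ipamdState[pod] {
		// IPAMD has no record of the pod, so nothing is torn down.
		return nil
	}
	delete(n.ipamdState, pod)
	delete(n.ipRules, pod)
	return nil
}

// restartIPAMD models a restart that clears the state for pods being deleted.
func (n *node) restartIPAMD() {
	n.ipamdState = map[string]bool{}
	n.ipamdUp = true
}
```

In this model, calling `oldDel` while `ipamdUp` is false returns an error and leaves the rule in place. After `restartIPAMD`, the retried `oldDel` returns nil, yet the pod's entry in `ipRules` is still present, which is the leak.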

The changes in this PR are two-fold:

Non-branch ENI pods now store state in PrevResult for later cleanup.
When the IPAMD connection fails (two cases), we try to clean up the pod network using PrevResult.
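The two changes above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `iface` and `prevResult` are local stand-ins for the CNI result types, and `recordState`, `stateFromPrevResult`, `handleDel`, the `dummy` name prefix, and the choice of stashing a route table number in the `Sandbox` string are all hypothetical. The sketch also passes the IPAMD error back to the caller after the fallback; what the real plugin returns to kubelet is not stated in this description.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// errIPAMDUnreachable stands in for a failed IPAMD connection (hypothetical name).
var errIPAMDUnreachable = errors.New("ipamd unreachable")

// iface and prevResult are local stand-ins for CNI result types.
type iface struct {
	Name    string
	Sandbox string
}

type prevResult struct {
	Interfaces []iface
}

// stateIfacePrefix marks the entry that carries cleanup state (assumed convention).
const stateIfacePrefix = "dummy"

// recordState runs on ADD: it stashes what a later DEL needs in the result.
func recordState(res *prevResult, containerIf string, routeTable int) {
	res.Interfaces = append(res.Interfaces, iface{
		Name:    stateIfacePrefix + containerIf,
		Sandbox: strconv.Itoa(routeTable),
	})
}

// stateFromPrevResult recovers the stashed route table number, if any.
func stateFromPrevResult(res *prevResult, containerIf string) (int, error) {
	if res == nil {
		return 0, errors.New("no prevResult available")
	}
	for _, i := range res.Interfaces {
		if i.Name == stateIfacePrefix+containerIf {
			return strconv.Atoi(i.Sandbox)
		}
	}
	return 0, fmt.Errorf("no stored state for %s", containerIf)
}

// handleDel asks IPAMD first. If that fails, it falls back to tearing down
// the pod network from the state stored in prevResult.
func handleDel(prev *prevResult, containerIf string,
	askIPAMD func() error, teardown func(routeTable int) error,
) (usedFallback bool, err error) {
	ipamdErr := askIPAMD()
	if ipamdErr == nil {
		// Normal path: teardown driven by IPAMD's reply (omitted here).
		return false, nil
	}
	table, stateErr := stateFromPrevResult(prev, containerIf)
	if stateErr != nil {
		return false, ipamdErr // nothing stored, cannot fall back
	}
	if tdErr := teardown(table); tdErr != nil {
		return false, ipamdErr
	}
	return true, ipamdErr
}
```

For example, after `recordState(prev, "eth0", 3)`, calling `handleDel` with an `askIPAMD` that returns `errIPAMDUnreachable` invokes `teardown(3)` and reports `usedFallback == true`.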

There is a lot of duplication between teardownPodNetworkWithPrevResult and tryDelWithPrevResult. I kept them separate to avoid unnecessarily complicating tryDelWithPrevResult and to make it clear that teardownPodNetworkWithPrevResult is a fallback mechanism.

If an issue # is not available please add repro steps and logs from IPAMD/CNI showing the issue:

Testing done on this change:
Added more test cases to cni_test.go and verified that all CNI and IPAMD integration tests pass with this change. Also manually verified the fix with IPv4 and IPv6 clusters.

Automation added to e2e:
N/A

Will this PR introduce any new dependencies?:
No

Will this break upgrades or downgrades. Has updating a running cluster been tested?:
This will not break upgrades or downgrades. A running cluster has been tested.

Does this change require updates to the CNI daemonset config files to work?:
No

Does this PR introduce any user-facing change?:
No

Clean up pod networking resources when IPAMD is unreachable to prevent rule leaking.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

M00nF1sh

/lgtm

* create publisher with logger (#2119)
* Add missing rules when NodePort support is disabled (#2026)
* Add missing rules when NodePort support is disabled
* the rules that need to be installed for NodePort support and SNAT support are very similar. The same traffic mark is needed for both. As a result, rules that are currently installed only when NodePort support is enabled should also be installed when external SNAT is disabled, which is the case by default.
* remove "-m state --state NEW" from a rule in the nat table. This is always true for packets that traverse the nat table.
* fix typo in one rule's name (extra whitespace). Fixes #2025 Co-authored-by: Quan Tian <qtian@vmware.com> Signed-off-by: Antonin Bas <abas@vmware.com>
* Fix typos and unit tests Signed-off-by: Antonin Bas <abas@vmware.com>
* Minor improvement to code comment Signed-off-by: Antonin Bas <abas@vmware.com>
* Address review comments
* Delete legacy nat rule
* Fix an unrelated log message Signed-off-by: Antonin Bas <abas@vmware.com> Signed-off-by: Antonin Bas <abas@vmware.com> Co-authored-by: Jayanth Varavani <1111446+jayanthvn@users.noreply.github.com> Co-authored-by: Sushmitha Ravikumar <58063229+sushrk@users.noreply.github.com>
* downgrade test go.mod to align with root go.mod (#2128)
* skip addon installation when addon info is not available (#2131)
* Merging test/Makefile and test/go.mod to the root Makefil and go.mod, adjust the .github/workflows and integration test instructions (#2129)
* update troubleshooting docs for CNI image (#2132) fix location where make command is run
* fix env name in test script (#2136)
* optionally allow CLUSTER_ENDPOINT to be used rather than the cluster-ip (#2138)
* optionally allow CLUSTER_ENDPOINT to be used rather than the kubernetes cluster ip
* remove check for kube-proxy
* add version to readme
* Add resources config option to cni metrics helper (#2141)
* Add resources config option to cni metrics helper
* Remove default-empty resources block; replace with conditional
* Add metrics for ec2 api calls made by CNI and expose via prometheus (#2142) Co-authored-by: Jay Deokar <jsdeokar@amazon.com>
* increase workflow role duration to 4 hours (#2148)
* Update golang 1.19.2 EKS-D (#2147)
* Update golang
* Move to EKS distro builds
* [HELM]: Move CRD resources to a separate folder as per helm standard (#2144) Co-authored-by: Jay Deokar <jsdeokar@amazon.com>
* VPC-CNI minimal image builds (#2146)
* VPC-CNI minimal image builds
* update dependencies for ginkgo when running integration tests
* address review comments and break up init main function
* review comments for sysctl
* Simplify binary installation, fix review comments Since init container is required to always run, let binary installation for external plugins happen in init container. This simplifies the main container entrypoint and the dockerfile for each image.
* when IPAMD connection fails, try to teardown pod network using prevResult (#2145)
* add env var to enable nftables (#2155)
* fix failing weekly cron tests (#2154)
* Deprecate AWS_VPC_K8S_CNI_CONFIGURE_RPFILTER and remove no-op setter (#2153)
* Deprecate AWS_VPC_K8S_CNI_CONFIGURE_RPFILTER
* update release version comments

Signed-off-by: Antonin Bas <abas@vmware.com>
Co-authored-by: Jeffrey Nelson <jdnelson@amazon.com>
Co-authored-by: Antonin Bas <antonin.bas@gmail.com>
Co-authored-by: Jayanth Varavani <1111446+jayanthvn@users.noreply.github.com>
Co-authored-by: Sushmitha Ravikumar <58063229+sushrk@users.noreply.github.com>
Co-authored-by: Jerry He <37866862+jerryhe1999@users.noreply.github.com>
Co-authored-by: Brandon Wagner <wagnerbm@amazon.com>
Co-authored-by: Jonathan Ogilvie <679297+jcogilvie@users.noreply.github.com>
Co-authored-by: Jay Deokar <jsdeokar@amazon.com>

jdn5126 requested a review from a team as a code owner November 17, 2022 23:08

jayanthvn added this to the v1.12.1 milestone Nov 23, 2022

jdn5126 force-pushed the ip_rule_leak_fallback branch from 6bacc9b to 968645c Compare November 29, 2022 23:33

M00nF1sh previously approved these changes Dec 8, 2022

jdn5126 dismissed M00nF1sh’s stale review via b46a510 December 8, 2022 18:03

jdn5126 force-pushed the ip_rule_leak_fallback branch from 968645c to b46a510 Compare December 8, 2022 18:03

when IPAMD connection fails, try to teardown pod network using prevResult

f452396

jdn5126 force-pushed the ip_rule_leak_fallback branch from b46a510 to f452396 Compare December 8, 2022 18:05

M00nF1sh self-requested a review December 8, 2022 18:07

M00nF1sh approved these changes Dec 8, 2022

jdn5126 merged commit 320153f into aws:master Dec 8, 2022

jdn5126 deleted the ip_rule_leak_fallback branch December 8, 2022 19:48

haouc pushed a commit to haouc/amazon-vpc-cni-k8s that referenced this pull request Dec 13, 2022

when IPAMD connection fails, try to teardown pod network using prevResult (aws#2145)

f22ac63

gbucknel mentioned this pull request Mar 8, 2023

Empty cache files causes "KillPodSandbox" errors when deleting pods containerd/containerd#8197

Open

jdn5126 mentioned this pull request Mar 17, 2023

Non-SGPP pods created in v1.12.1+ cannot be deleted in v1.11.4 #2321

Closed
