Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return success from CNI DEL when IPAMD is unreachable #2350

Merged
merged 1 commit into from
May 8, 2023

Conversation

jdn5126
Copy link
Contributor

@jdn5126 jdn5126 commented Apr 11, 2023

What type of PR is this?
bug

Which issue does this PR fix:
#2331

What does this PR do / Why do we need it:
This PR modifies the VPC CNI pod deletion logic to return no error to the container-runtime when IPAMD is unreachable. This is needed for three primary reasons:

  1. The CNI spec specifies that delete operations should generally complete without error, even when resources are missing.
  2. The CNI cannot count on kubelet to retry deletes until IPAMD is available. If a user deletes a pod with a termination grace period of 0 seconds, the pod is deleted from the API server before kubelet returns success, so kubelet will give up after a few iterations.
  3. If the aws-node daemonset pod cannot be scheduled until other pods are deleted, pod deletion needs to happen without IPAMD.

CNI Changes:
As before, the CNI will try to delete non-branch-ENI pods using PrevResult when IPAMD is not reachable. The difference is that CNI will now return no error in this case. Note that PrevResult will only be available for pods created with VPC CNI v1.12.1+. For pods created before this, IP rules will be leaked, and it is IPAMD's responsibility to cleanup these rules when it starts again.

IPAMD Changes:
On init, IPAMD identifies stale allocations and prunes IP rules for these stale allocations. IPAMD cannot be sure if CNI was able to prune those rules before the pod was deleted. Also, IPAMD does an additional state file write after pruning entries.

Misc:
I also did some refactoring of network utilities to avoid code duplication.

If an issue # is not available please add repro steps and logs from IPAMD/CNI showing the issue:
N/A

Testing done on this change:
Added unit test coverage for stale IPAM allocations. Manually created a cluster and verified behavior when IPAMD is unreachable, plus upgrade/downgrade scenarios.

Automation added to e2e:
N/A

Will this PR introduce any new dependencies?:
N/A

Will this break upgrades or downgrades. Has updating a running cluster been tested?:
No, Yes

Does this change require updates to the CNI daemonset config files to work?:
No

Does this PR introduce any user-facing change?:
Yes

Update CNI to delete pod when IPAMD is unreachable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@jdn5126 jdn5126 requested a review from a team as a code owner April 11, 2023 19:43
for _, allocation := range checkpoint.Allocations {
if err := ds.validateAllocationByPodVethExistence(allocation, hostNSLinks); err != nil {
ds.log.Warnf("ignore IP allocation for %v:%v,%v due to %v", allocation.ContainerID, allocation.IPv4, allocation.IPv6, err)
ds.log.Warnf("stale IP allocation for ID(%v): IPv4(%v), IPv6(%v) due to %v", allocation.ContainerID, allocation.IPv4, allocation.IPv6, err)
staleAllocations = append(staleAllocations, allocation)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we cleanup resources and then call IPAMD then this check wouldn't be needed..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but we can only do that if the pod was created by v1.12.1+. It would be great to remove this overhead in the future, but in the common case, there should never be any stale allocations to clean up

@jdn5126 jdn5126 force-pushed the cni_cleanup branch 4 times, most recently from 38f992d to 71df0fd Compare April 26, 2023 15:13
@jdn5126 jdn5126 force-pushed the cni_cleanup branch 3 times, most recently from 508364a to 8d448c7 Compare May 3, 2023 22:13
On IPAMD startup, prune IP rules for stale allocations
@M00nF1sh M00nF1sh self-requested a review May 8, 2023 21:01
Copy link
Contributor

@M00nF1sh M00nF1sh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve
approving given we'll do the mentioned refactor as follow up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants