What type of PR is this?
bug
Which issue does this PR fix:
In 1.11.4 we optimized IPAMD to reduce the number of EC2 calls (#1975), but this introduced a regression in a corner case where EC2 returns a `PrivateIpAddressLimitExceeded` error. If IMDS goes out of sync and `aws-node` restarts, the IPAMD datastore will have the ENI but will be missing its IPs, because IMDS is out of sync. The reconciler will try to allocate IPs, but EC2 will return `PrivateIpAddressLimitExceeded` since, from EC2's point of view, the IPs are already allocated. On `PrivateIpAddressLimitExceeded` we used to return without an error, because we would verify the actual state by calling EC2 to see which addresses were already assigned to the ENI. Pre-1.11.4, IPAMD made that call to EC2 to confirm the actual state: https://github.com/aws/amazon-vpc-cni-k8s/blob/v1.11.3/pkg/ipamd/ipamd.go#L946
But with 1.11.4+ we returned `nil` (amazon-vpc-cni-k8s/pkg/ipamd/ipamd.go, lines 936 to 949 in c7bd490), so `increasedPool = true` (amazon-vpc-cni-k8s/pkg/ipamd/ipamd.go, line 763 in c7bd490).
Then `updateLastNodeIPPoolAction` ends up setting `c.lastNodeIPPoolAction = time.Now()`.
The reconciler first either increases or decreases the datastore pool (amazon-vpc-cni-k8s/pkg/ipamd/ipamd.go, lines 650, 653 and 55 in c7bd490). The check that follows then evaluates to `true`, since we updated the timestamp in the previous step. Ref: amazon-vpc-cni-k8s/pkg/ipamd/ipamd.go, line 1276 in c7bd490.
As a result, `nodeIPPoolReconcile` never executed and IPAMD never recovered even after IMDS recovered, which is a regression introduced after 1.11.3.
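The starvation described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the IPAMD code: `poolState`, `increasePool`, `reconcile`, and `simulate` are hypothetical names, and the 60-second interval and 5-second loop step are assumed values chosen for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// reconcileInterval is an assumed value standing in for the interval that
// is compared against the time since the last pool action.
const reconcileInterval = 60 * time.Second

// poolState is a hypothetical, stripped-down stand-in for the IPAMD context.
type poolState struct {
	lastPoolAction time.Time
	reconciles     int
}

// increasePool models one attempt to grow the pool. When errorSwallowed is
// true, the limit-exceeded error is reported as success, so the last-action
// timestamp is refreshed even though nothing was added.
func (p *poolState) increasePool(now time.Time, errorSwallowed bool) {
	if errorSwallowed {
		p.lastPoolAction = now
	}
}

// reconcile runs only if no pool action happened within reconcileInterval.
func (p *poolState) reconcile(now time.Time) bool {
	if now.Sub(p.lastPoolAction) <= reconcileInterval {
		return false // skipped: a pool action was recorded too recently
	}
	p.reconciles++
	p.lastPoolAction = now
	return true
}

// simulate runs the loop for the given number of ticks, stepSeconds apart,
// and returns how many reconciles actually ran.
func simulate(ticks, stepSeconds int, errorSwallowed bool) int {
	start := time.Unix(0, 0)
	p := &poolState{lastPoolAction: start}
	for i := 1; i <= ticks; i++ {
		now := start.Add(time.Duration(i*stepSeconds) * time.Second)
		p.increasePool(now, errorSwallowed)
		p.reconcile(now)
	}
	return p.reconciles
}

func main() {
	fmt.Println("reconciles, error swallowed:", simulate(100, 5, true))
	fmt.Println("reconciles, timestamp untouched:", simulate(100, 5, false))
}
```

In this model, refreshing the timestamp on every loop iteration keeps the elapsed time at zero, so the reconcile is skipped every time, which mirrors why IPAMD never recovered.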
What does this PR do / Why do we need it:
In this PR, we revert to the old behavior, but make the EC2 call only when `PrivateIpAddressLimitExceeded` is returned.
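A minimal sketch of that idea, assuming a simplified client: the `ec2API` interface, `errLimitExceeded`, `assignOrResync`, and the `fakeEC2` test double are all hypothetical names invented for illustration and are not the functions changed in this PR. The extra describe call is made only on the limit-exceeded path, so the normal path keeps the reduced call count from 1.11.4.

```go
package main

import (
	"errors"
	"fmt"
)

// errLimitExceeded is a hypothetical stand-in for EC2's
// PrivateIpAddressLimitExceeded error code.
var errLimitExceeded = errors.New("PrivateIpAddressLimitExceeded")

// ec2API is a hypothetical, minimal view of the EC2 calls involved.
type ec2API interface {
	AssignIPs(eniID string, want int) ([]string, error)
	DescribeIPs(eniID string) ([]string, error)
}

// assignOrResync tries to assign IPs. Only when EC2 reports the limit as
// exceeded does it make the extra call to read the ENI's actual addresses.
// It returns how many addresses were newly added to the datastore.
func assignOrResync(api ec2API, eniID string, want int, datastore map[string]bool) (int, error) {
	ips, err := api.AssignIPs(eniID, want)
	if errors.Is(err, errLimitExceeded) {
		// EC2 considers the ENI full, so ask what is actually attached.
		ips, err = api.DescribeIPs(eniID)
	}
	if err != nil {
		return 0, err
	}
	added := 0
	for _, ip := range ips {
		if !datastore[ip] {
			datastore[ip] = true
			added++
		}
	}
	return added, nil
}

// fakeEC2 is an in-memory test double, not a real client.
type fakeEC2 struct {
	attached      []string // addresses EC2 believes are on the ENI
	limit         int      // maximum addresses per ENI
	describeCalls int      // counts the extra verification calls
}

func (f *fakeEC2) AssignIPs(eniID string, want int) ([]string, error) {
	if len(f.attached)+want > f.limit {
		return nil, errLimitExceeded
	}
	var fresh []string
	for i := 0; i < want; i++ {
		ip := fmt.Sprintf("10.0.0.%d", len(f.attached)+1)
		f.attached = append(f.attached, ip)
		fresh = append(fresh, ip)
	}
	return fresh, nil
}

func (f *fakeEC2) DescribeIPs(eniID string) ([]string, error) {
	f.describeCalls++
	return f.attached, nil
}

func main() {
	// EC2 already holds two IPs, but the datastore is empty (stale IMDS).
	api := &fakeEC2{attached: []string{"10.0.0.1", "10.0.0.2"}, limit: 2}
	datastore := map[string]bool{}
	added, err := assignOrResync(api, "eni-example", 2, datastore)
	fmt.Println(added, err, api.describeCalls)
}
```

The design choice here is that a successful assignment trusts the returned addresses, while the limit-exceeded error is treated as a signal that local state may be stale and should be refreshed from EC2.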
If an issue # is not available please add repro steps and logs from IPAMD/CNI showing the issue:
N/A
Testing done on this change:
<PENDING - will update>
Automation added to e2e:
No. Reproducing this requires IMDS to go out of sync. A tracking ticket will be created as a follow-up.
Will this PR introduce any new dependencies?:
N/A
Will this break upgrades or downgrades. Has updating a running cluster been tested?:
N/A
Does this change require updates to the CNI daemonset config files to work?:
N/A
Does this PR introduce any user-facing change?:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.