Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle private IP exceeded error #2210

Merged
merged 3 commits into from
Jan 20, 2023
Merged

Conversation

jayanthvn
Copy link
Contributor

What type of PR is this?
bug

Which issue does this PR fix:
With 1.11.4 we optimized to reduce the number of EC2 calls - #1975.

But this introduced a regression when PrivateIPAddressLimitExceed error is returned in a corner case. I.e, If IMDS goes out of sync and aws-node restarts then IPAMD DS will have the ENI but will be missing IPs since IMDS is out of sync. Reconciler will try allocate IPs but EC2 will return PrivateIpAddressLimitExceeded since from EC2 point of view IPs are allocated. With PrivateIpAddressLimitExceeded we used to return without an error since we will verify the actual state by calling EC2 to see what addresses have already assigned to this ENI. Pre-1.11.4, IPAMD used to make a call to EC2 to confirm the actual state - https://github.com/aws/amazon-vpc-cni-k8s/blob/v1.11.3/pkg/ipamd/ipamd.go#L946

But with 1.11.4+, We returned nil -

if err != nil {
log.Warnf("failed to allocate all available IP addresses on ENI %s, err: %v", eni.ID, err)
// Try to just get one more IP
output, err = c.awsClient.AllocIPAddresses(eni.ID, 1)
if err != nil {
ipamdErrInc("increaseIPPoolAllocIPAddressesFailed")
return false, errors.Wrap(err, fmt.Sprintf("failed to allocate one IP addresses on ENI %s, err ", eni.ID))
}
}
if output == nil {
ipamdErrInc("increaseIPPoolGetENIaddressesFailed")
return true, errors.Wrap(err, "failed to get ENI IP addresses during IP allocation")
}
We will end up returning increasedPool = true.
func (c *IPAMContext) increaseDatastorePool(ctx context.Context) {

    increasedPool, err := c.tryAssignCidrs()
    if err != nil {
....
    }
    if increasedPool {
        c.updateLastNodeIPPoolAction()
    } 

Then, updateLastNodeIPPoolAction will end up updating the c.lastNodeIPPoolAction = time.Now()

Reconciler first either increase or decrease the datastore pool -

c.updateIPPoolIfRequired(ctx)
and then reconciles the node IPs (nodeIPPoolReconcile) -
c.nodeIPPoolReconcile(ctx, nodeIPPoolReconcileInterval)
. Now the nodeIPPoolReconcile first checks the lastNodeIPPoolAction to determine if the node IP reconcilation was done in the last 60 sec -
nodeIPPoolReconcileInterval = 60 * time.Second
which is true since we updated in the previous step.

Ref:

timeSinceLast := time.Since(c.lastNodeIPPoolAction)

So the nodeIPPoolReconcile never executed and IPAMD never recovered even though IMDS recovered which is a regression post 1.11.3.

What does this PR do / Why do we need it:
In this PR, we will revert to old behavior but make EC2 call only when PrivateIpAddressLimitExceeded

If an issue # is not available please add repro steps and logs from IPAMD/CNI showing the issue:
N/A

Testing done on this change:

<PENDING - will update>

Automation added to e2e:

No, this will need IMDS to go out of sync. As a follow up will create a tracking ticket.

Will this PR introduce any new dependencies?:

N/A

Will this break upgrades or downgrades. Has updating a running cluster been tested?:
N/A

Does this change require updates to the CNI daemonset config files to work?:

N/A

Does this PR introduce any user-facing change?:


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@jayanthvn jayanthvn requested a review from a team as a code owner January 13, 2023 08:56
@jayanthvn jayanthvn requested review from achevuru and removed request for a team January 13, 2023 08:56
@jayanthvn jayanthvn added this to the v1.12.2 milestone Jan 13, 2023
@jayanthvn jayanthvn merged commit 0a9960e into aws:master Jan 20, 2023
@jayanthvn jayanthvn modified the milestones: v1.12.2, v1.13.0 Jan 28, 2023
@jayanthvn jayanthvn modified the milestones: v1.13.0, v1.12.3 Feb 6, 2023
jayanthvn added a commit to jayanthvn/amazon-vpc-cni-k8s that referenced this pull request Feb 6, 2023
* Handle private IP exceeded error

* Check for err when adding extra 1 IP
jdn5126 pushed a commit that referenced this pull request Feb 7, 2023
* Handle private IP exceeded error (#2210)

* Handle private IP exceeded error

* Check for err when adding extra 1 IP

* cherry-pick merged wrong variable
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants