Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for a 'no manage' tag #726

Merged
merged 7 commits into from
Dec 11, 2019
Merged

Conversation

euank
Copy link
Contributor

@euank euank commented Nov 16, 2019

One possible solution for #725

This tag results in an ENI being unmanaged by the ipamd plugin. This
allows someone to create an ENI, associate it with an instance, and then
use eks per normal while also using that ENI for whatever.

@mogren mogren self-requested a review November 18, 2019 16:59
Copy link
Contributor

@jaypipes jaypipes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd appreciate if you could submit a separate PR with the style-only changes you've included here. I'm happy to review that style-only change/PR quickly, it's just that it makes the review of the functional change here harder to do.

ipamd/ipamd.go Outdated Show resolved Hide resolved
ipamd/ipamd.go Outdated
@@ -129,6 +133,12 @@ var (
Help: "The maximum number of ENIs that can be attached to the instance",
},
)
reservedENIs = prometheus.NewGauge(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You use the term "reserved" here, but use the term "managed" (or no-manage) elsewhere to refer to basically the same thing. Best to use the same term I think. So, maybe call the gauge "awscni_eni_unmanaged"? Though, really, I'm wondering how much value this particular gauge gives the operator of the cluster, since they can determine this information by querying the AWS EC2 APIs for ENIs with the "node.k8s.amazonaws.com/no_manage" tag, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the naming is a bit inconsistent, and also that it's of dubious value in the first place.
I've removed it, but kept the change that ipMax refers to the maximum ips available taking into account unmanaged ENIs.
I'm also not sure if ipMax should be changed when an unmanaged ENI is noticed or not.
Do you know how the ipMax metric is typically used right now? I think that might inform what we do here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually don't know whether/how the ip_max value is currently used, no. 🤷‍♂️ Perfectly fine with me to keep using it and adapt for the unmanaged ENIs, though.

ipamd/ipamd.go Outdated Show resolved Hide resolved
@euank
Copy link
Contributor Author

euank commented Nov 20, 2019

Thanks for the review. I've removed the two small bits of refactoring that didn't relate to this change. I skimmed through the rest of the diff and didn't see any other unrelated changes, but if there are any, let me know and I can rebase them out too.

Copy link
Contributor

@jaypipes jaypipes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting close, @euank :)

I think this just needs a little blurb added to the README about the ENI tag for disabling management of the interface by the CNI. Other than that, lgtm.

@@ -133,15 +145,3 @@ func (m *MockNetworkAPIs) UseExternalSNAT() bool {
func (mr *MockNetworkAPIsMockRecorder) UseExternalSNAT() *gomock.Call {
return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "UseExternalSNAT", reflect.TypeOf((*MockNetworkAPIs)(nil).UseExternalSNAT))
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT, these mocks weren't changed in content, right? Just moved for some reason?

Copy link
Contributor Author

@euank euank Nov 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I checked for a makefile target or such that would install the right version of mockgen, but since I didn't see one I just went with the latest, and the latest version seems to generate slightly different code.

Similarly, I didn't see docs for how to change mocks, so I just ran go generate ./... and it seemed to work 🤷‍♂️

@euank
Copy link
Contributor Author

euank commented Nov 25, 2019

Thanks for the review. I've added a readme section which documents the existing eni tags and this new one.

@euank euank requested a review from jaypipes November 25, 2019 20:50
Copy link
Contributor

@jaypipes jaypipes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Little nit with the heading level in the readme. Otherwise, this looks great.

README.md Outdated Show resolved Hide resolved
@jaypipes jaypipes requested a review from nckturner November 26, 2019 14:37
@jaypipes jaypipes mentioned this pull request Nov 26, 2019
@mogren
Copy link
Contributor

mogren commented Nov 26, 2019

Hi @euank, thanks, this looks like a great feature to add. Also, thanks a lot for the documentation updates.

I have one question though, it looks like datastore.GetENINeedsIP() that is called in tryAssignIPs() could return the ENI that is marked with a no_manage tag. That would result in secondary IPs being attached to that ENI and the total count updated. I assume that's not what we want is it?

README.md Outdated Show resolved Hide resolved
@euank
Copy link
Contributor Author

euank commented Nov 26, 2019

@mogren That only returns ENIs in the datastore's pool. The two places ENIs are added to that pool is nodeInit and nodeIPPoolReconcile, and in both of those places I've added code to filter out no_manage tagged ENIs.

The idea is such ENIs never end up in the datastore's pool in the first place.

It is possible that someone could tag an ENI after already attaching it, at which point one of those two functions might have added it to the pool, but the intent is that anyone using this will tag the ENI before attaching it to the instance to ensure that doesn't happen.

Are there other paths by which an ENI can end up in the datastore? Is there some other case you're thinking of?

@mogren
Copy link
Contributor

mogren commented Nov 26, 2019

@euank Ah, you're right. It should never be in the data store at all.

One final concern though. If we don't support tagging existing ENIs, maybe we should store the numUnmanaged count in the context so that we can use it later?

For example, if we try to run more pods than can fit on the ENIs we have the function nodeIPPoolTooLow() will always count against the instance max ENI limit, and increaseIPPool() will be called on every reconcile loop even if we can't attach any more ENIs.

Another option would be to update the ENI count check on line 607. Just trying to avoid calling tryAllocateENI() when we are already at the maximum ENI count for the instance.

@jaypipes
Copy link
Contributor

One final concern though. If we don't support tagging existing ENIs, maybe we should store the numUnmanaged count in the context so that we can use it later?

++

Originally @euank had created a num_reserved metric gauge and I'd said that I didn't see much value in that as a gauge. But I can see it being useful as a field in the context.

@euank
Copy link
Contributor Author

euank commented Nov 27, 2019

Yeah, that makes sense to me. I'll give it another look through and try to figure out what makes sense.

One thing that has me wary here is that I think the user has to do manual configuration anyway, so if setting --max-pods and MAX_ENI=$numnode-$numUnmanaged fixes it without any code changes, that might be a reasonable option. If we do try and do something a bit smarter, we'll have to document how it interacts with that option clearly.

euank added 6 commits December 5, 2019 16:17
This tag results in an ENI being unmanaged by the ipamd plugin. This
allows someone to create an ENI, associate it with an instance, and then
use eks per normal while also using that ENI for whatever.
@euank
Copy link
Contributor Author

euank commented Dec 6, 2019

I updated one check to make it avoid trying and failing to allocate an ENI in the common case.

It will still think it's short IPs and complain though I believe.

After looking through the code agian, I think the best choice is to rely on the user setting MAX_ENI, and the readme already does suggest that.

If you can think of a more elegant way to handle this, I'd be happy to hear it.

Copy link
Contributor

@jaypipes jaypipes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work @euank.

@jaypipes jaypipes merged commit d31e2b5 into aws:master Dec 11, 2019
mogren pushed a commit to mogren/amazon-vpc-cni-k8s that referenced this pull request Dec 15, 2019
* Add support for a 'no manage' tag

This tag results in an ENI being unmanaged by the ipamd plugin. This
allows someone to create an ENI, associate it with an instance, and then
use eks per normal while also using that ENI for whatever.

* Update tests for new changes

* Remove 'unmanaged eni' metric

* Update readme with docs about CLUSTER_NAME and no_manage

* Correctly indent tag sub-headers

* Remove trailing '.'

* Account for unmanaged ENIs in allocation check
jaypipes pushed a commit that referenced this pull request Dec 19, 2019
* Add support for a 'no manage' tag

This tag results in an ENI being unmanaged by the ipamd plugin. This
allows someone to create an ENI, associate it with an instance, and then
use eks per normal while also using that ENI for whatever.

* Update tests for new changes

* Remove 'unmanaged eni' metric

* Update readme with docs about CLUSTER_NAME and no_manage

* Correctly indent tag sub-headers

* Remove trailing '.'

* Account for unmanaged ENIs in allocation check
mogren pushed a commit to mogren/amazon-vpc-cni-k8s that referenced this pull request Apr 20, 2020
* Add support for a 'no manage' tag

This tag results in an ENI being unmanaged by the ipamd plugin. This
allows someone to create an ENI, associate it with an instance, and then
use eks per normal while also using that ENI for whatever.

* Update tests for new changes

* Remove 'unmanaged eni' metric

* Update readme with docs about CLUSTER_NAME and no_manage

* Correctly indent tag sub-headers

* Remove trailing '.'

* Account for unmanaged ENIs in allocation check
mogren pushed a commit that referenced this pull request Apr 20, 2020
* Add support for a 'no manage' tag

This tag results in an ENI being unmanaged by the ipamd plugin. This
allows someone to create an ENI, associate it with an instance, and then
use eks per normal while also using that ENI for whatever.

* Update tests for new changes

* Remove 'unmanaged eni' metric

* Update readme with docs about CLUSTER_NAME and no_manage

* Correctly indent tag sub-headers

* Remove trailing '.'

* Account for unmanaged ENIs in allocation check
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants