Add and remove exclude-from-external-load-balancers label #582

Merged
merged 4 commits into from Feb 11, 2022
Conversation

tjs-intel
Contributor

Issue #, if available: #316

Description of changes:
Apply the label node.kubernetes.io/exclude-from-external-load-balancers=aws-node-termination-handler to tell load balancer controllers (like aws-load-balancer-controller) to exclude the node from load balancers before it is brought offline. If the label is already applied, don't do anything. When uncordoning the node, check that the label's value is aws-node-termination-handler before removing it.
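For context, here is a minimal sketch of what that add/conditionally-remove flow can look like with client-go. The function names and wiring below are illustrative only, not the PR's actual implementation in pkg/node/node.go:

```go
// Illustrative sketch only: function names and wiring are hypothetical,
// not the PR's actual code in pkg/node/node.go.
package node

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

const (
	excludeFromLBLabelKey   = "node.kubernetes.io/exclude-from-external-load-balancers"
	excludeFromLBLabelValue = "aws-node-termination-handler"
)

// addExcludeFromLBLabel labels the node before it is drained, unless a label
// with this key is already present (for example, one set by the operator).
func addExcludeFromLBLabel(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	if _, exists := node.Labels[excludeFromLBLabelKey]; exists {
		return nil // label already applied by someone else; don't do anything
	}
	if node.Labels == nil {
		node.Labels = map[string]string{}
	}
	node.Labels[excludeFromLBLabelKey] = excludeFromLBLabelValue
	_, err = client.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}

// removeExcludeFromLBLabel removes the label on uncordon, but only if this
// handler owns it, i.e. the value matches excludeFromLBLabelValue.
func removeExcludeFromLBLabel(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	if node.Labels[excludeFromLBLabelKey] != excludeFromLBLabelValue {
		return nil // label absent or set by someone else; leave it alone
	}
	delete(node.Labels, excludeFromLBLabelKey)
	_, err = client.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}
```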

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@tjs-intel tjs-intel requested a review from a team as a code owner February 10, 2022 20:02
@tjs-intel tjs-intel changed the title from "add and remove exclude-from-external-load-balancers label" to "Add and remove exclude-from-external-load-balancers label" Feb 10, 2022
@tjs-intel
Contributor Author

/assign @bwagner5

@tjs-intel
Contributor Author

tjs-intel commented Feb 10, 2022

Tested manually on an EKS cluster:

  • Observed the label being added successfully when decreasing the desired capacity of the ASG (also observed the associated target draining).
  • Applied the label node.kubernetes.io/exclude-from-external-load-balancers=true to a node manually, triggered a downscale (reduced the desired capacity), and observed that the label's value was left unchanged.

I have not tested the label removal when uncordoning a node. The label should not be removed if the value does not match aws-node-termination-handler.

pkg/node/node.go (outdated review comment, resolved)
@bwagner5
Contributor

Would there be any harm in always labeling with the node.kubernetes.io/exclude-from-external-load-balancers label?

We have so much configuration already; I'm just curious whether it makes sense to put this behind a flag or whether we should always add the label as part of a graceful termination. I'm not strongly opposed to a flag, so if there's a good reason someone would not want their nodes labeled with this, then this LGTM!

@tjs-intel
Contributor Author

There is a discussion about this in an issue in aws-load-balancer-controller.

e.g. if all your nodes are spot and are going to be terminated soon, then it should be marked as unschedulable, but not excluded from the LB (since that will make your LB unavailable immediately)

I think the behavior change could be surprising for some when all instances are terminated from the ASG (e.g. something that might happen when using spot instances). I guess leaving the instances attached to the LB for as long as possible in that scenario might be preferable. This is why I would leave it as a feature flag, and in principle I have no issue with it being on by default, except that it might break someone's expectations in this specific scenario.
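If it does end up behind a flag, the wiring could look roughly like the sketch below. The flag name and config field here are purely hypothetical, not NTH's actual configuration surface:

```go
// Hypothetical flag wiring; the flag name and config field are illustrative only.
package config

import "flag"

type Config struct {
	// ExcludeFromLoadBalancers gates whether nodes get the
	// node.kubernetes.io/exclude-from-external-load-balancers label before draining.
	ExcludeFromLoadBalancers bool
}

func ParseFlags() Config {
	var cfg Config
	flag.BoolVar(&cfg.ExcludeFromLoadBalancers, "exclude-from-external-load-balancers", false,
		"label nodes so external load balancer controllers deregister them before termination (hypothetical flag)")
	flag.Parse()
	return cfg
}

// At drain time the labeling call would then be gated, e.g.:
//   if cfg.ExcludeFromLoadBalancers { _ = addExcludeFromLBLabel(ctx, client, nodeName) }
```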

@tjs-intel tjs-intel requested a review from snay2 February 11, 2022 18:31
Contributor

@bwagner5 bwagner5 left a comment


Lgtm!

@tjs-intel
Contributor Author

These failing tests seem to be permission errors, hopefully nothing I'm responsible for causing!

@snay2
Contributor

snay2 commented Feb 11, 2022

@tjs-intel Looks like it was a transient failure; they passed when I re-ran them.
