Add support for ASG lifecycle hooks #4347

Closed
hintofbasil opened this issue Oct 19, 2021 · 5 comments
Labels
kind/feature New feature or request stale

Comments

@hintofbasil
Contributor

What feature/behavior/change do you want?

Add support for autoscaling lifecycle hooks to node groups.

A sample configuration, based on the CloudFormation documentation, could look like the following. It supports all lifecycle hook configuration options.

```yaml
cluster:
  ...
  nodeGroups:
    - name: spotNodeGroup
      lifecycleHooks:
        - defaultResult: <String>
          heartbeatTimeout: <Integer>
          lifecycleHookName: <String>
          lifecycleTransition: <String>
          notificationMetadata: <String>
          notificationTargetARN: <String>
          roleARN: <String>
      ...
```
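To make the mapping concrete, here is a minimal Python sketch, assuming the proposed camelCase keys map one-to-one onto the properties of CloudFormation's `LifecycleHookSpecification` (the entry type of an `AWS::AutoScaling::AutoScalingGroup`'s `LifecycleHookSpecificationList`). The `to_lifecycle_hook_specs` function and `KEY_MAP` table are hypothetical illustrations, not part of eksctl.

```python
from typing import Any, Dict, List

# Hypothetical mapping: proposed eksctl key -> CloudFormation property name
# in a LifecycleHookSpecification.
KEY_MAP = {
    "defaultResult": "DefaultResult",
    "heartbeatTimeout": "HeartbeatTimeout",
    "lifecycleHookName": "LifecycleHookName",
    "lifecycleTransition": "LifecycleTransition",
    "notificationMetadata": "NotificationMetadata",
    "notificationTargetARN": "NotificationTargetARN",
    "roleARN": "RoleARN",
}


def to_lifecycle_hook_specs(hooks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Translate proposed `lifecycleHooks` entries into CloudFormation specs."""
    specs = []
    for hook in hooks:
        unknown = set(hook) - set(KEY_MAP)
        if unknown:
            raise ValueError(f"unknown lifecycle hook keys: {sorted(unknown)}")
        # Copy only the keys that were set; AWS fills in defaults for the rest.
        specs.append({KEY_MAP[key]: value for key, value in hook.items()})
    return specs


if __name__ == "__main__":
    print(to_lifecycle_hook_specs([
        {
            "lifecycleHookName": "node-drain",
            "lifecycleTransition": "autoscaling:EC2_INSTANCE_TERMINATING",
            "defaultResult": "CONTINUE",
            "heartbeatTimeout": 300,
        }
    ]))
```

Rejecting unknown keys up front means a typo in the config fails loudly instead of silently dropping a setting.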

Why do you want this feature?

Node groups already support Capacity Rebalancing; however, this does not automatically drain nodes before they are terminated. The AWS Node Termination Handler, an official AWS solution to this problem, requires lifecycle hooks on the ASG to work in its Queue Processor mode.

Lifecycle hooks shouldn't be required for managed node groups, as those automatically drain workloads on a rebalance event.
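For readers unfamiliar with the mechanism, the toy Python model below (hypothetical, not an AWS or NTH API) shows why the handler needs a hook. Without one, nothing holds the instance while pods are drained; with one, the instance waits until the handler completes the lifecycle action or the heartbeat timeout expires, at which point AWS applies the hook's default result. The model ignores heartbeat extensions (`RecordLifecycleActionHeartbeat`) and other real-world details.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outcome:
    drained: bool           # did the drain finish before termination proceeded?
    result: Optional[str]   # lifecycle action result, or None when there is no hook
    terminated_at: int      # seconds until the instance proceeds to termination


def simulate_termination(drain_seconds: int,
                         heartbeat_timeout: Optional[int] = None,
                         default_result: str = "ABANDON") -> Outcome:
    """Toy model of a termination with or without a lifecycle hook."""
    if heartbeat_timeout is None:
        # No hook: nothing pauses the termination, so a drain that needs
        # any time at all cannot finish first.
        return Outcome(drained=drain_seconds == 0, result=None, terminated_at=0)
    if drain_seconds <= heartbeat_timeout:
        # The handler drains the node, then completes the lifecycle action
        # with CONTINUE so termination proceeds.
        return Outcome(drained=True, result="CONTINUE", terminated_at=drain_seconds)
    # The hook timed out first; AWS applies the hook's default result.
    # For a termination hook, either result still terminates the instance.
    return Outcome(drained=False, result=default_result, terminated_at=heartbeat_timeout)


if __name__ == "__main__":
    print(simulate_termination(120))                          # no hook
    print(simulate_termination(120, heartbeat_timeout=300))   # drain fits
    print(simulate_termination(600, heartbeat_timeout=300))   # drain too slow
```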

@hintofbasil hintofbasil added the kind/feature New feature or request label Oct 19, 2021
@aclevername
Contributor

Thanks for opening the issue @hintofbasil. Could you provide an example of the kind of configuration you would use if we introduced this functionality? I'm not very familiar with the node termination handler, but it looks very interesting and is perhaps related to #4214 (comment)

@hintofbasil
Contributor Author

I've never worked with lifecycle hooks before, but AWS Support recommended them for this issue, so I can only guess at what sort of configuration we would want. This is my best guess:

```yaml
cluster:
  ...
  nodeGroups:
    - name: spotNodeGroup
      lifecycleHooks:
        - defaultResult: ABANDON
          lifecycleHookName: node-drain
          lifecycleTransition: EC2_INSTANCE_TERMINATING
      ...
```

We might also need to set other values such as HeartbeatTimeout, NotificationTargetARN or RoleARN, but with my current understanding of the node termination handler I don't see an immediate need for them.
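As a rough illustration of what a minimal configuration would resolve to, the sketch below fills in the values AWS documents for omitted fields: a `heartbeatTimeout` of 3600 seconds (allowed range 30 to 7200) and a `defaultResult` of `ABANDON`. The `effective_hook` function is hypothetical, and accepting a transition without the `autoscaling:` prefix is an assumed convenience, not something AWS or eksctl is known to support.

```python
from typing import Any, Dict

VALID_TRANSITIONS = {
    "autoscaling:EC2_INSTANCE_LAUNCHING",
    "autoscaling:EC2_INSTANCE_TERMINATING",
}
VALID_RESULTS = {"CONTINUE", "ABANDON"}


def effective_hook(hook: Dict[str, Any]) -> Dict[str, Any]:
    """Validate a proposed hook and fill in the values AWS uses when omitted."""
    missing = [k for k in ("lifecycleHookName", "lifecycleTransition") if k not in hook]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    result = dict(hook)
    transition = result["lifecycleTransition"]
    # Assumed convenience: accept the bare name and add the prefix AWS expects.
    if not transition.startswith("autoscaling:"):
        transition = "autoscaling:" + transition
    if transition not in VALID_TRANSITIONS:
        raise ValueError(f"unsupported lifecycleTransition: {transition}")
    result["lifecycleTransition"] = transition
    # AWS defaults for omitted fields.
    result.setdefault("defaultResult", "ABANDON")
    result.setdefault("heartbeatTimeout", 3600)
    if result["defaultResult"] not in VALID_RESULTS:
        raise ValueError(f"defaultResult must be one of {sorted(VALID_RESULTS)}")
    if not 30 <= result["heartbeatTimeout"] <= 7200:
        raise ValueError("heartbeatTimeout must be between 30 and 7200 seconds")
    # Assumption based on the AWS docs: roleARN is needed when an SNS topic
    # or SQS queue is the notification target.
    if "notificationTargetARN" in result and "roleARN" not in result:
        raise ValueError("notificationTargetARN requires roleARN")
    return result


if __name__ == "__main__":
    print(effective_hook({
        "defaultResult": "ABANDON",
        "lifecycleHookName": "node-drain",
        "lifecycleTransition": "EC2_INSTANCE_TERMINATING",
    }))
```

Leaving `heartbeatTimeout` unset therefore gives the handler up to an hour per instance before AWS applies the default result.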

I'm not sure there is a strong connection between this and the issue you linked. The Node Termination Handler (NTH) performs a drain that skips DaemonSet pods, just like eksctl delete nodegroup does. In effect, NTH provides eksctl delete nodegroup-style draining for single-node terminations.
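A simplified sketch of the pod selection involved, assuming pods are given as plain dicts shaped like Kubernetes API objects: DaemonSet-owned pods and kubelet-managed mirror (static) pods are skipped, and everything else is evicted. `pods_to_evict` is a hypothetical helper; a real drain (for example `kubectl drain --ignore-daemonsets`) also respects PodDisruptionBudgets and handles unreplicated pods and `emptyDir` volumes, which this omits.

```python
from typing import Any, Dict, List

MIRROR_POD_ANNOTATION = "kubernetes.io/config.mirror"


def pods_to_evict(pods: List[Dict[str, Any]]) -> List[str]:
    """Names of pods a DaemonSet-ignoring drain would evict (simplified)."""
    names = []
    for pod in pods:
        meta = pod.get("metadata", {})
        owners = meta.get("ownerReferences", [])
        if any(owner.get("kind") == "DaemonSet" for owner in owners):
            continue  # the DaemonSet controller would just recreate it on the node
        if MIRROR_POD_ANNOTATION in meta.get("annotations", {}):
            continue  # static pods are managed by the kubelet, not the API server
        names.append(meta["name"])
    return names


if __name__ == "__main__":
    print(pods_to_evict([
        {"metadata": {"name": "web-1", "ownerReferences": [{"kind": "ReplicaSet"}]}},
        {"metadata": {"name": "aws-node-abc", "ownerReferences": [{"kind": "DaemonSet"}]}},
    ]))
```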

@github-actions
Contributor

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the stale label Nov 20, 2021
@github-actions
Contributor

This issue was closed because it has been stalled for 5 days with no activity.

@t0rr3sp3dr0
Contributor

Can we reopen this issue? I'm also looking into using AWS Node Termination Handler in Queue Processor mode with a cluster managed by eksctl.

I think eksctl could perform the first two steps described in the configuration guide (https://github.com/aws/aws-node-termination-handler#infrastructure-setup):

1. Set up a termination lifecycle hook on an ASG (https://github.com/aws/aws-node-termination-handler#1-setup-a-termination-lifecycle-hook-on-an-asg):

   ```shell
   aws autoscaling put-lifecycle-hook \
     --lifecycle-hook-name 'my-k8s-term-hook' \
     --auto-scaling-group-name 'my-k8s-asg' \
     --lifecycle-transition 'autoscaling:EC2_INSTANCE_TERMINATING' \
     --default-result 'CONTINUE' \
     --heartbeat-timeout '300'
   ```

2. Tag the ASGs (https://github.com/aws/aws-node-termination-handler#2-tag-the-asgs):

   ```shell
   aws autoscaling create-or-update-tags \
     --tags 'ResourceId=my-auto-scaling-group,ResourceType=auto-scaling-group,Key=aws-node-termination-handler/managed,Value=,PropagateAtLaunch=true'
   ```
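If eksctl took over these two steps, the calls involved could look roughly like this Python sketch. It assumes `client` behaves like `boto3.client("autoscaling")`, whose `put_lifecycle_hook` and `create_or_update_tags` methods correspond to the two CLI commands above; `setup_nth_for_asg` and the `RecordingClient` stub are hypothetical names used only for a dry run.

```python
from typing import Any, Dict, List, Tuple


def setup_nth_for_asg(client: Any, asg_name: str,
                      hook_name: str = "my-k8s-term-hook",
                      heartbeat_timeout: int = 300) -> None:
    """Apply NTH infrastructure steps 1 and 2 to a single ASG (sketch)."""
    # Step 1: the same termination hook as the put-lifecycle-hook command.
    client.put_lifecycle_hook(
        LifecycleHookName=hook_name,
        AutoScalingGroupName=asg_name,
        LifecycleTransition="autoscaling:EC2_INSTANCE_TERMINATING",
        DefaultResult="CONTINUE",
        HeartbeatTimeout=heartbeat_timeout,
    )
    # Step 2: the same tag as the create-or-update-tags command.
    client.create_or_update_tags(Tags=[{
        "ResourceId": asg_name,
        "ResourceType": "auto-scaling-group",
        "Key": "aws-node-termination-handler/managed",
        "Value": "",
        "PropagateAtLaunch": True,
    }])


class RecordingClient:
    """Hypothetical stand-in that records calls instead of contacting AWS."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    def put_lifecycle_hook(self, **kwargs: Any) -> None:
        self.calls.append(("put_lifecycle_hook", kwargs))

    def create_or_update_tags(self, **kwargs: Any) -> None:
        self.calls.append(("create_or_update_tags", kwargs))


if __name__ == "__main__":
    client = RecordingClient()  # swap in boto3.client("autoscaling") to apply for real
    setup_nth_for_asg(client, "my-k8s-asg")
    for name, kwargs in client.calls:
        print(name, kwargs)
```

Passing the client in, rather than creating it inside the function, keeps the sketch testable without AWS credentials.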
