[EKS] [managed node group drain pods due to AZRebalancing]: AZRebalancing is automatically applied, so cannot stop pods from draining in MNG. #1453
Comments
This is affecting us as well (and many others using long-lived deployments on managed node groups I suppose). Kinda sad to see no comments and no reactions here.
It would be awesome to have a better way of achieving this.
We're hitting this exact same issue with the exact same use case as @zeelpatel8.
For whoever may find this useful, we worked around the capacity rebalance limitation in Terraform. The STS part was taken off this reply; you may not need it depending on how you're doing your auth.

```hcl
resource "null_resource" "nodegroup_asg_" {
  count = length(aws_eks_node_group.main)

  provisioner "local-exec" {
    interpreter = ["/bin/sh", "-c"]
    environment = {
      AWS_DEFAULT_REGION = data.aws_region.current.name
    }
    command = <<EOF
set -e
$(aws sts assume-role --role-arn "${data.aws_iam_session_context.current.issuer_arn}" --role-session-name terraform_asg_no_cap_rebalance --query 'Credentials.[`export#AWS_ACCESS_KEY_ID=`,AccessKeyId,`#AWS_SECRET_ACCESS_KEY=`,SecretAccessKey,`#AWS_SESSION_TOKEN=`,SessionToken]' --output text | sed $'s/\t//g' | sed 's/#/ /g')
# Target the ASG that EKS created for the managed node group (its name differs from the node group name)
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name ${aws_eks_node_group.main[count.index].resources[0].autoscaling_groups[0].name} \
  --no-capacity-rebalance
EOF
  }
}
```
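To confirm the flag actually changed after a run, the ASG can be inspected afterwards. This is a minimal sketch, not part of the original reply; the ASG name below is a placeholder.

```sh
# Check whether Capacity Rebalancing is still enabled on the node group's ASG
# ("eks-my-nodegroup-asg" is a placeholder; use the ASG behind your managed node group).
aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names "eks-my-nodegroup-asg" \
  --query 'AutoScalingGroups[0].CapacityRebalance' \
  --output text
# Expected output after the workaround: False
```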
We were affected by the same issue. Had to create a support ticket to understand what was really going on.
I'd like to see a feature to disable AZRebalance for EKS managed node groups as well. We ran into this with our jenkins-operator-managed Jenkins instance unexpectedly restarting at random times.
Here's what worked for me in Terraform (based off the earlier answer).
This is affecting my team as well, as we currently use managed node groups with autoscaling to run very bursty Job workloads several times per day, requiring us to scale from 0 to 100 nodes and back again. So far I've been unsuccessful in using any of the above workarounds. While I am able to turn off the associated Auto Scaling Group's AZ Rebalance feature (it shows as
We are considering several options: 1) moving to self-managed node groups, 2) creating two separate single-AZ managed node groups, or 3) evaluating Karpenter as an alternative or supplement to the Cluster Autoscaler. It would be a lot easier if managed node groups simply supported disabling this feature.
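For anyone who wants to try the same thing by hand, here is a minimal sketch of finding the ASG behind a managed node group and suspending its AZRebalance process. The cluster and node group names are placeholders, not taken from this thread.

```sh
# Find the Auto Scaling Group created for the managed node group
# ("my-cluster" and "my-nodegroup" are placeholder names).
ASG_NAME=$(aws eks describe-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name my-nodegroup \
  --query 'nodegroup.resources.autoScalingGroups[0].name' \
  --output text)

# Suspend only the AZRebalance process on that ASG
aws autoscaling suspend-processes \
  --auto-scaling-group-name "$ASG_NAME" \
  --scaling-processes AZRebalance
```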
Same! We are using Cluster Autoscaler and the annotation
The AZ rebalancing also causes false concerns in cases where the AZ with the smaller number of nodes has insufficient capacity for the specified instance type, which results in the node group status appearing degraded. It would be helpful if there were an option to disable the AZ rebalancing property for autoscalers/node groups, or for its status to be confined to the Auto Scaling events only, leaving the node group status unaltered by AZ rebalancing activity. (As per my understanding there is currently no way to disable this.)
Hi team, this item has been open since July 2021, over two years, and many EKS users are experiencing this issue, as demonstrated by the comments above. Could you please assign someone to this item and outline a plan to correct it? While it remains outstanding, could the EKS team please provide a workaround?
I messaged our company's AWS technical account manager and was told this is part of the official AWS containers roadmap and known to the internal EKS team. He does not have access to the timelines and can't say when this will be fixed. In case anyone needs to specifically pass credentials, the below worked for me:

```hcl
resource "null_resource" "disable_AZRebalance_on_ASGs" {
  # module.eks.eks_managed_node_groups is a map, so take values() before indexing by count.
  count = local.disable_AZRebalance ? length(module.eks.eks_managed_node_groups) : 0

  provisioner "local-exec" {
    interpreter = ["/bin/sh", "-c"]
    environment = {
      AWS_DEFAULT_REGION = var.region
    }
    # Error messages are piped to /dev/null and a success/failure marker is written to /tmp;
    # otherwise a failure would print the credentials to the console.
    command = <<EOF
set -e
export AWS_ACCESS_KEY_ID="${local.aws_access_key}"
export AWS_SECRET_ACCESS_KEY="${local.aws_secret_key}"
export AWS_SESSION_TOKEN="${local.aws_session_token}"
aws autoscaling suspend-processes \
  --auto-scaling-group-name ${values(module.eks.eks_managed_node_groups)[count.index].node_group_autoscaling_group_names[0]} \
  --scaling-processes AZRebalance 2> /dev/null && echo "works" > /tmp/asg_failure${count.index} || echo "disableAZRebalance_on_ASGs failed" > /tmp/asg_failure${count.index}
EOF
  }

  # The node group names need to exist before the command above can run.
  depends_on = [
    module.eks
  ]

  # Only re-runs when the node group's ASG name changes.
  triggers = {
    value = values(module.eks.eks_managed_node_groups)[count.index].node_group_autoscaling_group_names[0]
  }

  # Throws an error if the shell command failed.
  lifecycle {
    postcondition {
      # Compare the base64 of the /tmp file contents because trailing newlines made a plain
      # string comparison awkward; the constant below is the base64 of the failure message.
      condition     = fileexists("/tmp/asg_failure${count.index}") ? filebase64("/tmp/asg_failure${count.index}") != "ZGlzYWJsZUFaUmViYWxhbmNlX29uX0FTR3MgZmFpbGVkCg==" : true
      error_message = "ASG bash command in null_resource.disable_AZRebalance_on_ASGs[${count.index}] failed. Output of command has been masked due to sensitive variables. Manually edit the null_resource in order to see the failure."
    }
  }
}
```
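As a quick sanity check (not part of the original snippet), the ASG's suspended processes can be listed to confirm AZRebalance is among them; the name variable below is a placeholder.

```sh
# List suspended processes on the ASG; AZRebalance should appear once the null_resource has run
# ("$ASG_NAME" is a placeholder for the node group's ASG name).
aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names "$ASG_NAME" \
  --query 'AutoScalingGroups[0].SuspendedProcesses[].ProcessName' \
  --output text
```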
I see in the first post
We were struggling with this for weeks. It's a shame that this is still an issue, and it's also a shame that it seems very difficult to find documentation on this unexpected EKS + cluster-autoscaler interaction.
We are facing the same issue with our multi-zone EKS clusters. Will this be fixed if we use
Just wanted to chime in here and say that
In our case, AZ rebalancing was causing our k8s Job nodes to be removed partway through execution. Posting for other internet denizens finding this issue in their search. We were able to match the ASG event to our nodes in Cluster Autoscaler with some questionable Grafana/Prometheus usage, and switching off the AZ rebalancing appears to have resolved this for us. I'll report back if suspending AZ rebalancing turns out to be insufficient for us. Related, from https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-capacity-rebalancing.html: "Capacity Rebalancing helps you maintain workload availability by proactively augmenting your fleet with a new Spot Instance before a running instance is interrupted by Amazon EC2."
@tabern This is fully about the node group implementation on the EKS side, and it does not look like a big deal to add another parameter to the launch template... Please.
@mikestef9 ^ |
Community Note
Tell us about your request
Feature request: allow switching to only cordoning (instead of draining) on AZ Rebalance and EC2 capacity rebalance for EKS managed node groups.
Which service(s) is this request for?
EKS
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
What outcome are you trying to achieve, ultimately, and why is it hard/impossible to do right now? What is the impact of not having this problem solved? The more details you can provide, the better we'll be able to understand and solve the problem.
Use case: "Gitlab Runner cost reduction while maximizing throughput"
Gitlab spins up bare pods for each CI/CD job in its Kubernetes Executor (https://docs.gitlab.com/runner/executors/kubernetes.html). Since these are bare pods, they will not survive the draining of the node on which they are scheduled, resulting in a failed job in Gitlab.
Since these jobs can be restarted if necessary, we are using Spot Instances for cost reduction. We want to optimize for throughput rather than maximum availability, so nodes should only be drained when it's absolutely necessary (e.g. a Spot termination notification). Otherwise we want to leave these pods running as long as possible.
Are you currently working around this issue?
How are you currently solving this problem?
Additional context
With EKS managed node groups we can't control this behavior the way we can with the node termination handler's enableRebalanceDraining option (https://github.com/aws/aws-node-termination-handler/tree/main/config/helm/aws-node-termination-handler), resulting in many unnecessarily drained nodes and failed Gitlab jobs. It would be nice to have this option in EKS managed node groups.
Attachments
If you think you might have additional information that you'd like to include via an attachment, please do - we'll take a look. (Remember to remove any personally-identifiable information.)
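For reference, on self-managed node groups the aws-node-termination-handler Helm chart already exposes this kind of control. A minimal sketch, assuming the standard eks-charts chart and that the enableRebalanceMonitoring/enableRebalanceDraining values behave as documented (cordon-only vs. cordon-and-drain); the release name and namespace are arbitrary:

```sh
# Install aws-node-termination-handler so rebalance recommendations only cordon,
# rather than cordon-and-drain, the affected node.
helm repo add eks https://aws.github.io/eks-charts
helm upgrade --install aws-node-termination-handler eks/aws-node-termination-handler \
  --namespace kube-system \
  --set enableRebalanceMonitoring=true \
  --set enableRebalanceDraining=false
```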