
EKS managed default node group and max-pods #2297

Closed

schealex opened this issue Nov 15, 2022 · 4 comments

@schealex
Description

We are facing an issue where all nodes in our EKS cluster have a max-pods limit of 20, regardless of node type. We have tried several steps but failed to resolve the problem.

We tracked the problem down to the user data in the launch template:

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="//"

--//
Content-Type: text/x-shellscript; charset="us-ascii"
#!/bin/bash
set -ex
B64_CLUSTER_CA=XXX
API_SERVER_URL=XXX
K8S_CLUSTER_DNS_IP=XXX
/etc/eks/bootstrap.sh XXX --kubelet-extra-args '--node-labels=eks.amazonaws.com/nodegroup-image=ami-08c86de312838cfb6,eks.amazonaws.com/capacityType=SPOT,eks.amazonaws.com/nodegroup=XXXX --max-pods=20' --b64-cluster-ca $B64_CLUSTER_CA --apiserver-endpoint $API_SERVER_URL --dns-cluster-ip $K8S_CLUSTER_DNS_IP --use-max-pods false

--//--

What we want is the "--use-max-pods" flag turned on and the "--max-pods=20" argument removed.

We are using a managed EKS cluster with the default node group.

We've seen a lot of issues about disabling this feature, but none about it being disabled by default and causing trouble. There are snippets like this:

pre_bootstrap_user_data    = <<-EOT
    #!/bin/bash
    set -ex
    cat <<-EOF > /etc/profile.d/bootstrap.sh
    export CONTAINER_RUNTIME="containerd"
    export USE_MAX_PODS=false
    export KUBELET_EXTRA_ARGS="--max-pods=35"
    EOF
    # Source extra environment variables in bootstrap script
    sed -i '/^set -o errexit/a\\nsource /etc/profile.d/bootstrap.sh' /etc/eks/bootstrap.sh
    EOT

but even "export USE_MAX_PODS=true" would not help, since the user data (see above) already passes "--use-max-pods false" as an argument to bootstrap.sh.
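A minimal sketch (not the real bootstrap.sh, whose parsing is more involved) of why the exported variable loses: a flag parsed later simply overwrites whatever was exported earlier.

```shell
# Simplified sketch, NOT the real bootstrap.sh: it only illustrates
# that an explicit --use-max-pods argument overrides the env var.
export USE_MAX_PODS=true                 # what the profile.d trick sets
set -- --use-max-pods false --other-flag # what the managed user data passes

USE_MAX_PODS="${USE_MAX_PODS:-true}"     # env/default value
while [ $# -gt 0 ]; do
  case "$1" in
    --use-max-pods) USE_MAX_PODS="$2"; shift 2 ;;  # flag wins
    *) shift ;;
  esac
done
echo "USE_MAX_PODS=$USE_MAX_PODS"        # prints USE_MAX_PODS=false
```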

Any help in solving this is greatly appreciated!

  • ✋ I have searched the open/closed issues and my issue is not listed.

Versions

  • Module version [Required]: 18.2.6

  • Terraform version:
    v0.14.11

  • Provider version(s):

  • Installing hashicorp/archive v2.2.0...

  • Installing hashicorp/external v2.2.3...

  • Installing hashicorp/null v3.2.0...

  • Installing gavinbunney/kubectl v1.13.1...

  • Installing hashicorp/kubernetes v2.15.0...

  • Installing hashicorp/local v2.2.3...

  • Installing hashicorp/template v2.2.0...

  • Installing hashicorp/tls v4.0.4...

  • Installing hashicorp/helm v2.5.1...

  • Installing hashicorp/random v3.4.3...

  • Installing hashicorp/cloudinit v2.2.0...

  • Installing hashicorp/time v0.9.1...

  • Installing hashicorp/aws v3.72.0...

  • Installing terraform-aws-modules/http v2.4.1...

  • Installing hashicorp/http v3.2.1...

Reproduction Code [Required]

  eks_managed_node_group_defaults = {
    ami_type       = "AL2_x86_64"
    disk_size      = 50
  }

  eks_managed_node_groups = {
    # Default node group - as provided by AWS EKS
    default_node_group = {
      # By default, the module creates a launch template to ensure tags are propagated to instances, etc.,
      # so we need to disable it to use the default template provided by the AWS EKS managed node group service
      create_launch_template = false
      launch_template_name   = ""

      # Remote access cannot be specified with a launch template
      remote_access = {
        ec2_ssh_key = module.aws_key_pair.key_name
      }

      update_config = {
        max_unavailable_percentage = 25
      }

      min_size     = 4
      max_size     = 10
      desired_size = 4

      instance_types          = ["t3.large", "t2.large", "t3a.large", "m5.large", "m4.large"]
      instance_market_options = {
        market_type = "spot"
      }
      capacity_type = "SPOT"

      // Add policy required for SSM
      iam_role_additional_policies = ["arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore", "arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforSSM"]

      tags = merge(local.tags, {
        Name = "eks-node-${var.environment}"
      })
    }
  }

Steps to reproduce the behavior:

Create a new cluster with the default eks managed node group.

Expected behavior

All nodes have their max-pods set specific for their type via the bootstrap.sh provided by amazon

Actual behavior

All nodes have a max-pods limit of 20

adegoodyer commented Nov 17, 2022

The default max pods per node is usually 110 on a Kubernetes cluster; with the Amazon VPC CNI, however, it is limited to (number of allowable ENIs × (private IPv4 addresses per ENI − 1)) + 2, which currently comes to 17 for the instance tier chosen for our worker node group. https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html
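As a sanity check, that ceiling can be computed by hand; a small sketch assuming a t3.medium (3 ENIs, 6 IPv4 addresses per ENI — values taken from the EC2 ENI table linked above):

```shell
# Standard VPC CNI formula (without prefix delegation):
#   max_pods = ENIs * (IPv4 addresses per ENI - 1) + 2
# Assumed instance: t3.medium with 3 ENIs and 6 IPv4 addresses each.
enis=3
ips_per_eni=6
max_pods=$(( enis * (ips_per_eni - 1) + 2 ))
echo "max_pods=$max_pods"   # 3 * 5 + 2 = 17
```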

This has been improved in the latest versions of the CNI add-on through a prefix assignment mode https://aws.amazon.com/blogs/containers/amazon-vpc-cni-increases-pods-per-node-limits/. Unusually, however, when deploying EKS via Terraform, even when selecting the official AMI it is deemed a custom image and so is not configured automatically aws/containers-roadmap#138 (comment).

I've set it manually via https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html and cycled the nodes in the cluster, but the limit still applies. Step 3 there suggests that EKS managed node groups calculate the max number of pods for you, and as you've guessed, this cannot be set manually via Terraform.

A workaround exists and worked for me when tested: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/docs/user_data.md#%EF%B8%8F-caveat

pre_bootstrap_user_data = <<-EOT
  #!/bin/bash
  set -ex
  cat <<-EOF > /etc/profile.d/bootstrap.sh
  export USE_MAX_PODS=false
  export KUBELET_EXTRA_ARGS="--max-pods=110"
  EOF
  # Source extra environment variables in bootstrap script
  sed -i '/^set -o errexit/a\\nsource /etc/profile.d/bootstrap.sh' /etc/eks/bootstrap.sh
  EOT
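The sed one-liner above works by inserting a source line immediately after "set -o errexit" in bootstrap.sh, so the exported overrides take effect before the script reads its flags. A throwaway demonstration against a mock file (requires GNU sed; paths mirror the snippet above):

```shell
# Demonstrate the sed insertion against a mock bootstrap.sh, not the real one.
mock=$(mktemp)
cat > "$mock" <<'EOF'
#!/usr/bin/env bash
set -o errexit
# ... rest of bootstrap.sh: flag parsing, kubelet config, etc. ...
EOF

# Same expression as in the workaround: append a source line (preceded by a
# blank line) right after the `set -o errexit` line.
sed -i '/^set -o errexit/a\\nsource /etc/profile.d/bootstrap.sh' "$mock"

grep -n 'source /etc/profile.d/bootstrap.sh' "$mock"
```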

Definitely apply this via Terraform as well, so it rolls the change out to your nodes incrementally; cycling them manually is a bit of a pain. I tested in a separate test node group first, and you will certainly want to do the same before applying to production.

Not ideal, but hope this helps!

@bryantbiggs (Member)

Here is an example that configures this for prefix delegation; hopefully it helps: https://github.com/clowdhaus/eks-reference-architecture/blob/main/ipv4-prefix-delegation/eks.tf
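For reference, the linked example enables prefix delegation on the VPC CNI add-on roughly like this. This is a sketch: the cluster_addons / configuration_values / before_compute attribute names assume a recent version of the terraform-aws-eks module and should be verified against the linked repo.

```hcl
# Sketch based on the linked reference architecture (attribute names assumed,
# verify against the repo and your module version).
cluster_addons = {
  vpc-cni = {
    # Ensure the add-on is configured before nodes are created,
    # so max-pods is calculated with prefix delegation in effect.
    before_compute = true
    configuration_values = jsonencode({
      env = {
        ENABLE_PREFIX_DELEGATION = "true"
      }
    })
  }
}
```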

laurisvan commented Jan 5, 2023

One solution for prefix delegation is simply to let max-pods-calculator do its job, given the information that prefix delegation is on. I tried multiple ways of enforcing max-pods, and sometimes it worked and sometimes it didn't (I don't know why).

However, something as simple as this worked alright:

  eks_managed_node_group_defaults = {
    ebs_optimized        = true
    capacity_type        = var.managed_node_capacity_types[local.env]
    force_update_version = true
    instance_types       = ["t3a.medium", "t3.medium"]
    # We are using the IRSA created below for permissions
    # However, we have to deploy with the policy attached FIRST (when creating a fresh cluster)
    # and then turn this off after the cluster/node group is created. Without this initial policy,
    # the VPC CNI fails to assign IPs and nodes cannot join the cluster
    # See https://github.com/aws/containers-roadmap/issues/1666 for more context
    iam_role_attach_cni_policy = true

    # Inform nodes that prefix delegation is working. This enables max-pods-calculator to work correctly.
    # See: https://github.com/awslabs/amazon-eks-ami/blob/master/files/max-pods-calculator.sh
    # See: https://github.com/terraform-aws-modules/terraform-aws-eks/issues/2059#issuecomment-1120199919
    pre_bootstrap_user_data = <<-EOT
      #!/bin/bash
      set -ex
      cat <<-EOF > /etc/profile.d/bootstrap.sh
        # This is the magic that permits max-pods computation to succeed
        export CNI_PREFIX_DELEGATION_ENABLED=true
      EOF
      # Source extra environment variables in bootstrap script
      sed -i '/^set -o errexit/a\\nsource /etc/profile.d/bootstrap.sh' /etc/eks/bootstrap.sh
    EOT

    block_device_mappings = {
      xvda = {
        device_name = "/dev/xvda"
        ebs = {
          delete_on_termination = true
          encrypted             = true
          volume_size           = 20
          volume_type           = "gp3"
        }
      }
    }
  }

I am not 100% sure this works on a clean stack: when I was applying Terraform changes, the node groups sometimes ended up using old templates and sometimes new ones. However, I believe this is what my current node groups are running now.

github-actions bot commented Feb 5, 2023

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 5, 2023