What happened:
I'm running an IPv6 EKS cluster with p4de.24xlarge EC2 instances. P4de instances are used for workloads such as large language models, which need large clusters, and IPv6 helps with IP address exhaustion. I noticed that the aws-node pod fails on p4de.24xlarge instances, while instances from the C and M instance families run without any issues.
Attach logs
aws-node output:
Defaulted container "aws-node" out of: aws-node, aws-vpc-cni-init (init)
Installed /host/opt/cni/bin/aws-cni
time="2023-06-15T13:03:59Z" level=info msg="Starting IPAM daemon... "
Installed /host/opt/cni/bin/egress-v4-cni
time="2023-06-15T13:03:59Z" level=info msg="Checking for IPAM connectivity... "
time="2023-06-15T13:04:00Z" level=info msg="Copying config file... "
time="2023-06-15T13:04:00Z" level=info msg="Successfully copied CNI plugin binary and config file."
time="2023-06-15T13:04:00Z" level=error msg="Failed to wait for IPAM daemon to complete" error="exit status 1"
I then looked further into the kubelet logs for the failing p4de.24xlarge node, where I found logs like
The IPAMD log (/var/log/aws-routed-eni) for the failing p4de.24xlarge node then gives the real reason:
{"level":"debug","ts":"2023-06-15T18:51:53.934Z","caller":"ipamd/ipamd.go:2285","msg":"Check if instance supports Prefix Delegation"}
{"level":"debug","ts":"2023-06-15T18:51:53.934Z","caller":"awsutils/awsutils.go:1472","msg":"Instance hypervisor family unknown"}
{"level":"debug","ts":"2023-06-15T18:51:53.934Z","caller":"awsutils/awsutils.go:1472","msg":"Bare Metal Instance %!s(bool=false)"}
{"level":"error","ts":"2023-06-15T18:51:53.934Z","caller":"ipamd/ipamd.go:418","msg":"Prefix Delegation is not supported on non-nitro instance p4de.24xlarge. IPv6 is only supported in Prefix delegation Mode. "}
What you expected to happen:
p4de.24xlarge is a Nitro-based instance type and should support prefix delegation mode for IPv6.
How to reproduce it (as minimally and precisely as possible):
Run an IPv6 EKS cluster with p4de.24xlarge nodes. The cluster can be created with, for example, Terraform or eksctl.
Anything else we need to know?:
Looking at the code, we can see that the CNI has hardcoded values that cause this error down the line.
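To illustrate the kind of failure I mean, here is a minimal sketch of how a hardcoded per-instance-type table could produce the error in the IPAMD log. Everything in it is an assumption for illustration: the type, map, and function names are hypothetical, and the "unknown" entry for p4de.24xlarge is my guess based on the "Instance hypervisor family unknown" log line, not a copy of the actual CNI source.

```go
package main

import "fmt"

// instanceInfo is a hypothetical stand-in for one hardcoded table entry.
type instanceInfo struct {
	HypervisorType string // e.g. "nitro", "xen", or "unknown"
	IsBareMetal    bool
}

// hardcodedTable is a hypothetical hardcoded table keyed by instance type.
var hardcodedTable = map[string]instanceInfo{
	"c5.large":      {HypervisorType: "nitro"},
	"m5.large":      {HypervisorType: "nitro"},
	"p4de.24xlarge": {HypervisorType: "unknown"}, // assumed stale entry
}

// supportsPrefixDelegation reports whether the table marks the instance
// type as Nitro-based or bare metal.
func supportsPrefixDelegation(instanceType string) bool {
	info, ok := hardcodedTable[instanceType]
	if !ok {
		return false
	}
	return info.HypervisorType == "nitro" || info.IsBareMetal
}

// checkIPv6 fails when prefix delegation is unavailable, because IPv6
// mode depends on it.
func checkIPv6(instanceType string) error {
	if !supportsPrefixDelegation(instanceType) {
		return fmt.Errorf("prefix delegation is not supported on non-nitro instance %s", instanceType)
	}
	return nil
}

func main() {
	for _, t := range []string{"c5.large", "p4de.24xlarge"} {
		fmt.Printf("%s: %v\n", t, checkIPv6(t))
	}
}
```

Under these assumptions, the instance type is rejected because of the table entry rather than because of anything detected on the instance itself, which would match C and M instances working while p4de.24xlarge fails.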
Environment:
Kubernetes version (use kubectl version): 1.27
OS (e.g. cat /etc/os-release): see AMI amazon-eks-node-1.27-v20230607
Kernel (e.g. uname -a): see AMI amazon-eks-node-1.27-v20230607