Health checks fail, phantom ENI in logs #1572
Comments
Setting WARM_ENI_TARGET to 0, until we start using the new options to reserve prefixes, seems to make the problem go away for now.
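For anyone else trying this workaround, one way to apply the setting is to patch the CNI daemonset's environment. This is a sketch assuming the default `aws-node` daemonset in `kube-system`; adjust names to match your cluster:

```shell
# Set WARM_ENI_TARGET=0 on the CNI daemonset (assumes the default
# aws-node daemonset in kube-system; adjust for your cluster).
kubectl set env daemonset aws-node -n kube-system WARM_ENI_TARGET=0

# Watch the rollout to confirm the pods pick up the new environment.
kubectl rollout status daemonset aws-node -n kube-system
```

Note this restarts the aws-node pods on every node as the daemonset rolls out.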
Based on the logs, you have 2 ENIs which were retrieved from IMDS, and one of them is a stale ENI. The other ENI you are seeing is the primary ENI. Setting WARM_IP_TARGET will override WARM_ENI_TARGET, so do you have both configured? Also, regarding the IPAMD issue: for the 3 pods stuck in ContainerCreating, can you please share the reason why one of them is stuck?
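To pull that reason, the pod's events usually show it. A sketch with placeholder pod/namespace names:

```shell
# The ContainerCreating reason is usually in the Events section
# at the bottom of describe output (placeholder names).
kubectl describe pod <pod-name> -n <namespace> | tail -n 20

# Or query the events for that pod directly:
kubectl get events -n <namespace> \
  --field-selector involvedObject.name=<pod-name>
```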
@therc - If you can please attach the logs by running this script - `sudo bash /opt/cni/bin/aws-cni-support.sh` - on one of the impacted nodes, we can help debug this further.
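For reference, the script has to run on the node itself (e.g. over SSH or an SSM session), roughly:

```shell
# On the impacted node; the script collects ipamd/CNI logs and
# prints where it wrote the resulting archive - attach that file.
sudo bash /opt/cni/bin/aws-cni-support.sh
```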
@therc - Can you please share the instance logs? You can run this script -
This issue is stale because it has been open 60 days with no activity. Remove the stale label or comment, or this issue will be closed in 14 days.
The second ENI is a stale ENI and it is expected behavior. Please feel free to open an issue for debugging the pod which is stuck in ContainerCreating.
What happened:
One r5.4xlarge machine, with 15 existing IP addresses, has had three pods stuck in the ContainerCreating state for almost a day now.
Looking at the ipamd logs, this stands out:
I thought this might be due to stale metadata, but the problem persists even after updating to 1.9.0, which is supposed to carry some partial fixes.
An additional question: why a second ENI? Isn't the machine supposed to support 30 addresses per ENI? Then I remembered that the plugin was running with some custom settings to reduce calls to EC2 that would get us rate-limited:
WARM_ENI_TARGET=1
WARM_IP_TARGET=3
So the former might explain why there is a second ENI, but not why the plugin gets into this state and never recovers.
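Back-of-the-envelope, the second ENI is consistent with those settings. Below is my own simplification of the warm-pool arithmetic, not the plugin's actual allocator (the per-ENI limit and the assumption that the two targets are additive are both mine; WARM_IP_TARGET may in fact override WARM_ENI_TARGET):

```python
import math

# r5.4xlarge: 30 IPv4 addresses per ENI; one of them is the ENI's
# own primary address, so assume 29 are assignable to pods.
ADDRS_PER_ENI = 30
POD_IPS_PER_ENI = ADDRS_PER_ENI - 1

def enis_needed(ips_in_use, warm_ip_target, warm_eni_target):
    """Rough ENI count: cover in-use IPs plus the warm-IP headroom,
    then add WARM_ENI_TARGET fully-free spare ENIs (a simplification,
    not the plugin's real logic)."""
    for_ips = math.ceil((ips_in_use + warm_ip_target) / POD_IPS_PER_ENI)
    return for_ips + warm_eni_target

# 15 pod IPs in use, WARM_IP_TARGET=3, WARM_ENI_TARGET=1
print(enis_needed(15, 3, 1))  # -> 2
```

Under those assumptions the 15 in-use IPs plus 3 warm IPs fit on one ENI, and WARM_ENI_TARGET=1 accounts for the second one.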
What you expected to happen:
the plugin works
How to reproduce it (as minimally and precisely as possible):
No idea exactly how, but WARM_ENI_TARGET>0 might be required. This is happening on just a few machines out of many hundreds, and this one is by far the most affected.
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`): Server Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.7-eks-d88609", GitCommit:"d886092805d5cc3a47ed5cf0c43de38ce442dfcb", GitTreeState:"clean", BuildDate:"2021-07-31T00:29:12Z", GoVersion:"go1.15.12", Compiler:"gc", Platform:"linux/amd64"}
- CNI Version: 1.9.0
- OS (e.g: `cat /etc/os-release`):
- Kernel (e.g. `uname -a`): Linux ip-10-1-135-162.ec2.internal 5.4.117-58.216.amzn2.x86_64 #1 SMP Tue May 11 20:50:07 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux