aws-node amazon-k8s-cni:v1.10.2-eksbuild.1 restarts always on start #1930
Comments
Hi, can you please check if kube-proxy is up? Ref: #1078. There is a known issue where kube-proxy takes time to start. You would see this in the kube-proxy logs -
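For reference, a quick way to check this, assuming the standard kube-system namespace and the k8s-app=kube-proxy label used on EKS:

```sh
# Confirm kube-proxy is Running on every node
kubectl -n kube-system get pods -l k8s-app=kube-proxy -o wide

# Tail its logs to see when route programming completed
kubectl -n kube-system logs -l k8s-app=kube-proxy --tail=100
```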
@jayanthvn it is up, and this doesn't look like #1078. I managed to reproduce this on the second node I scaled up; here are the describes for both aws-node and kube-proxy, and the logs (from the terminated aws-node, the running aws-node, and kube-proxy).
From the crashed aws-node:
After the restart:
And from the kube-proxy that didn't restart:
We're also having the same issue on one of our clusters, although for us it doesn't come up after a restart - it just keeps restarting.
@farhank3389 - What is your instance type? Do you have prefix delegation enabled? Can you please share the error code from
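For anyone checking the same thing, one way to see whether prefix delegation is enabled is to read the ENABLE_PREFIX_DELEGATION env var on the aws-node DaemonSet (a sketch, assuming the default container layout):

```sh
# Prints "true" if prefix delegation is enabled on the VPC CNI
kubectl -n kube-system get daemonset aws-node \
  -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="ENABLE_PREFIX_DELEGATION")].value}'
```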
@farhank3389 Could you collect logs and send them to k8s-awscni-triage@amazon.com?
Does it auto-resolve? Were you able to resolve it?
@jayanthvn we are seeing the error with all instance types
@cgchinmay we don't use eksctl to create the cluster, as we self-manage our nodes and use Terraform to provision EKS.
@iomarcovalente - the error indicates the CNI is not able to talk to the API server. Can you please check the security groups attached to the instances? Or you can email (
Is it really
My bad, it's k8s-awscni-triage@amazon.com
Never mind - we found the issue. It was kube-proxy pointing to the wrong API server address due to a bug in our code. Apologies for the noise.
Thanks for the update @iomarcovalente
I changed the title of this issue to "always" - I'm not seeing a single node that doesn't restart; all of my aws-node pods restart once in the cluster. Sending the cluster ARN and logs to k8s-awscni-triage@amazon.com now.
@jayanthvn doesn't this always happen for you, too?
Looks like kube-proxy completed programming the routes around
During this time aws-node was waiting to test reachability to the API server ->
Kubelet logs ->
aws-node failed to reach the API server because of the kube-proxy dependency, so kubelet determined the pod was dead and restarted it. After the restart it succeeds.
This is expected behavior, and we are looking into removing this dependency at startup.
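To confirm this ordering on an affected node, one can compare timestamps between the previous (crashed) aws-node container and kube-proxy; the pod names below are placeholders:

```sh
# Logs from the crashed aws-node container (before kubelet restarted it)
kubectl -n kube-system logs aws-node-xxxxx --previous

# kube-proxy logs on the same node, to see when it finished programming routes
kubectl -n kube-system logs kube-proxy-xxxxx
```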
This expected behavior also makes nodes that just joined the cluster unable to run workloads for ~1 min, slowing everything down - how can I work around this?
Working around this for now by making the restarts faster: dropping the livenessProbe initialDelaySeconds from 60 to 1 (see the patch sketch below).
Until this issue is resolved: aws/amazon-vpc-cni-k8s#1930
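A minimal sketch of that workaround, assuming aws-node is the first container in the DaemonSet pod spec (note the change may be reverted by the next add-on or manifest update):

```sh
# Drop the aws-node liveness probe initialDelaySeconds from 60 to 1
kubectl -n kube-system patch daemonset aws-node --type=json -p='[
  {"op": "replace",
   "path": "/spec/template/spec/containers/0/livenessProbe/initialDelaySeconds",
   "value": 1}
]'
```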
Hello all, I am still facing the issue of the aws-cni pod not running on my EKS cluster (via the CloudFormation EKS add-on functionality), EKS version 1.20.11.
Even after applying the workaround patch, i.e. aws-node-patch.yml.
Below are the kube-proxy logs:
Below are the aws-cni logs:
This is not resolved. Can anyone help with this one?
This PR #1943 should help mitigate the issue to some extent. Previously aws-node would only retry API server connectivity upon a restart, but with this change we retry a few more times before restarting. This is milestoned for the v1.11.0 release.
@jayanthvn what about #1943 (comment) |
@matti - I see the comment is resolved. Will close this issue for now. Please try out v1.11.0 and let us know if the behavior has improved.
What happened:
On a new node, aws-node fails while waiting for IPAM-D and gets restarted. After the restart it works.
Attach logs
From the failed/terminated container run:
After restart:
What you expected to happen:
No restart
How to reproduce it (as minimally and precisely as possible):
Create a new eksctl cluster and add nodes. Happens on ~25% of node starts.
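Roughly, with hypothetical cluster and nodegroup names:

```sh
# Create a small cluster, then scale the nodegroup up and watch aws-node restart counts
eksctl create cluster --name cni-restart-test --nodes 2
eksctl scale nodegroup --cluster cni-restart-test --name <nodegroup> --nodes 4
kubectl -n kube-system get pods -l k8s-app=aws-node -w
```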
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`): 1.21
- OS (e.g: `cat /etc/os-release`):
- Kernel (e.g. `uname -a`):