aws-node pods do not always start successfully with custom networking #1385
Comments
Hi @asheldon, we will try to repro and debug this further. Can you please share the ipamd logs? Thanks.
Do you execute this command in the pod or on the node?
On the node @asheldon
Logs have been sent over.
Thanks @asheldon. Will look into it and update you.
Hi @asheldon, can you also please share the o/p of … Thank you!
Hi, the error:
I see an aws-node startup failure timestamped 00:43:14.340664, and kube-proxy records …
The gap is 30.39 seconds, which is less than the 32-second timeout suggested in the log. This makes me think aws-node started its query ~1.5 seconds before kube-proxy was ready for it, hung for ~32 seconds, then crashed.
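The timing arithmetic above can be checked with a quick shell snippet. Note the kube-proxy readiness timestamp below is a hypothetical illustration chosen to match the reported 30.39-second gap; only the aws-node failure timestamp 00:43:14.340664 appears in the logs:

```shell
# Convert HH:MM:SS.micros log timestamps to seconds-since-midnight, then diff.
t_proxy="00:42:43.950000"   # kube-proxy ready (hypothetical value)
t_crash="00:43:14.340664"   # aws-node failure (from the logs above)
s1=$(echo "$t_proxy" | awk -F'[:.]' '{printf "%.6f", $1*3600 + $2*60 + $3 + $4/1000000}')
s2=$(echo "$t_crash" | awk -F'[:.]' '{printf "%.6f", $1*3600 + $2*60 + $3 + $4/1000000}')
echo "$s2 $s1" | awk '{printf "gap: %.2f s\n", $1 - $2}'   # gap: 30.39 s
```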
Hi @asheldon, thanks for checking, and yes, it seems aws-node came up before kube-proxy. Also, can you please check if the delay in kube-proxy is because of this:
There are no delays like that in kube-proxy startup on my node. The first timestamp is …
This is similar to #1078. Please see the comment here: #1078 (comment)
kube-proxy expects to find a node matching its hostname during startup and eventually falls back to localhost. During this time, aws-node will crash loop until kube-proxy starts successfully and becomes available. The kube-proxy daemonset should be patched as described here: https://gist.github.com/M00nF1sh/84d380b4e08017a5bc958658f7010914 if the node's hostname doesn't match the Kubernetes node name. We are also working internally to make this change available by default with kube-proxy. /cc @M00nF1sh
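As a rough sketch of the technique (an assumption about the general approach, not a reproduction of the linked gist's exact patch): expose the Kubernetes node name to kube-proxy via the downward API and pass it as `--hostname-override`, so kube-proxy finds its Node object even when the OS hostname differs from the node name.

```shell
# Config-style patch sketch; requires a live cluster and may need adjusting
# if the kube-proxy container already defines env vars or uses a config file.
kubectl -n kube-system patch daemonset kube-proxy --type=json -p='[
  {"op": "add",
   "path": "/spec/template/spec/containers/0/env",
   "value": [{"name": "NODE_NAME",
              "valueFrom": {"fieldRef": {"fieldPath": "spec.nodeName"}}}]},
  {"op": "add",
   "path": "/spec/template/spec/containers/0/command/-",
   "value": "--hostname-override=$(NODE_NAME)"}
]'
```

Kubernetes expands `$(NODE_NAME)` in the container command at runtime, so the shell's single quotes deliberately keep it literal in the patch.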
What happened:
aws-node pods crash on start intermittently with custom networking enabled.
Attach logs
What you expected to happen:
No crashes / restarts
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Usually works after one or two retries
Environment:
- Kubernetes version (use `kubectl version`): EKS 1.19
- CNI version: 1.7.8
- OS (e.g. `cat /etc/os-release`): Amazon Linux 2
- Kernel (e.g. `uname -a`): 5.4.91-41.139.amzn2