Waiting for ipamd health check deadlocks node bootstrapping outside EKS #575
Sorry, this explanation is bogus -- I misread my … The cluster worked fine with 1.5.0 w/ Kubernetes 1.15.0, but I need to investigate more tomorrow. Will update here when I figure out what happened -- sorry for the noise! Feel free to close if you want, I can reopen.
@drakedevel Ok, thanks for the follow up. We did test v1.5.2 quite a lot, both on new clusters and upgrading from older versions. I'll close this issue since kube-proxy does start, but feel free to open another issue if you can't figure out why ipamd can't talk to the api-server.
@mogren It looks like the actual issue is that the timeout never actually causes the pod to restart. Everything works fine if the pod is manually deleted, but the node is broken until then, as nothing will automatically get the pod out of this state.
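For illustration, a minimal sketch of the kind of bounded wait being discussed here, assuming ipamd exposes the standard gRPC health service on 127.0.0.1:50051 (the address, timeout, and exit behavior are assumptions, not necessarily what #576 implements):

```go
// Sketch: bounded wait for ipamd before continuing node setup.
// Assumptions (not taken from the CNI source): ipamd serves the standard
// gRPC health service on 127.0.0.1:50051, and a 2-minute budget is enough.
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
	defer cancel()

	// Block until the connection is up or the deadline expires.
	conn, err := grpc.DialContext(ctx, "127.0.0.1:50051",
		grpc.WithInsecure(), grpc.WithBlock())
	if err != nil {
		// Exiting non-zero is the important part: it lets the kubelet
		// restart the container instead of leaving the node stuck behind
		// the not-ready taint forever.
		log.Fatalf("timed out waiting for ipamd: %v", err)
	}
	defer conn.Close()

	resp, err := healthpb.NewHealthClient(conn).Check(ctx, &healthpb.HealthCheckRequest{})
	if err != nil || resp.Status != healthpb.HealthCheckResponse_SERVING {
		log.Fatalf("ipamd is not serving yet: %v", err)
	}
	log.Println("ipamd is healthy, continuing startup")
}
```

The point is only that the wait has an upper bound and a failing exit path; the actual mechanism in #576 may differ.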
@drakedevel You are right about the time-out. I had another approach in this PR: #576
@drakedevel An image with that change is available in my ECR repo.
@mogren works like a charm! Rolled out a fresh cluster the exact same way but with the new image, got the same timeout error, and the pod restarted as expected until it eventually came up healthy.
@drakedevel Thanks a lot for verifying!
No problem at all, thanks for the quick fix! 😄
@mogren I tried using the rc and we are getting the following error on some pods, in the aws-node logs and in `kubectl describe po datadog-zpfxm -n cloudplatform-system`. This error occurs when a new worker is introduced.
Thanks @seancurran157 for reporting, I'll try to reproduce it ASAP.
@mogren any luck on reproducing?
@seancurran157 Sorry, not yet. Got pulled in to work on some other issues. Have you tried with v1.5.3?
This should have been solved in v1.5.3. Please reopen if this is still an issue.
Tested 1.5.2 on Kubernetes 1.15.2

The change in #553 introduced a node bootstrapping problem on our `kubeadm` test cluster. With this change, nodes get tainted with `node.kubernetes.io/not-ready` until ipamd is healthy. ipamd can't become healthy until it's able to reach the API server. On at least `kubeadm` and `kops` clusters, the API server is a ClusterIP, which requires `kube-proxy` to be up and running. This circular dependency means that nodes booting up simply sit there forever.
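To make the circular dependency concrete, here is a small sketch of why a check against the API server's ClusterIP hangs on a freshly booted node (the environment variables are the standard in-cluster ones; the 5-second timeout is an arbitrary illustrative value):

```go
// Sketch: why ipamd's API server check hangs on a freshly booted node.
// KUBERNETES_SERVICE_HOST/PORT are the standard in-cluster variables; the
// dial timeout is chosen only for illustration.
package main

import (
	"fmt"
	"net"
	"os"
	"time"
)

func main() {
	host := os.Getenv("KUBERNETES_SERVICE_HOST") // a ClusterIP, e.g. 10.96.0.1
	port := os.Getenv("KUBERNETES_SERVICE_PORT") // usually 443
	addr := net.JoinHostPort(host, port)

	// The ClusterIP is virtual: traffic only reaches a real API server once
	// kube-proxy has programmed the node's iptables/IPVS rules. On a node
	// where kube-proxy has not started yet, this dial fails every time, so a
	// health check that requires the API server can never succeed.
	conn, err := net.DialTimeout("tcp", addr, 5*time.Second)
	if err != nil {
		fmt.Printf("API server %s not reachable yet: %v\n", addr, err)
		return
	}
	defer conn.Close()
	fmt.Println("API server reachable; ipamd can become healthy")
}
```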