Pod unable to reach kube API: i/o timeout #272
Comments
I have switched AMIs to Amazon Linux 2 and it's still the same problem. I am going to try updating Docker to 18.
I have rolled back the Amazon Linux 2 hosts; they were failing to create any good pods.
I have reported something similar: #318
This is happening to us as well using a CentOS 7 AMI w/ EKS 1.11, at about the same frequency as described in the ticket. Today was especially fun because this happened to our coredns pod, which was added to the load balancer (no readiness probe installed by default :() and thus was causing a lot of DNS errors inside of our cluster, which our services didn't appreciate :/. This has become a huge headache and a cause of quite a bit of frustration for our team. Please help...
I've seen this as well with a similar setup to sdavids. It looked like it might only be happening on secondary ENIs - i.e. all of the pods in our cluster that have IPs allocated on the primary ENI consistently come up correctly.
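If it helps anyone checking the secondary-ENI theory, here is a minimal sketch of how to see whether a failing pod's IP is routed via a secondary ENI on the node. The policy-routing layout below is an assumption based on how the AWS VPC CNI normally wires extra ENIs, and the pod IP is a placeholder.

```bash
# On the affected node: pods whose IPs live on a secondary ENI normally get a
# policy-routing rule pointing at a non-main route table (table 2, 3, ... per
# extra ENI). This layout is an assumption about the AWS VPC CNI defaults.
ip rule list | grep <failing-pod-ip>    # <failing-pod-ip> is a placeholder

# Show where that table sends traffic (replace 2 with the table number above).
ip route show table 2
```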
Have you tried running from master, @lnr0626? That seems to have solved it for us.
I got this during an autoscaler/pod rush from our GitLab CI spawning build pods.
We are no longer having this issue after upgrading to a newer CNI.
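For anyone checking whether they are on a fixed build, a hedged sketch of confirming which CNI image a cluster is actually running; the `kube-system` namespace and the `k8s-app=aws-node` label are the usual defaults for the aws-node DaemonSet.

```bash
# Show the image (and therefore version) used by the aws-node DaemonSet.
kubectl -n kube-system describe daemonset aws-node | grep Image

# Confirm every node is running the same image after the rollout.
kubectl -n kube-system get pods -l k8s-app=aws-node \
  -o jsonpath='{range .items[*]}{.spec.nodeName}{"\t"}{.spec.containers[0].image}{"\n"}{end}'
```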
@sdavids13 thanks for the update. I'm going to close this issue out. @xrl, please feel free to let us know if you see similar issues now and I will reopen.
I am running the nginx-ingress controller, which needs to talk to the kube API to get virtual host config information. I have 3 replicas, 2 of which are working and handling my production traffic. 1 of them is in a restart loop, unable to reach the kube API. It complains:
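A minimal sketch of pulling that complaint from a restart-looping replica; the label selector and pod name are placeholders, not values from this cluster:

```bash
# Find the replica that is restart-looping (label is a placeholder; adjust to
# however the nginx-ingress controller is deployed).
kubectl get pods -l app=nginx-ingress -o wide

# The error is often only visible in the previous (crashed) container's logs.
kubectl logs <stuck-pod-name> --previous

# Events show the restart/backoff history and probe failures.
kubectl describe pod <stuck-pod-name>
```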
I know this issue could be rectified by
k delete pod $POD
but I am seeing this error routinely throughout my system; something like 5% of my pods fall into this routability trap. They are unable to send out messages, but their software may not die immediately, which makes the pod look OK when it is actually just stuck. To continue with gathering information:
The kube node ip-10-43-169-250.ec2.internal is running a variety of pods, and some of them work just fine. The aws-node on that kube node doesn't report any errors:
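A hedged sketch of how the aws-node pod for that specific node can be located and checked, assuming the default DaemonSet name and `kube-system` namespace:

```bash
# Find the aws-node pod scheduled on the affected node.
kubectl -n kube-system get pods -o wide \
  --field-selector spec.nodeName=ip-10-43-169-250.ec2.internal | grep aws-node

# Tail its logs; ipamd also writes more detailed logs on the host, typically
# under /var/log/aws-routed-eni/ (the path is an assumption about CNI defaults).
kubectl -n kube-system logs <aws-node-pod-name>    # placeholder pod name
```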
The AWS VPC CNI is configured as follows:
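A minimal sketch of dumping that configuration, assuming the default DaemonSet name and the usual CNI install paths:

```bash
# Environment variables on the aws-node DaemonSet (e.g. WARM_ENI_TARGET).
kubectl -n kube-system get daemonset aws-node \
  -o jsonpath='{.spec.template.spec.containers[0].env}'

# On the node itself, the generated CNI configuration (the path and file name
# pattern are assumptions about the usual install location).
cat /etc/cni/net.d/10-aws.*
```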
I am running Kubernetes 1.10.7 and I use kops 1.10 to manage my cluster in AWS.
The network is laid out with CloudFormation. It's a small VPC, 10.43.168.0/21, and the kube subnets are 10.43.168.0/23, 10.43.170.0/23, and 10.43.172.0/23.
As for the kube-nodes, all of them are m5.4xlarge or r5.2xlarge, so they should have plenty of IPs to dole out. I am running 200 pods, and I have 12 kube-nodes with general workloads and 12 kube-nodes with 2-5 pods each.
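As a rough sanity check of the "plenty of IPs" claim, a sketch of the usual per-node capacity math for this CNI; the ENI and per-ENI address limits below are assumptions taken from the standard EC2 instance-type tables.

```bash
# Rough per-node pod-IP capacity with the AWS VPC CNI:
#   max pods ~= ENIs * (IPv4 addresses per ENI - 1) + 2
# Assumed EC2 limits: m5.4xlarge = 8 ENIs x 30 IPs, r5.2xlarge = 4 ENIs x 15 IPs.
echo "m5.4xlarge: $(( 8 * (30 - 1) + 2 )) pods"   # 234
echo "r5.2xlarge: $(( 4 * (15 - 1) + 2 )) pods"   # 58
```

Under those assumed limits, raw IP capacity shouldn't be the bottleneck here, which is consistent with the pod counts described above.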
Kernel logs from the kube-node show a lot of activity on the eth devices (related to the restarts?):
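A small sketch of pulling those kernel messages with timestamps so they can be correlated against the pod restart times:

```bash
# Kernel messages mentioning the eth interfaces, with human-readable timestamps.
dmesg -T | grep -iE 'eth[0-9]'

# Or via journald, if the AMI uses it.
journalctl -k | grep -iE 'eth[0-9]'
```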
Some debugging output from the aws-node running on the kube-node hosting the failed nginx-ingress:
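For completeness, a hedged sketch of pulling ipamd's own view of ENI and IP assignments from that node; the ports and paths are assumptions based on the plugin's default introspection and metrics endpoints.

```bash
# ipamd's local introspection endpoint (port and paths are assumptions based on
# aws-vpc-cni-k8s defaults) shows its ENI and pod-IP bookkeeping.
curl -s http://localhost:61679/v1/enis
curl -s http://localhost:61679/v1/pods

# Prometheus-style metrics, including allocation/attachment error counters.
curl -s http://localhost:61678/metrics | grep -i error
```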