Pod connectivity fails on certain kube-nodes #180
I am having this exact same issue. Did you find a resolution?
I am trying a test now where I delete those ec2 instances and let the kops-configured autoscaling group replace them with fresh hosts. I should have left one of the unhealthy kube-nodes alone and tried it out on just one so I could figure out what was different. But ah well, next time. Now my pods work again. It should not require a kube-node replacement to get the aws-vpc-cni to work. No reported errors from what I can tell.
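For reference, the node-replacement approach generally amounts to draining the node and terminating the backing instance so the autoscaling group brings up a replacement. A minimal sketch; the node and instance IDs are placeholders, not values from this thread:

```bash
# Drain the unhealthy node so pods are rescheduled elsewhere
kubectl drain ip-10-0-1-23.ec2.internal --ignore-daemonsets --delete-local-data

# Terminate the backing EC2 instance; the kops ASG replaces it with a fresh host
aws ec2 terminate-instances --instance-ids i-0123456789abcdef0
```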
@derekssmith I would not describe my resolution as a good one. I don't even have a solid error from any one component, just pod connections failing. Have you found any definitive errors in aws-vpc-cni, or in a kube-node's systemd (journalctl) logs? Edit: also, have you tried running any of the debugging scripts? I didn't run those either, and that's an obvious oversight on my part. Try running those from the guide here.
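For anyone following along, the debugging script referenced in the troubleshooting guide is typically run directly on the affected node. A sketch; the install path below is an assumption and may differ between CNI versions:

```bash
# Collect ipamd/CNI diagnostics on the affected node
# (path assumed from the project's troubleshooting guide; adjust for your version)
sudo /opt/cni/bin/aws-cni-support.sh
```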
@xrl I ended up figuring out my problem. I created two separate CloudFormation stacks for different-sized worker nodes. This resulted in two sets of nodes that could communicate with the nodes in the same stack, and with the control plane, but could not communicate across stacks. I fixed this by adding new inbound rules to their security groups: on each group, I had to allow all traffic from the other set of nodes. Hope this helps.
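In case it helps others, cross-referencing the two node security groups is usually enough. A sketch with placeholder group IDs; the IDs and the all-traffic rule are assumptions about a typical fix, not taken from the poster's setup:

```bash
# Allow all traffic from stack B's node security group into stack A's, and vice versa
aws ec2 authorize-security-group-ingress --group-id sg-aaaa1111 --protocol all --source-group sg-bbbb2222
aws ec2 authorize-security-group-ingress --group-id sg-bbbb2222 --protocol all --source-group sg-aaaa1111
```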
I also see this in my kube events:
which seems suspect.
We are facing the same issue. We are on version 1.0.0, with the cluster set up using Kops. We are seeing that the kube-dns cluster IP is not reachable from some pods; as a result, DNS resolution does not work from those pods. We have tried:
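For context, a quick way to confirm this symptom is to query the kube-dns cluster IP from inside an affected pod. A sketch; the IP below is a placeholder, not this cluster's actual service IP:

```bash
# Find the kube-dns ClusterIP
kubectl get svc -n kube-system kube-dns

# From inside an affected pod, query that IP directly
# (100.64.0.10 is a placeholder; substitute the ClusterIP reported above)
nslookup kubernetes.default.svc.cluster.local 100.64.0.10
```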
I had the same issue today. I think it's a networking issue involving amazon-vpc-cni-k8s, ENIs, and VIPs. When I ran tcpdump on the VM that was running kube-dns, I saw this:
As you can see, traffic comes in but there is no answer, and all pods on the same VM stop working. Edit 2: I think it's related to another issue that was fixed in a different PR. I enabled martian packet logging in the kernel and now I see this:
Related PR: #130
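As an aside, the martian packet logging mentioned above is a kernel setting toggled with sysctl. A sketch, not the poster's exact commands:

```bash
# Log "martian" packets (packets with an impossible source or route) to the kernel log
sudo sysctl -w net.ipv4.conf.all.log_martians=1

# Watch for martian entries as they appear
dmesg -w | grep -i martian
```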
Edit: moving to separate ticket after more troubleshooting.
Moved to #204
I am using Kops with aws-vpc-cni version 1.1. I have 3 masters and 3 kube-nodes. 2 of those kube-nodes schedule pods, but those pods cannot reach kube-dns via its cluster IP, nor can they route traffic to the internal Kubernetes API (kubernetes.default.svc).
I can force pods to be scheduled on the unhealthy nodes by cordoning off the 1 healthy node.
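For reproducibility, cordoning is the standard way to keep the scheduler off a node. A sketch with a placeholder node name:

```bash
# Mark the healthy node unschedulable so new pods land on the unhealthy nodes
kubectl cordon ip-10-0-2-45.ec2.internal

# Undo it afterwards
kubectl uncordon ip-10-0-2-45.ec2.internal
```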
The image I'm talking about is an `ubuntu:bionic`, with the proper `/etc/resolv.conf` in place with the expected search paths, and the nameserver lines up with the cluster IP of kube-dns.
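A pod's `/etc/resolv.conf` on a kops cluster typically looks something like the sketch below; the nameserver IP and search domains are assumed typical values, not the poster's actual output:

```bash
# Inspect resolver config from inside the pod
cat /etc/resolv.conf
# nameserver 100.64.0.10
# search default.svc.cluster.local svc.cluster.local cluster.local
# options ndots:5
```

The `nameserver` entry should match the ClusterIP of the `kube-dns` service in `kube-system`.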
Then when I hop onto the pod, DNS is busted, and connectivity by IP is also broken when I try to curl the Kubernetes API.
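A minimal version of these in-pod checks, using the standard service DNS name and a placeholder service IP; these are not the poster's actual commands or output:

```bash
# DNS check: try to resolve the API service name via kube-dns
nslookup kubernetes.default.svc.cluster.local

# Connectivity check by IP: curl the API service ClusterIP directly
# (100.64.0.1 is a placeholder; use the ClusterIP of the `kubernetes` service;
#  -k skips TLS verification since only reachability matters here)
curl -k https://100.64.0.1/version
```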
All these things work from pods on the healthy node. Something is wrong with the network configuration of the pods on these unhealthy nodes, and I don't know how to debug it. The cni-metrics-helper looks healthy for 6 total EC2 hosts (3 of them masters).
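For anyone reproducing this, those metrics usually come from the cni-metrics-helper logs. A sketch; the deployment name below assumes the default manifest and is not confirmed by this thread:

```bash
# Dump the aggregated ENI/IP metrics collected by cni-metrics-helper
kubectl logs -n kube-system deployment/cni-metrics-helper
```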
And I don't see anything in particular in the logs of the aws-vpc-cni daemonset pod running on the unhealthy nodes.
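For completeness, those daemonset logs can be pulled via kubectl or read directly on the node. A sketch; the label selector and log path assume the standard aws-node daemonset layout for this CNI:

```bash
# Logs from the CNI daemonset pods (the daemonset is named aws-node in kube-system)
kubectl logs -n kube-system -l k8s-app=aws-node

# ipamd logs directly on a node, if you have SSH access
sudo tail -n 200 /var/log/aws-routed-eni/ipamd.log*
```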