Pod connectivity fails on certain kube-nodes #180

Closed
xrl opened this issue Sep 22, 2018 · 9 comments

@xrl

xrl commented Sep 22, 2018

I am using Kops with aws-vpc-cni version 1.1. I have 3 masters and 3 kube-nodes. 2 of those kube-nodes schedule pods, but those pods cannot reach kube-dns via its cluster IP, nor can they reach the internal Kubernetes API (kubernetes.default.svc).

I can force pods to be scheduled on the unhealthy nodes by cordoning off the one healthy node.
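
For anyone following along, the cordon trick is just this (node and pod names are placeholders):

$ kubectl cordon <healthy-node-name>      # keep new pods off the working node
$ kubectl delete pod <test-pod>           # a Deployment-managed replacement then lands on an unhealthy node
$ kubectl uncordon <healthy-node-name>    # undo afterwards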

The pod image in question is ubuntu:bionic, with a proper /etc/resolv.conf in place containing the expected search paths:

# cat /etc/resolv.conf
nameserver 10.43.168.10
search monitoring.svc.cluster.local svc.cluster.local cluster.local ec2.internal
options ndots:5

which lines up with the cluster IP of kube-dns:

$ k -n kube-system get svc
NAME             TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)         AGE
kube-dns         ClusterIP   10.43.168.10   <none>        53/UDP,53/TCP   8d
kubelet          ClusterIP   None           <none>        10250/TCP       3d
metrics-server   ClusterIP   10.43.168.22   <none>        443/TCP         7d
$ k -n default get svc
NAME         TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.43.168.1   <none>        443/TCP   8d

Then, when I hop onto one of those pods, DNS is broken:

# dig archive.ubuntu.com

; <<>> DiG 9.11.3-1ubuntu1.2-Ubuntu <<>> archive.ubuntu.com
;; global options: +cmd
;; connection timed out; no servers could be reached

or

# dig kubernetes.default.svc

; <<>> DiG 9.11.3-1ubuntu1.2-Ubuntu <<>> kubernetes.default.svc
;; global options: +cmd
;; connection timed out; no servers could be reached

Connectivity by IP is also broken; here I try to curl the Kubernetes API:

# curl -vk https://10.43.168.1
* Rebuilt URL to: https://10.43.168.1/
*   Trying 10.43.168.1...
* TCP_NODELAY set
[[[ SNIP: just hangs ]]]
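
A non-DNS way to check the same path, if netcat happens to be in the image (the IPs are the service cluster IPs above), would be something like:

# nc -vz -w 3 10.43.168.10 53
# nc -vz -w 3 10.43.168.1 443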

All of these things work from pods on the healthy node. Something is wrong with the network configuration on the unhealthy nodes and I don't know how to debug it. The cni-metrics-helper looks healthy for all 6 EC2 hosts (3 of them masters):

I0922 00:55:47.881139       7 metrics.go:250] Processing metric: ipamd_action_inprogress
I0922 00:55:47.881141       7 metrics.go:92] Label: [name:"fn" value:"nodeIPPoolReconcile" ], Value: 0
I0922 00:55:47.881146       7 metrics.go:92] Label: [name:"fn" value:"nodeInit" ], Value: 0
I0922 00:55:47.881149       7 metrics.go:92] Label: [name:"fn" value:"retryAllocENIIP" ], Value: 0
I0922 00:55:47.881153       7 metrics.go:250] Processing metric: assigned_ip_addresses
I0922 00:55:47.881156       7 metrics.go:250] Processing metric: total_ip_addresses
I0922 00:55:47.881160       7 metrics.go:404] Grab/Aggregate metrics from aws-node-tj4ps
I0922 00:55:47.881163       7 cni_metrics.go:99] Grabbing metrics from CNI aws-node-tj4ps
I0922 00:55:47.886288       7 metrics.go:250] Processing metric: assigned_ip_addresses
I0922 00:55:47.886298       7 metrics.go:250] Processing metric: total_ip_addresses
I0922 00:55:47.886300       7 metrics.go:250] Processing metric: eni_allocated
I0922 00:55:47.886302       7 metrics.go:250] Processing metric: ipamd_action_inprogress
I0922 00:55:47.886305       7 metrics.go:92] Label: [name:"fn" value:"nodeIPPoolReconcile" ], Value: 0
I0922 00:55:47.886315       7 metrics.go:92] Label: [name:"fn" value:"nodeInit" ], Value: 0
I0922 00:55:47.886319       7 metrics.go:92] Label: [name:"fn" value:"retryAllocENIIP" ], Value: 0
I0922 00:55:47.886324       7 metrics.go:250] Processing metric: eni_max
I0922 00:55:47.886327       7 metrics.go:250] Processing metric: ipamd_error_count
I0922 00:55:47.886329       7 metrics.go:101] Label: [name:"error" value:"unable to get local pods, giving up"  name:"fn" value:"nodeInitK8SGetLocalPodIPsFailed" ], Value: 1
I0922 00:55:47.886339       7 metrics.go:350] Produce GAUGE metrics: assignIPAddresses, value: 17.000000
I0922 00:55:47.886343       7 metrics.go:350] Produce GAUGE metrics: totalIPAddresses, value: 228.000000
I0922 00:55:47.886346       7 metrics.go:350] Produce GAUGE metrics: eniAllocated, value: 12.000000
I0922 00:55:47.886351       7 metrics.go:350] Produce GAUGE metrics: ipamdActionInProgress, value: 0.000000
I0922 00:55:47.886353       7 metrics.go:350] Produce GAUGE metrics: eniMaxAvailable, value: 33.000000
I0922 00:55:47.886356       7 metrics.go:340] Produce COUNTER metrics: ipamdErr, value: 0.000000

I also don't see anything notable in the logs of the aws-vpc-cni DaemonSet pod running on the unhealthy nodes, for example:

$ k -n kube-system logs aws-node-c66nn --tail=100
=====Starting installing AWS-CNI =========
=====Starting amazon-k8s-agent ===========
ERROR: logging before flag.Parse: W0917 22:32:02.326125      11 client_config.go:533] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
[[[ end of logs ]]]
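
If it helps anyone, the node-side state I'd want to compare between a healthy and an unhealthy node is the policy routing the CNI sets up; standard iproute2 commands run on the node itself (the per-ENI table number here is a guess):

$ ip rule list
$ ip route show table main
$ ip route show table 2     # aws-vpc-cni adds extra route tables for secondary ENIs
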
@derekssmith

I am having this exact same issue. Did you find a resolution?

@xrl
Author

xrl commented Sep 22, 2018

I am trying a test now where I delete those ec2 instances and let the kops-configured autoscaling group replace them with fresh hosts.

I should have left one of the unhealthy kube-nodes alone and experimented on it so I could figure out what was different. Ah well, next time. Now my pods work again.

It should not require replacing a kube-node to get aws-vpc-cni working, and there were no reported errors from what I could tell.

@xrl
Author

xrl commented Sep 22, 2018

@derekssmith I would not describe my resolution as a good one. I don't even have a solid error from any one component, just pod connections failing. Have you found any definitive errors in aws-vpc-cni, or in a kube-node's systemd journal (journalctl)?

Edit: also, have you tried running any of the debugging scripts? I didn't run those either and that's an obvious oversight on my part. Try running those from the guide here.
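
For reference, the log-collection script from the repo's troubleshooting guide is roughly this (the exact path/URL may have moved, so treat it as a sketch):

$ curl -O https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/master/scripts/aws-cni-support.sh
$ sudo bash aws-cni-support.sh    # run on the affected node; collects CNI/ipamd logs and network state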

@derekssmith

@xrl I ended up figuring out my problem. I had created two separate CloudFormation stacks for different-sized worker nodes. This resulted in two sets of nodes that could communicate with nodes in their own stack and with the control plane, but could not communicate across stacks. I fixed it by adding new inbound rules to their security groups: on each group, I had to allow all traffic from the other set of nodes.
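
The rule amounts to something like this for each node security group (group IDs are placeholders; the equivalent change can also be made in the console or CloudFormation template):

$ aws ec2 authorize-security-group-ingress \
    --group-id sg-aaaa1111 \
    --protocol all \
    --source-group sg-bbbb2222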

Hope this helps.

@xrl
Author

xrl commented Sep 22, 2018

I also see this in my kube events:

default         4m        4d        36521     ip-10-43-169-201.ec2.internal.15554f0d6a9c1ceb       Node                                          Normal    CIDRNotAvailable           cidrAllocator                            Node ip-10-43-169-201.ec2.internal status is now: CIDRNotAvailable
default         39s       2h        932       ip-10-43-171-84.ec2.internal.155693654f2aa410        Node                                          Normal    CIDRNotAvailable           cidrAllocator                            Node ip-10-43-171-84.ec2.internal status is now: CIDRNotAvailable

which seems suspect.
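
If anyone wants to check the same thing, the per-node pod CIDR assignment (or lack of one) shows up on the Node objects:

$ kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDR:.spec.podCIDR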

@akashkahlon

akashkahlon commented Oct 8, 2018

We are facing the same issue. We are on version 1.0.0.
Kubernetes version: 1.10

Cluster setup using Kops:
3 masters
4 worker nodes
amazon-vpc-cni-k8s (CNI plugin)

We are seeing an issue where the kube-dns cluster IP is not reachable from some pods, and as a result DNS resolution does not work from those pods.
It is very random: of 2 pods on the same node, 1 works fine while the other hits this issue. Deleting the pod sometimes resolves it; once we had to delete a pod twice before it went away. But every time we scale horizontally, we see this issue again. Any suggestions?

We have tried:

  1. Scaling up kube-dns
  2. Increasing the cache size in dnsmasq
  3. Using FQDNs
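
We also spot-check DNS from a throwaway pod, roughly as in the Kubernetes DNS debugging docs (the image tag is just the commonly suggested one):

$ kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default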

@jlory

jlory commented Oct 17, 2018

I had the same issue today. I think it's a networking issue involving amazon-vpc-cni-k8s + ENIs + VIPs. When I did a tcpdump on the VM that was running kube-dns, I saw this:

17:25:22.943704 IP 10.29.56.21.51669 > 10.29.56.125.53: 57018+ AAAA? istio-pilot.istio-system.default.svc.cluster.local. (68)
17:25:24.284723 IP 10.29.56.217.43506 > 10.29.56.125.53: 54786+ [1au] A? google.com. (39)
17:25:25.224659 IP 10.29.56.220.36503 > 10.29.56.125.53: 26970+ A? istio-telemetry.istio-system.svc.cluster.local. (64)
17:25:25.505226 IP 10.29.56.217.34886 > 10.29.56.125.53: 11861+ AAAA? istio-pilot.istio-system.svc.cluster.local. (60)
17:25:25.760252 IP 10.29.56.21.51669 > 10.29.56.125.53: 57018+ AAAA? istio-pilot.istio-system.default.svc.cluster.local. (68)
17:25:25.895041 IP 10.29.56.144.50867 > 10.29.56.125.53: 49746+ A? zipkin.istio-system.default.svc.cluster.local. (63)

As you can see, traffic comes in but there is no answer, and all pods on the same VM stop working.
I also couldn't ping the VIPs from outside the VM.
In my experience with this issue, once a VM starts behaving this way, all secondary IPs assigned to pods on that VM stop working.
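
For completeness, a capture along these lines is enough to see it (interface name is a guess):

$ sudo tcpdump -ni eth0 port 53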

Edit2: I think it's related to another issue that was fixed in a different PR. I enabled martian logging in the kernel and now I see this:

Oct 18 14:34:03 ip-10-29-56-84 kernel: IPv4: martian source 10.29.56.100 from 10.11.128.27, on dev eth1
Oct 18 14:34:03 ip-10-29-56-84 kernel: ll header: 00000000: 0e ac 9f 5c b9 2c 0e 32 2f b4 b8 7e 08 00        ...\.,.2/..~..
Oct 18 14:34:04 ip-10-29-56-84 kernel: IPv4: martian source 10.29.56.100 from 10.11.128.27, on dev eth1
Oct 18 14:34:04 ip-10-29-56-84 kernel: ll header: 00000000: 0e ac 9f 5c b9 2c 0e 32 2f b4 b8 7e 08 00        ...\.,.2/..~..
Oct 18 14:34:05 ip-10-29-56-84 kernel: IPv4: martian source 10.29.56.100 from 10.11.128.27, on dev eth1
Oct 18 14:34:05 ip-10-29-56-84 kernel: ll header: 00000000: 0e ac 9f 5c b9 2c 0e 32 2f b4 b8 7e 08 00        ...\.,.2/..~..
Oct 18 14:34:06 ip-10-29-56-84 kernel: IPv4: martian source 10.29.56.100 from 10.11.128.27, on dev eth1
Oct 18 14:34:06 ip-10-29-56-84 kernel: ll header: 00000000: 0e ac 9f 5c b9 2c 0e 32 2f b4 b8 7e 08 00        ...\.,.2/..~..
Oct 18 14:34:08 ip-10-29-56-84 kernel: IPv4: martian source 10.29.56.76 from 10.11.128.27, on dev eth1
Oct 18 14:34:08 ip-10-29-56-84 kernel: ll header: 00000000: 0e ac 9f 5c b9 2c 0e 32 2f b4 b8 7e 08 00        ...\.,.2/..~..
Oct 18 14:34:09 ip-10-29-56-84 kernel: IPv4: martian source 10.29.56.76 from 10.11.128.27, on dev eth1
Oct 18 14:34:09 ip-10-29-56-84 kernel: ll header: 00000000: 0e ac 9f 5c b9 2c 0e 32 2f b4 b8 7e 08 00        ...\.,.2/..~..
Oct 18 14:34:09 ip-10-29-56-84 kernel: IPv4: martian source 10.29.56.100 from 10.29.56.65, on dev eth1
Oct 18 14:34:09 ip-10-29-56-84 kernel: ll header: 00000000: 0e ac 9f 5c b9 2c 0e 32 2f b4 b8 7e 08 06        ...\.,.2/..~..
Oct 18 14:34:09 ip-10-29-56-84 kernel: IPv4: martian source 10.29.56.104 from 10.29.56.65, on dev eth2
Oct 18 14:34:09 ip-10-29-56-84 kernel: ll header: 00000000: 0e c5 af 95 76 dc 0e 32 2f b4 b8 7e 08 06        ....v..2/..~..
Oct 18 14:34:10 ip-10-29-56-84 kernel: IPv4: martian source 10.29.56.100 from 10.29.56.65, on dev eth1
Oct 18 14:34:10 ip-10-29-56-84 kernel: ll header: 00000000: 0e ac 9f 5c b9 2c 0e 32 2f b4 b8 7e 08 06        ...\.,.2/..~..
Oct 18 14:34:11 ip-10-29-56-84 kernel: IPv4: martian source 10.29.56.104 from 10.29.56.65, on dev eth2
Oct 18 14:34:11 ip-10-29-56-84 kernel: ll header: 00000000: 0e c5 af 95 76 dc 0e 32 2f b4 b8 7e 08 06        ....v..2/..~..
Oct 18 14:34:11 ip-10-29-56-84 kernel: IPv4: martian source 10.29.56.76 from 10.11.128.27, on dev eth1
Oct 18 14:34:11 ip-10-29-56-84 kernel: ll header: 00000000: 0e ac 9f 5c b9 2c 0e 32 2f b4 b8 7e 08 00        ...\.,.2/..~..
Oct 18 14:34:12 ip-10-29-56-84 kernel: IPv4: martian source 10.29.56.100 from 10.29.56.65, on dev eth1
Oct 18 14:34:12 ip-10-29-56-84 kernel: ll header: 00000000: 0e ac 9f 5c b9 2c 0e 32 2f b4 b8 7e 08 06        ...\.,.2/..~..
Oct 18 14:34:12 ip-10-29-56-84 kernel: IPv4: martian source 10.29.56.104 from 10.29.56.65, on dev eth2
Oct 18 14:34:12 ip-10-29-56-84 kernel: ll header: 00000000: 0e c5 af 95 76 dc 0e 32 2f b4 b8 7e 08 06        ....v..2/..~..
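
The martian logging above is enabled with the standard sysctls, something like:

$ sudo sysctl -w net.ipv4.conf.all.log_martians=1
$ sudo sysctl -w net.ipv4.conf.default.log_martians=1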

Related PR: #130

@conversicachrisr

conversicachrisr commented Oct 17, 2018

Edit: moving to separate ticket after more troubleshooting.

@mogren
Contributor

mogren commented Mar 15, 2019

Moved to #204

@mogren mogren closed this as completed Mar 15, 2019