Missing route in VPC peering #495

Closed
dmai-apixio opened this issue Jun 3, 2019 · 4 comments
@dmai-apixio

I have 2 VPCs that are connected using VPC peering:

vpc-1 10.0.0.0/16
vpc-2 10.1.0.0/16

I set up an EKS cluster with 3 workers running in vpc-2. Security groups are open to all traffic.
I'm running a simple deployment that deploys nginx and exposes port 80:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 12
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.15.4
        ports:
        - name: http
          containerPort: 80

And here is the pod information:

NAME                                READY   STATUS    RESTARTS   AGE     IP             NODE                                         NOMINATED NODE
nginx-deployment-6c479b78c5-7bztf   1/1     Running   0          2d18h   10.1.133.30    ip-10-1-134-238.us-west-2.compute.internal   <none>
nginx-deployment-6c479b78c5-88mfk   1/1     Running   0          2d18h   10.1.128.223   ip-10-1-129-172.us-west-2.compute.internal   <none>
nginx-deployment-6c479b78c5-9qp9z   1/1     Running   0          2d18h   10.1.131.223   ip-10-1-129-172.us-west-2.compute.internal   <none>
nginx-deployment-6c479b78c5-d6nhq   1/1     Running   0          2d18h   10.1.130.219   ip-10-1-129-172.us-west-2.compute.internal   <none>
nginx-deployment-6c479b78c5-g69tr   1/1     Running   0          2d18h   10.1.139.180   ip-10-1-137-77.us-west-2.compute.internal    <none>
nginx-deployment-6c479b78c5-ghnq2   1/1     Running   0          2d18h   10.1.135.210   ip-10-1-134-238.us-west-2.compute.internal   <none>
nginx-deployment-6c479b78c5-hd5cr   1/1     Running   0          2d18h   10.1.131.102   ip-10-1-129-172.us-west-2.compute.internal   <none>
nginx-deployment-6c479b78c5-jfxr8   1/1     Running   0          2d18h   10.1.132.160   ip-10-1-134-238.us-west-2.compute.internal   <none>
nginx-deployment-6c479b78c5-m65hn   1/1     Running   0          2d18h   10.1.139.120   ip-10-1-137-77.us-west-2.compute.internal    <none>
nginx-deployment-6c479b78c5-qv68l   1/1     Running   0          2d18h   10.1.136.100   ip-10-1-137-77.us-west-2.compute.internal    <none>
nginx-deployment-6c479b78c5-t2xv6   1/1     Running   0          2d18h   10.1.138.6     ip-10-1-137-77.us-west-2.compute.internal    <none>
nginx-deployment-6c479b78c5-zrfjh   1/1     Running   0          2d18h   10.1.133.82    ip-10-1-134-238.us-west-2.compute.internal   <none>

From any server in vpc-2 (the same VPC running EKS), I can connect to port 80 of all of these pods. But from any server in vpc-1, I can only reach 9 of the 12 nginx pods.

kubectl get po -o wide | grep nginx | awk -F ' ' '{print $6}' | xargs -I{} bash -c 'echo -n {}; curl -s -o /dev/null -w " %{http_code}\n" -m 1 {}'
10.1.133.30 200
10.1.128.223 200
10.1.131.223 000
10.1.130.219 200
10.1.139.180 200
10.1.135.210 000
10.1.131.102 200
10.1.132.160 200
10.1.139.120 200
10.1.136.100 200
10.1.138.6 000
10.1.133.82 200

As the results above show, I cannot reach 10.1.131.223, 10.1.135.210, or 10.1.138.6. I ran /opt/cni/bin/aws-cni-support.sh on each worker and found that these three pods have one thing in common: their IPs belong to a secondary interface, not the primary one (eth0).

# pod.output on ip-10-1-134-238.us-west-2.compute.internal
  "nginx-deployment-6c479b78c5-ghnq2_testing_96f65f1f5e79a84afeacdf2f45c329cb38c847138fc3b2db4566ec48e18c0c42": {
    "IP": "10.1.135.210",
    "DeviceNumber": 2
  },
# pod.output on ip-10-1-137-77.us-west-2.compute.internal
  "nginx-deployment-6c479b78c5-t2xv6_testing_eeb39b4d85f9f7fdece2613e27d661127c394030c0b760959768cb4b749e7a0e": {
    "IP": "10.1.138.6",
    "DeviceNumber": 2
  }
# pod.output on ip-10-1-129-172.us-west-2.compute.internal
  "nginx-deployment-6c479b78c5-9qp9z_testing_eb35b8a44be06854517d834cdd459cea98a66f17c28317e185fa82bccb09ccd2": {
    "IP": "10.1.131.223",
    "DeviceNumber": 2
  },
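
For reference, the same check can be scripted against pod.output with jq. This is only a sketch, assuming pod.output is a single JSON object keyed by pod as in the excerpts above, and that a non-zero DeviceNumber indicates a secondary ENI:

# List pods whose IP is assigned to a secondary ENI (DeviceNumber != 0),
# based on the pod.output structure shown above.
jq 'to_entries[] | select(.value.DeviceNumber != 0) | {pod: .key, ip: .value.IP}' pod.output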

I picked the pod with IP 10.1.135.210 running on worker ip-10-1-134-238.us-west-2.compute.internal and continued by checking the routing tables:

[ec2-user@ip-10-1-134-238 ~]$ ip rule show
0:	from all lookup local
512:	from all to 10.1.132.11 lookup main
512:	from all to 10.1.134.230 lookup main
512:	from all to 10.1.133.129 lookup main
512:	from all to 10.1.135.193 lookup main
512:	from all to 10.1.135.47 lookup main
512:	from all to 10.1.133.95 lookup main
512:	from all to 10.1.133.82 lookup main
512:	from all to 10.1.135.210 lookup main
512:	from all to 10.1.132.160 lookup main
512:	from all to 10.1.133.30 lookup main
1024:	from all fwmark 0x80/0x80 lookup main
1536:	from 10.1.135.210 to 10.1.0.0/16 lookup 2
32766:	from all lookup main
32767:	from all lookup default
[ec2-user@ip-10-1-134-238 ~]$ ip route show table main
default via 10.1.132.1 dev eth0
10.1.132.0/22 dev eth0 proto kernel scope link src 10.1.134.238
10.1.132.11 dev eni3367e1e163b scope link
10.1.132.160 dev eni79af95bd2b2 scope link
10.1.133.30 dev eni7976e813d36 scope link
10.1.133.82 dev enia1d23be782e scope link
10.1.133.95 dev enid5a84d7679a scope link
10.1.133.129 dev eni4ab52ffff90 scope link
10.1.134.230 dev eni78b12ef97b2 scope link
10.1.135.47 dev eni10d2e7ffdce scope link
10.1.135.193 dev eni242a6cbb667 scope link
10.1.135.210 dev eni1dc02e51912 scope link
169.254.169.254 dev eth0
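
The per-ENI table that the priority-1536 rule points at can be inspected directly (table 2 here, taken from the "lookup 2" in the rule above):

# Routes applied to traffic from pods on the secondary ENI; per the rule at
# priority 1536, only destinations inside 10.1.0.0/16 are sent through it.
ip route show table 2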

I found there is no rule routing this pod's traffic back to vpc-1 (10.0.0.0/16) on the worker node. Adding a new rule made it work perfectly:

[root@ip-10-1-134-238 ec2-user]# ip rule add from 10.1.135.210 to 10.0.0.0/16 priority 1537 table 2

My question is: should the vpc-cni-k8s plugin add this routing automatically when it sees VPC peering? If I have multiple VPCs that need to talk to EKS, I have to add routing manually on every worker node, which is not great.
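
As a stop-gap, the manual fix can at least be scripted per worker. This is only a sketch under the assumptions above: the affected pod IPs are the ones pod.output shows on secondary ENIs, the table number is read from the pod's existing ip rule, and 10.0.0.0/16 is the peered CIDR.

# Hypothetical stop-gap, run as root on each worker: for every pod IP on a
# secondary ENI, reuse the table from its existing "from <ip> to <vpc-cidr>"
# rule and add a matching rule towards the peered VPC.
PEER_CIDR=10.0.0.0/16
for POD_IP in 10.1.135.210; do   # replace with the affected pod IPs on this worker
  TABLE=$(ip rule show | grep "from $POD_IP to" | awk '{print $NF}' | head -n1)
  ip rule add from "$POD_IP" to "$PEER_CIDR" priority 1537 table "$TABLE"
done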

@dmai-apixio dmai-apixio changed the title Missing route in multi VPCs environment Missing route in VPC peering Jun 3, 2019
@sethp-nr

sethp-nr commented Jun 3, 2019

My limited understanding is that VPC peers should be handled by the default route from the machine's perspective: do the two VPCs that you've paired have entries for the peering connection in their route tables?
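
For what it's worth, one way to double-check that from the CLI (the VPC ID below is a placeholder) is to dump each VPC's routes and look for an entry to the peer's CIDR whose target is the pcx-... peering connection:

# Placeholder VPC ID; repeat for both VPCs. There should be a route to the
# other VPC's CIDR (10.0.0.0/16 or 10.1.0.0/16) via a VpcPeeringConnectionId.
aws ec2 describe-route-tables \
  --filters Name=vpc-id,Values=vpc-0123456789abcdef0 \
  --query 'RouteTables[].Routes[]' --output table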

@mogren mogren added the question label Jun 4, 2019
@dmai-apixio

@sethp-nr I think the route tables for the peering connection are correct. I can reach 9 of the 12 pods in vpc-2 from vpc-1.

@ewbankkit

@dmai-apixio What is your AWS_VPC_K8S_CNI_EXTERNALSNAT value?

@mogren

mogren commented Sep 28, 2019

Either enable AWS_VPC_K8S_CNI_EXTERNALSNAT, or use AWS_VPC_K8S_CNI_EXCLUDE_SNAT_CIDRS added in #520 and available in v1.6.0-rc2 or later.
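
For anyone who finds this later: both options are environment variables on the aws-node DaemonSet. A minimal sketch (the CIDR here is this issue's peer VPC; adjust to your own):

# Option 1: disable SNAT for pod traffic leaving the VPC entirely
kubectl set env daemonset aws-node -n kube-system AWS_VPC_K8S_CNI_EXTERNALSNAT=true

# Option 2 (v1.6.0-rc2 or later): keep SNAT but exclude the peered VPC's CIDR
kubectl set env daemonset aws-node -n kube-system AWS_VPC_K8S_CNI_EXCLUDE_SNAT_CIDRS=10.0.0.0/16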

@mogren mogren closed this as completed Sep 28, 2019