
External network connectivity issue with EKS CNI #1966

Closed
yoheiueda opened this issue Jul 31, 2024 · 5 comments · Fixed by #1983

@yoheiueda
Member

As reported at #1920 (comment), the peer pod network has an external connectivity issue with the EKS CNI.

The design of the CNI plugin for Kubernetes networking over AWS VPC is described here:
https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/cni-proposal.md#solution-components

@yoheiueda
Member Author

@bpradipt @EmmEff is it possible to collect some diagnostic data on EKS?

Create a regular (runc) pod and execute the following commands in the pod with kubectl exec. If you can access the worker node that the pod is running on, please execute the same commands there.

ip address show
ip link show
ip rule show
ip route show table main
ip neigh show

According to the documentation, the EKS CNI plugin explicitly sets a static ARP entry. If so, I think we can fix the issue by setting the same ARP entry in the network namespace in a peer pod VM.
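As a quick check of that theory, the collected `ip neigh show` output from the pod can be inspected for the expected static entry. This is a minimal sketch: the `neigh_out` line is a hypothetical sample of what the EKS CNI is expected to install (gateway 169.254.1.1 with a PERMANENT neighbor entry), not actual diagnostic data.

```shell
# Hypothetical sample of "ip neigh show" output from inside the pod;
# in practice this would come from "kubectl exec <pod> -- ip neigh show".
neigh_out="169.254.1.1 dev eth0 lladdr fa:11:de:9d:70:4c PERMANENT"

# The EKS CNI's link-local gateway should appear as a PERMANENT entry.
if printf '%s\n' "$neigh_out" | grep -q '^169\.254\.1\.1 .*PERMANENT$'; then
  status="present"
else
  status="missing"
fi
echo "static ARP entry $status"
```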

@bpradipt
Member

bpradipt commented Aug 1, 2024

@yoheiueda please find the requested details

Output from a regular runc pod

[root@priv-pod /]# ip address show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: eth0@if17: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
    link/ether 56:37:6d:d9:1b:6a brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.0.0.149/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5437:6dff:fed9:1b6a/64 scope link
       valid_lft forever preferred_lft forever
[root@priv-pod /]#
[root@priv-pod /]# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: eth0@if17: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default
    link/ether 56:37:6d:d9:1b:6a brd ff:ff:ff:ff:ff:ff link-netnsid 0
[root@priv-pod /]#
[root@priv-pod /]# ip rule show
0:	from all lookup local
32766:	from all lookup main
32767:	from all lookup default
[root@priv-pod /]#
[root@priv-pod /]# ip route show table main
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0 scope link
[root@priv-pod /]#
[root@priv-pod /]# ip neigh show
169.254.1.1 dev eth0 lladdr fa:11:de:9d:70:4c PERMANENT

Output from the worker node

root@i-069e28cbaee4769cf:/# ip address show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
    link/ether 0a:d1:9a:2b:a4:2f brd ff:ff:ff:ff:ff:ff
    altname enp0s5
    inet 10.0.0.155/27 metric 100 brd 10.0.0.159 scope global dynamic ens5
       valid_lft 3026sec preferred_lft 3026sec
    inet6 fe80::8d1:9aff:fe2b:a42f/64 scope link
       valid_lft forever preferred_lft forever
3: eni306cbc4b983@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
    link/ether 66:dd:ca:b1:37:ea brd ff:ff:ff:ff:ff:ff link-netns cni-abf9bb29-2407-a652-8b9f-ee6828c45956
    inet6 fe80::64dd:caff:feb1:37ea/64 scope link
       valid_lft forever preferred_lft forever
4: enifd8fb8f99f1@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
    link/ether 72:fc:2b:7b:9e:fc brd ff:ff:ff:ff:ff:ff link-netns cni-6b005a7c-9573-9156-ea6b-99423ccdcd6b
    inet6 fe80::70fc:2bff:fe7b:9efc/64 scope link
       valid_lft forever preferred_lft forever
5: eniecfe8b07af8@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
    link/ether ca:77:ea:d4:2d:7f brd ff:ff:ff:ff:ff:ff link-netns cni-824f05c3-b6c4-7ccb-445d-6895046eaf6b
    inet6 fe80::c877:eaff:fed4:2d7f/64 scope link
       valid_lft forever preferred_lft forever
6: eni910811243e2@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
    link/ether 3e:55:91:3d:ce:0f brd ff:ff:ff:ff:ff:ff link-netns cni-b7218768-8633-9d14-d634-0a43a6053a65
    inet6 fe80::3c55:91ff:fe3d:ce0f/64 scope link
       valid_lft forever preferred_lft forever
7: eni81df13ca303@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
    link/ether 22:e8:18:5c:18:94 brd ff:ff:ff:ff:ff:ff link-netns cni-6c06a3fe-4b87-b321-8c60-c129372a07a1
    inet6 fe80::20e8:18ff:fe5c:1894/64 scope link
       valid_lft forever preferred_lft forever
8: eni29b100bd66f@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
    link/ether fe:62:bd:77:68:6e brd ff:ff:ff:ff:ff:ff link-netns cni-511e8790-43ec-921a-71fa-1f6d167c3355
    inet6 fe80::fc62:bdff:fe77:686e/64 scope link
       valid_lft forever preferred_lft forever
15: eni589b674b8a8@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
    link/ether de:86:4b:fa:f6:9d brd ff:ff:ff:ff:ff:ff link-netns cni-2aaf7906-352a-f0e2-341c-d4f2ed6f4ac7
    inet6 fe80::dc86:4bff:fefa:f69d/64 scope link
       valid_lft forever preferred_lft forever
17: enif6eb1a0053f@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
    link/ether fa:11:de:9d:70:4c brd ff:ff:ff:ff:ff:ff link-netns cni-8492616a-9990-6db9-2b66-233a7a7fd26b
    inet6 fe80::f811:deff:fe9d:704c/64 scope link
       valid_lft forever preferred_lft forever
root@i-069e28cbaee4769cf:/#
root@i-069e28cbaee4769cf:/# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 0a:d1:9a:2b:a4:2f brd ff:ff:ff:ff:ff:ff
    altname enp0s5
3: eni306cbc4b983@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default
    link/ether 66:dd:ca:b1:37:ea brd ff:ff:ff:ff:ff:ff link-netns cni-abf9bb29-2407-a652-8b9f-ee6828c45956
4: enifd8fb8f99f1@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default
    link/ether 72:fc:2b:7b:9e:fc brd ff:ff:ff:ff:ff:ff link-netns cni-6b005a7c-9573-9156-ea6b-99423ccdcd6b
5: eniecfe8b07af8@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default
    link/ether ca:77:ea:d4:2d:7f brd ff:ff:ff:ff:ff:ff link-netns cni-824f05c3-b6c4-7ccb-445d-6895046eaf6b
6: eni910811243e2@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default
    link/ether 3e:55:91:3d:ce:0f brd ff:ff:ff:ff:ff:ff link-netns cni-b7218768-8633-9d14-d634-0a43a6053a65
7: eni81df13ca303@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default
    link/ether 22:e8:18:5c:18:94 brd ff:ff:ff:ff:ff:ff link-netns cni-6c06a3fe-4b87-b321-8c60-c129372a07a1
8: eni29b100bd66f@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default
    link/ether fe:62:bd:77:68:6e brd ff:ff:ff:ff:ff:ff link-netns cni-511e8790-43ec-921a-71fa-1f6d167c3355
15: eni589b674b8a8@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default
    link/ether de:86:4b:fa:f6:9d brd ff:ff:ff:ff:ff:ff link-netns cni-2aaf7906-352a-f0e2-341c-d4f2ed6f4ac7
17: enif6eb1a0053f@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default
    link/ether fa:11:de:9d:70:4c brd ff:ff:ff:ff:ff:ff link-netns cni-8492616a-9990-6db9-2b66-233a7a7fd26b
root@i-069e28cbaee4769cf:/#
root@i-069e28cbaee4769cf:/# ip rule show
0:	from all lookup local
512:	from all to 10.0.0.157 lookup main
512:	from all to 10.0.0.134 lookup main
512:	from all to 10.0.0.156 lookup main
512:	from all to 10.0.0.148 lookup main
512:	from all to 10.0.0.132 lookup main
512:	from all to 10.0.0.137 lookup main
512:	from all to 10.0.0.144 lookup main
512:	from all to 10.0.0.149 lookup main
1024:	from all fwmark 0x80/0x80 lookup main
32766:	from all lookup main
32767:	from all lookup default
root@i-069e28cbaee4769cf:/#
root@i-069e28cbaee4769cf:/# ip route show table main
default via 10.0.0.129 dev ens5 proto dhcp src 10.0.0.155 metric 100
10.0.0.2 via 10.0.0.129 dev ens5 proto dhcp src 10.0.0.155 metric 100
10.0.0.128/27 dev ens5 proto kernel scope link src 10.0.0.155 metric 100
10.0.0.129 dev ens5 proto dhcp scope link src 10.0.0.155 metric 100
10.0.0.132 dev eni81df13ca303 scope link
10.0.0.134 dev enifd8fb8f99f1 scope link
10.0.0.137 dev eni29b100bd66f scope link
10.0.0.144 dev eni589b674b8a8 scope link
10.0.0.148 dev eni910811243e2 scope link
10.0.0.149 dev enif6eb1a0053f scope link
10.0.0.156 dev eniecfe8b07af8 scope link
10.0.0.157 dev eni306cbc4b983 scope link
root@i-069e28cbaee4769cf:/#
root@i-069e28cbaee4769cf:/# ip neigh show
10.0.0.132 dev eni81df13ca303 lladdr 2a:84:f2:4c:d9:6d STALE
10.0.0.152 dev ens5 lladdr 0a:78:67:ec:2c:d5 STALE
10.0.0.156 dev eniecfe8b07af8 lladdr ea:1c:91:d7:6f:e2 REACHABLE
10.0.0.137 dev eni29b100bd66f lladdr 2a:17:07:93:e4:0b REACHABLE
10.0.0.129 dev ens5 lladdr 0a:95:a6:82:b4:ef REACHABLE
10.0.0.134 dev enifd8fb8f99f1 lladdr 4a:ef:3f:9e:3b:a5 REACHABLE
10.0.0.148 dev eni910811243e2 lladdr a6:98:cf:a9:3d:bb STALE
10.0.0.157 dev eni306cbc4b983 lladdr 72:c1:d2:01:90:1c REACHABLE
10.0.0.147 dev ens5 lladdr 0a:42:c0:5f:b6:fb REACHABLE

@yoheiueda
Member Author

@bpradipt Thank you very much!

The output of ip address in the pod shows that the Pod IP is 10.0.0.149.

The output of ip route show table main on the worker node shows that traffic to the Pod IP is routed via enif6eb1a0053f:

10.0.0.149 dev enif6eb1a0053f scope link

The output of ip link show on the worker node shows that the virtual Ethernet interface enif6eb1a0053f has the MAC address fa:11:de:9d:70:4c, and that the other end of the veth pair is in the network namespace cni-8492616a-9990-6db9-2b66-233a7a7fd26b:

17: enif6eb1a0053f@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default
    link/ether fa:11:de:9d:70:4c brd ff:ff:ff:ff:ff:ff link-netns cni-8492616a-9990-6db9-2b66-233a7a7fd26b

An ARP entry for this MAC address is explicitly set in the pod network namespace as follows:

169.254.1.1 dev eth0 lladdr fa:11:de:9d:70:4c PERMANENT

So, I think we can fix the connectivity issue by setting this ARP entry like this:

kubectl exec pod/<pod name> -- ip neigh add 169.254.1.1 dev eth0 lladdr <MAC address> nud permanent

@bpradipt could you create a peer pod and try this workaround to check whether the external connectivity issue is fixed? You can identify the MAC address as described above.
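The MAC lookup described above can be scripted. This is a sketch only: here the `routes` and `links` variables inline the relevant lines from the worker-node outputs pasted earlier; in practice they would be captured with `ip route show table main` and `ip link show` on the node.

```shell
pod_ip="10.0.0.149"

# Relevant lines from the worker node, inlined from the outputs above.
routes="10.0.0.149 dev enif6eb1a0053f scope link"
links="17: enif6eb1a0053f@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001
    link/ether fa:11:de:9d:70:4c brd ff:ff:ff:ff:ff:ff"

# The route entry for the pod IP names the host-side veth device.
dev=$(printf '%s\n' "$routes" | awk -v ip="$pod_ip" '$1 == ip {print $3}')

# The link/ether line following the device line carries its MAC address.
mac=$(printf '%s\n' "$links" | awk -v dev="$dev" '
  index($0, dev "@") {found=1; next}
  found && $1 == "link/ether" {print $2; exit}')

# Command to apply inside the peer pod (via kubectl exec).
echo "ip neigh add 169.254.1.1 dev eth0 lladdr $mac nud permanent"
```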

@yoheiueda
Member Author

Another thing I noticed is that jumbo frames (MTU 9001) are enabled on EKS.
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/network_mtu.html

The current implementation of peer pods restricts the maximum MTU to 1450 (#68).

I am not sure whether this will cause connectivity issues. TCP connections should not be affected, since the MSS is negotiated during the TCP handshake. UDP packets initiated from a peer pod will not be affected either, since the smaller MTU is used.

UDP traffic initiated from a regular pod to a peer pod will be fragmented. If path MTU discovery does not work with peer pods, large packets will be dropped.
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/network_mtu.html#path_mtu_discovery

Anyway, jumbo frames should be supported with peer pods from a performance perspective, so I will investigate how we can adjust the MTU.
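One way to probe for this kind of MTU mismatch is a don't-fragment ping sized for jumbo frames. The sketch below only computes and prints the probe command rather than running it; `<target ip>` is a placeholder for a peer pod (or regular pod) IP.

```shell
# With MTU 9001, the largest IPv4 ICMP payload is 9001 - 28 bytes
# (20-byte IPv4 header + 8-byte ICMP header).
mtu=9001
payload=$((mtu - 28))

# -M do sets the don't-fragment bit, so an oversized packet fails
# instead of being fragmented, exposing path MTU problems.
echo "ping -M do -c 3 -s $payload <target ip>"
```

If this probe fails while a 1422-byte payload (MTU 1450) succeeds, packets above the peer pod MTU are being dropped along the path.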

@bpradipt
Member

bpradipt commented Aug 1, 2024

Awesome @yoheiueda. I tried your suggestion and it fixes the issue :-)

@yoheiueda yoheiueda self-assigned this Aug 6, 2024
yoheiueda added a commit to yoheiueda/cloud-api-adaptor that referenced this issue Aug 7, 2024
Permanent ARP entries that are explicitly set in the network
namespace for a pod in a worker node are propagated to
the podns network namespace in a peer pod VM.

The EKS CNI plugin sets such a permanent ARP entry, and this patch
is necessary for peer pods to work with EKS networks.

Fixes confidential-containers#1966

Signed-off-by: Yohei Ueda <yohei@jp.ibm.com>
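Conceptually, the propagation described in the commit message amounts to reading the permanent neighbor entries from the pod's network namespace on the worker node and replaying them inside the peer pod VM. The sketch below illustrates that idea in shell with an inlined sample entry; it is not the actual cloud-api-adaptor code, which performs the equivalent via netlink.

```shell
# Sample output of "ip neigh show nud permanent" taken from the pod's
# network namespace on the worker node (inlined for illustration).
perm_entries="169.254.1.1 dev eth0 lladdr fa:11:de:9d:70:4c PERMANENT"

# Emit the equivalent "ip neigh add" commands to replay in the
# podns network namespace of the peer pod VM.
replay=$(printf '%s\n' "$perm_entries" | awk '
  $NF == "PERMANENT" {
    printf "ip neigh add %s dev %s lladdr %s nud permanent\n", $1, $3, $5
  }')
echo "$replay"
```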
yoheiueda added a commit to yoheiueda/cloud-api-adaptor that referenced this issue Aug 16, 2024
yoheiueda added a commit to yoheiueda/cloud-api-adaptor that referenced this issue Aug 16, 2024
bpradipt pushed a commit that referenced this issue Aug 29, 2024