[driver] refactor pod network configuration not to use static ARP entry #2118

veshij · 2022-10-25T03:33:07Z

What type of PR is this?
bugfix

Which issue does this PR fix:
Fixes: #2103

What does this PR do / Why do we need it:
This PR changes the pod network configuration so that a static ARP entry for the host veth's MAC address is no longer needed.
Previous configuration:

host veth doesn't have an IP address assigned
pod veth has a direct route to a fake router IP address
pod veth has a default route via the fake router IP address
the fake router IP address has a static ARP entry in the pod namespace pointing to the host system's veth interface

$ ip addr sho dev enidbd62a3316e
18037: enidbd62a3316e@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether 46:a9:d3:46:96:82 brd ff:ff:ff:ff:ff:ff link-netns  cni-9fc9c11a-2553-b00c-4ba8-98228b5a846a
    inet6 fe80::98df:3bff:feb0:dcf2/64 scope link
       valid_lft forever preferred_lft forever

$ sudo ip netns exec cni-9fc9c11a-2553-b00c-4ba8-98228b5a846a ip ro
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0 scope link
$ sudo ip netns exec cni-9fc9c11a-2553-b00c-4ba8-98228b5a846a arp -n
Address                  HWtype  HWaddress           Flags Mask            Iface
169.254.1.1              ether   46:a9:d3:46:96:82   CM                    eth0

Updated configuration:

host veth has a link-local address assigned (all host veth interfaces share the same address)
pod has an onlink default route via the host's veth interface

$ sudo ip netns exec cni-0e566121-8e96-b032-a3e3-15ca97a87f01 ip ro
default via 169.254.1.1 dev eth0 onlink

$ ip addr sho dev enibd675d21895
18038: enibd675d21895@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether a2:f3:e2:0c:5b:8b brd ff:ff:ff:ff:ff:ff link-netns cni-978889f6-92c0-8067-14a7-f509f39fa7f9
    inet 169.254.1.1/32 scope global enibd675d21895
       valid_lft forever preferred_lft forever
    inet6 fe80::1057:1eff:fe7d:fe0b/64 scope link
       valid_lft forever preferred_lft forever
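
For anyone who wants to try the two layouts by hand, the difference can be sketched with plain iproute2 commands. This is a minimal sketch and is not taken from the driver code: the function names, the `DRY_RUN` switch, and the hard-coded `eth0` are assumptions for illustration; the gateway address 169.254.1.1 comes from the output above.

```shell
#!/usr/bin/env bash
# Illustrative only; not taken from the driver code.
GW=169.254.1.1   # gateway address seen in the output above

# run: print the command when DRY_RUN=1, otherwise execute it (needs root).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Previous layout: link route, default route, and a static ARP entry in the pod netns.
configure_previous() {
  local ns="$1" host_mac="$2"
  run ip netns exec "$ns" ip route add "$GW" dev eth0 scope link
  run ip netns exec "$ns" ip route add default via "$GW" dev eth0
  run ip netns exec "$ns" ip neigh add "$GW" lladdr "$host_mac" dev eth0 nud permanent
}

# Updated layout: address on the host veth, onlink default route in the pod netns.
configure_updated() {
  local ns="$1" host_veth="$2"
  run ip addr add "$GW/32" dev "$host_veth"
  run ip netns exec "$ns" ip route add default via "$GW" dev eth0 onlink
}
```

With `DRY_RUN=1 configure_updated <netns> <host-veth>` the commands are only printed, which makes it easy to review them before running anything as root. The updated layout needs no MAC address at all, which is the point of the change.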

More details in #2103
TL;DR: on Ubuntu 22.04+, udevd assigns a permanent MAC address to the host system's veth interface once it is moved to the host network namespace. Because this happens after the pod network has already been configured with the original, non-permanent MAC address, the static ARP entry in the pod points to a stale MAC and the pod has no network connectivity.
Why not fix existing pod network configuration?
I was not able to find a good way to monitor the udev-induced MAC address change. The only option would be to add a sleep with an arbitrary timeout, which would either slow down pod allocation or not be long enough on heavily loaded systems.
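
The failure mode described above can be checked on an affected node by comparing the MAC address in the pod's static neighbour entry with the host veth's current MAC. A minimal sketch, with hypothetical function names, assuming `ip neigh show` prints the MAC right after the `lladdr` keyword and that the pod interface is `eth0`:

```shell
#!/usr/bin/env bash
# neigh_lladdr: read `ip neigh show` output on stdin, print the first lladdr found.
neigh_lladdr() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "lladdr") { print $(i + 1); exit } }'
}

# check_static_arp NETNS HOST_VETH [GATEWAY]
# Returns 0 when the pod's neighbour entry matches the host veth MAC, 1 otherwise.
# Needs root because it enters the pod network namespace.
check_static_arp() {
  local ns="$1" host_veth="$2" gw="${3:-169.254.1.1}"
  local pod_view host_mac
  pod_view=$(ip netns exec "$ns" ip neigh show to "$gw" dev eth0 | neigh_lladdr)
  host_mac=$(cat "/sys/class/net/$host_veth/address")
  if [ "$pod_view" = "$host_mac" ]; then
    echo "ok: $gw -> $host_mac"
  else
    echo "mismatch: pod has '$pod_view', host veth is '$host_mac'"
    return 1
  fi
}
```

A mismatch here would be consistent with the MAC having been rewritten after the pod was configured.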

If an issue # is not available please add repro steps and logs from IPAMD/CNI showing the issue:

Testing done on this change:
Verified updated configuration works on k8s cluster running on AWS EC2.

Automation added to e2e:

Will this PR introduce any new dependencies?:
No

Will this break upgrades or downgrades. Has updating a running cluster been tested?:
Should not/No

Does this change require updates to the CNI daemonset config files to work?:
No

Does this PR introduce any user-facing change?:
It changes the pod network configuration and adds a link-local address to the host system's veth interface.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

olemarkus · 2022-11-05T12:46:06Z

I tried running this patch (+ nftables) in our e2e setup and it seems to be working:
https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/kops/14491/pull-kops-e2e-cni-amazonvpc/1588816070515363840

veshij · 2022-11-14T22:35:50Z

@olemarkus thanks for confirming!
I'm trying to find someone who could review this PR.

jdn5126 · 2022-11-15T22:36:31Z

@veshij we are working on testing this internally. Will provide a timeframe later this week after we get this assigned

veshij · 2022-11-15T22:56:21Z

Thanks for the update!

jaydeokar · 2022-11-18T18:16:26Z

@veshij I'll be working on testing this internally. We plan to finish the internal testing by end of next week

jaydeokar · 2022-11-29T18:40:08Z

I'm investigating a few issues with the Security Groups for Pods feature, which seems to be affected by this change.

veshij · 2022-11-29T19:50:22Z

@jaydeokar I can help with troubleshooting if you could provide some details on the issue.

jaydeokar · 2022-11-29T23:56:43Z

@veshij
There are two issues that we are currently tracking.
Ref: Security Groups for Pods, security group enforcing mode

strict Mode

Pod-to-pod traffic within the same security group fails even with allow-all access on the security group. In the above example, curl from one pod to another fails when the traffic is expected to be allowed, since both pods use the same security group and no block rules have been added.

standard Mode

Pod-to-pod communication works in this mode, but communication between the pod and other host VM IPs in the cluster fails.

jdn5126 · 2022-11-30T22:14:47Z

The TL;DR here is that in SGPP strict mode, the container is unable to resolve the ARP binding for its gateway addr (link-local address). In SGPP standard mode and non-SGPP, the container gets an ARP reply mapping the link-local IP to the host-side veth MAC.

In SGPP strict mode, we configure IIF-based IP route rules: https://github.com/aws/amazon-vpc-cni-k8s/blob/master/cmd/routed-eni-cni-plugin/driver/driver.go#L557 , and we are trying to determine why the kernel does not generate an ARP reply in this case. Still digging...

github-actions · 2023-02-08T00:03:15Z

This pull request is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

github-actions · 2023-02-23T00:02:59Z

Pull request closed due to inactivity.

veshij · 2023-02-23T18:05:08Z

Uh-oh, I somehow forgot about it.
Trying to repro the issue and fix it.

benedikt-bartscher · 2023-03-13T22:43:36Z

Any news on this one?

Deshke · 2023-04-17T07:16:21Z

any update on this ? @jdn5126 @veshij

jdn5126 · 2023-04-17T15:01:25Z

I am going to have to defer to @veshij here. As it stands, we cannot merge this until the SGPP strict mode issue is understood

veshij · 2023-04-22T04:46:45Z

ugh, totally forgot about this one.
I'll be on PTO next week, will take a look after that.

heybronson · 2023-05-26T17:40:42Z

@veshij any updates here? Can this be merged?

github-actions · 2023-07-26T00:02:48Z

This pull request is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

github-actions · 2023-08-09T00:03:05Z

Pull request closed due to inactivity.

jayanthvn · 2023-08-09T00:19:23Z

Not stale. We still need to debug the SGPP failures.

github-actions · 2023-10-09T00:03:14Z

This pull request is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

joshfrench · 2023-10-10T17:12:11Z

Just commenting to keep this alive, we are very interested in seeing this fixed. Thank you!

OverStruck · 2023-11-17T19:05:59Z

bump

jdn5126 · 2023-11-22T20:10:52Z

I picked this up again, and I still cannot determine why in SGPP strict mode, the pod is unable to ARP for its host veth:

$ sudo tcpdump -vvven -i vlan873aabf4bea
tcpdump: listening on vlan873aabf4bea, link-type EN10MB (Ethernet), capture size 262144 bytes
20:07:16.391175 06:59:cf:7d:01:cb > Broadcast, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 169.254.1.1 tell 192.168.65.161, length 28

In SGPP strict, we have a rule for pod traffic to force everything through the branch ENI:

$ ip rule show
10:     from all iif vlan873aabf4bea lookup 101
20:     from all lookup local

And that routing table is:

$ ip route show table 101
default via 192.168.64.1 dev vlan.eth.1
192.168.64.1 dev vlan.eth.1 scope link
192.168.86.180 dev vlan605fbc0e45b scope link

The host veth still has 169.254.1.1 assigned:

$ ifconfig vlan873aabf4bea
vlan873aabf4bea: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9001
        inet 169.254.1.1  netmask 255.255.255.255  broadcast 0.0.0.0
        inet6 fe80::d839:a9ff:fee7:9ade  prefixlen 64  scopeid 0x20<link>
        ether da:39:a9:e7:9a:de  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

And I don't think any sysctls are preventing this from succeeding. Still digging
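
One way to see how the kernel resolves the gateway address for traffic arriving on this interface is `ip route get` with an explicit `iif`. The sketch below only assembles that command; the helper name is hypothetical and the addresses are copied from the capture above. It is a diagnostic aid under these assumptions, not an explanation of the failure.

```shell
#!/usr/bin/env bash
# route_get_cmd DST SRC IIF: print an `ip route get` invocation for a packet
# to DST from SRC arriving on interface IIF.
route_get_cmd() {
  local dst="$1" src="$2" iif="$3"
  echo "ip route get $dst from $src iif $iif"
}

# Example (run the printed command as root on the node):
#   route_get_cmd 169.254.1.1 192.168.65.161 vlan873aabf4bea
```

Comparing the result with and without the IIF rule in place may show whether the rule changes which table answers for 169.254.1.1.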

github-actions · 2024-01-22T00:03:28Z

This pull request is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

jayanthvn · 2024-01-22T03:31:22Z

/not stale

jdn5126 · 2024-01-30T16:40:40Z

Closing this pull request as we cannot implement this approach without breaking the Security Groups for Pods architecture. The recommendation for host operating systems is to set MACAddressPolicy=none in /usr/lib/systemd/network/99-default.link, as the official EKS AMI does for AL2023: https://github.com/awslabs/amazon-eks-ami/blob/master/scripts/install-worker.sh#L104
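
For hosts that do not use the EKS AMI, the recommendation above amounts to a one-line change in the systemd `.link` file. A minimal sketch, assuming GNU `sed` and a file that already contains a `MACAddressPolicy=` line; the function name is hypothetical and this is not a copy of the linked script.

```shell
#!/usr/bin/env bash
# set_mac_policy_none [FILE]: switch an existing MACAddressPolicy= line to "none".
# Defaults to the path named in the recommendation above; needs root for that path.
set_mac_policy_none() {
  local link_file="${1:-/usr/lib/systemd/network/99-default.link}"
  if ! grep -q '^MACAddressPolicy=' "$link_file"; then
    echo "no MACAddressPolicy= line in $link_file" >&2
    return 1
  fi
  sed -i 's/^MACAddressPolicy=.*/MACAddressPolicy=none/' "$link_file"
}
```

Depending on the distribution, a udev reload or a reboot may be needed before the new policy applies to newly created veth interfaces.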

veshij mentioned this pull request Oct 25, 2022

Pods cannot talk to cluster IPs on Ubuntu 2204 #2103

Closed

[driver] refactor pod netork configuration not to use static ARP entry

504a45d

veshij force-pushed the ns_fix branch from a73fde8 to 504a45d Compare October 25, 2022 08:51

veshij added 3 commits October 25, 2022 02:24

deduplicate address selection

e22a628

simplify

310a6db

cleanup comments

3a6e708

veshij marked this pull request as ready for review October 25, 2022 19:05

veshij requested a review from a team as a code owner October 25, 2022 19:05

olemarkus mentioned this pull request Nov 3, 2022

WIP: Fix vpc cni for ubuntu 2204 kubernetes/kops#14491

Closed

jayanthvn requested a review from achevuru November 14, 2022 22:43

hakman mentioned this pull request Nov 15, 2022

kOps: Use Ubuntu 20.04 for AWS VPC CNI tests kubernetes/test-infra#28006

Merged

jdn5126 assigned jaydeokar Nov 18, 2022

jayanthvn added this to the v1.12.1 milestone Nov 23, 2022

jdn5126 removed this from the v1.12.1 milestone Dec 9, 2022

github-actions bot added the stale Issue or PR is stale label Feb 8, 2023

github-actions bot closed this Feb 23, 2023

jdn5126 reopened this Feb 23, 2023

github-actions bot removed the stale Issue or PR is stale label Feb 24, 2023

github-actions bot added the stale Issue or PR is stale label Jul 26, 2023

github-actions bot closed this Aug 9, 2023

jayanthvn reopened this Aug 9, 2023

github-actions bot removed the stale Issue or PR is stale label Aug 10, 2023

dims mentioned this pull request Aug 15, 2023

Switch back to aws vpc cni - Revert systemd-udev change (MACAddressPolicy) that disables networking from pods kubernetes-sigs/provider-aws-test-infra#119

Merged

github-actions bot added the stale Issue or PR is stale label Oct 9, 2023

jdn5126 removed the stale Issue or PR is stale label Oct 10, 2023

jdn5126 unassigned jaydeokar Oct 10, 2023

jdn5126 removed the request for review from achevuru October 10, 2023 17:14

dims mentioned this pull request Nov 24, 2023

Miscellaneous fixes from AL2023 testing awslabs/amazon-eks-ami#1528

Merged

Deshke mentioned this pull request Jan 16, 2024

AWS VPC CNI Ubuntu 22.04 MACAddressPolicy kubernetes/kops#16255

Closed

github-actions bot added the stale Issue or PR is stale label Jan 22, 2024

github-actions bot removed the stale Issue or PR is stale label Jan 23, 2024

jdn5126 closed this Jan 30, 2024
