[driver] refactor pod network configuration not to use static ARP entry #2118

Closed
@veshij wants to merge 4 commits

veshij
Contributor

@veshij veshij commented Oct 25, 2022

What type of PR is this?
bugfix

Which issue does this PR fix:
Fixes: #2103

What does this PR do / Why do we need it:
This PR changes the pod network configuration so that a static ARP entry for the host veth's MAC address is no longer needed.
Previous configuration:

  • host veth doesn't have an IP address assigned
  • pod veth has a direct route to a fake router IP address
  • pod veth has a default route via the fake router IP address
  • the fake router IP address has a static ARP entry in the pod namespace pointing to the host system's veth interface
$ ip addr sho dev enidbd62a3316e
18037: enidbd62a3316e@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether 46:a9:d3:46:96:82 brd ff:ff:ff:ff:ff:ff link-netns  cni-9fc9c11a-2553-b00c-4ba8-98228b5a846a
    inet6 fe80::98df:3bff:feb0:dcf2/64 scope link
       valid_lft forever preferred_lft forever

$ sudo ip netns exec cni-9fc9c11a-2553-b00c-4ba8-98228b5a846a ip ro
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0 scope link
$ sudo ip netns exec cni-9fc9c11a-2553-b00c-4ba8-98228b5a846a arp -n
Address                  HWtype  HWaddress           Flags Mask            Iface
169.254.1.1              ether   46:a9:d3:46:96:82   CM                    eth0
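
For reference, the previous layout could be reproduced by hand roughly as follows. This is a minimal sketch, not the plugin's actual code: the function name `print_static_arp_setup` and the namespace name `cni-example` are made up for illustration, and the function only prints the `ip` commands instead of running them.

```shell
# Sketch only: prints the iproute2 commands for the previous (static ARP)
# pod setup. Nothing is executed here.
# Arguments (all illustrative): pod netns name, host veth MAC, pod interface.
print_static_arp_setup() {
  local netns="$1" host_mac="$2" pod_if="${3:-eth0}" gw="169.254.1.1"
  # Direct (scope link) route to the fake router address.
  echo "ip netns exec ${netns} ip route add ${gw} dev ${pod_if} scope link"
  # Default route via the fake router address.
  echo "ip netns exec ${netns} ip route add default via ${gw} dev ${pod_if}"
  # Permanent neighbor entry that pins the fake router to the host veth MAC.
  echo "ip netns exec ${netns} ip neigh add ${gw} lladdr ${host_mac} dev ${pod_if} nud permanent"
}

print_static_arp_setup cni-example 46:a9:d3:46:96:82
```

The last command is the fragile part: the MAC is captured once, so any later change to the host veth MAC leaves the pod with an entry that is wrong.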

Updated configuration:

  • host veth has a link-local address assigned (all host veth interfaces share the same address)
  • pod has an onlink default route to the host's veth interface
$ sudo ip netns exec cni-0e566121-8e96-b032-a3e3-15ca97a87f01 ip ro
default via 169.254.1.1 dev eth0 onlink

$ ip addr sho dev enibd675d21895
18038: enibd675d21895@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether a2:f3:e2:0c:5b:8b brd ff:ff:ff:ff:ff:ff link-netns cni-978889f6-92c0-8067-14a7-f509f39fa7f9
    inet 169.254.1.1/32 scope global enibd675d21895
       valid_lft forever preferred_lft forever
    inet6 fe80::1057:1eff:fe7d:fe0b/64 scope link
       valid_lft forever preferred_lft forever
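
A rough manual equivalent of the updated layout, again as a sketch under assumptions rather than the code in this PR: `print_onlink_setup`, `cni-example` and `eniexample` are hypothetical names, and the function only prints the commands.

```shell
# Sketch only: prints the iproute2 commands for the onlink layout.
# Arguments (all illustrative): pod netns name, host veth name, pod interface.
print_onlink_setup() {
  local netns="$1" host_if="$2" pod_if="${3:-eth0}" gw="169.254.1.1"
  # The same /32 link-local address goes on every host-side veth.
  echo "ip addr add ${gw}/32 dev ${host_if}"
  # "onlink" makes the kernel treat the gateway as directly reachable on
  # the pod interface, so no separate scope-link route is required.
  echo "ip netns exec ${netns} ip route add default via ${gw} dev ${pod_if} onlink"
}

print_onlink_setup cni-example eniexample
```

With this layout the pod resolves 169.254.1.1 through ordinary ARP, so it learns whatever MAC the host veth has at that moment instead of relying on a pinned entry.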

More details in #2103
TL;DR: on Ubuntu 22.04+, udevd assigns a permanent MAC address to the host-side veth interface once it is moved to the host network namespace. Because this happens after the pod network has already been configured with the earlier, non-permanent MAC address, the static ARP entry in the pod points at a MAC that no longer exists and the pod has no network connectivity.
Why not fix the existing pod network configuration?
I was not able to find a good way to monitor the udev-induced MAC address change. The only option would be to add a sleep with some arbitrary timeout, which would either slow down pod allocation or not be long enough on heavily loaded systems.
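
To check whether a given pod is affected, one could compare the host veth's current MAC with the pod's static neighbor entry. The helpers below are a sketch with hypothetical names (`neigh_lladdr`, `check_stale_arp`); `check_stale_arp` needs root and is only defined here, not called.

```shell
# Extract the lladdr value from `ip neigh show` output read on stdin, e.g.
# "169.254.1.1 dev eth0 lladdr 46:a9:d3:46:96:82 PERMANENT".
neigh_lladdr() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "lladdr") { print $(i + 1); exit } }'
}

# Hypothetical helper: compare the host veth MAC with the pod's entry for the
# fake router address. Arguments: pod netns name, host veth name, gateway IP.
check_stale_arp() {
  local netns="$1" host_if="$2" gw="${3:-169.254.1.1}"
  local host_mac pod_mac
  host_mac="$(cat "/sys/class/net/${host_if}/address")"
  pod_mac="$(ip netns exec "${netns}" ip neigh show "${gw}" | neigh_lladdr)"
  if [ "${host_mac}" = "${pod_mac}" ]; then
    echo "ok: ${gw} -> ${pod_mac}"
  else
    echo "stale: pod has '${pod_mac}', host veth is '${host_mac}'"
    return 1
  fi
}
```

Run as root, for example `check_stale_arp cni-9fc9c11a-2553-b00c-4ba8-98228b5a846a enidbd62a3316e`.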

If an issue # is not available please add repro steps and logs from IPAMD/CNI showing the issue:

Testing done on this change:
Verified that the updated configuration works on a k8s cluster running on AWS EC2.

Automation added to e2e:

Will this PR introduce any new dependencies?:
No

Will this break upgrades or downgrades. Has updating a running cluster been tested?:
Should not/No

Does this change require updates to the CNI daemonset config files to work?:
No

Does this PR introduce any user-facing change?:
It changes the pod network configuration and adds a link-local address to the host system's veth interface.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@veshij veshij marked this pull request as ready for review October 25, 2022 19:05
@veshij veshij requested a review from a team as a code owner October 25, 2022 19:05
@olemarkus

I tried running this patch (+ nftables) in our e2e setup and it seems to be working:
https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/kops/14491/pull-kops-e2e-cni-amazonvpc/1588816070515363840

@veshij
Contributor Author

veshij commented Nov 14, 2022

@olemarkus thanks for confirming!
I'm trying to find someone who could review this PR.

@jdn5126
Contributor

jdn5126 commented Nov 15, 2022

@veshij we are working on testing this internally. Will provide a timeframe later this week after we get this assigned

@veshij
Contributor Author

veshij commented Nov 15, 2022

thanks for update!

@jaydeokar
Contributor

@veshij I'll be working on testing this internally. We plan to finish the internal testing by end of next week

@jayanthvn jayanthvn added this to the v1.12.1 milestone Nov 23, 2022
@jaydeokar
Contributor

I'm investigating a few issues around the Security Groups for Pods feature, which seems to be affected by this change.

@veshij
Contributor Author

veshij commented Nov 29, 2022

@jaydeokar I can help with troubleshooting if you could provide some details on the issue.

@jaydeokar
Contributor

@veshij
There are two issues that we are tracking as of now.
Ref: Security Group for Pod, Security group Enforcing Mode

strict Mode

  • Pod-to-pod traffic within the same security group fails even with allow-all access on the security group. In the above example, curl from one pod to another fails when the traffic is expected to be allowed, since both pods use the same security group and no block rules have been added.

standard Mode

  • Pod-to-pod communication works in this mode, but communication between the pod and other host VM IPs in the cluster fails.

@jdn5126
Contributor

jdn5126 commented Nov 30, 2022

The TL;DR here is that in SGPP strict mode, the container is unable to resolve the ARP binding for its gateway addr (link-local address). In SGPP standard mode and non-SGPP, the container gets an ARP reply mapping the link-local IP to the host-side veth MAC.

In SGPP strict mode, we configure IIF-based IP route rules (https://github.com/aws/amazon-vpc-cni-k8s/blob/master/cmd/routed-eni-cni-plugin/driver/driver.go#L557), and we are trying to determine why the kernel does not generate an ARP reply in this case. Still digging...

@jdn5126 jdn5126 removed this from the v1.12.1 milestone Dec 9, 2022
@github-actions

github-actions bot commented Feb 8, 2023

This pull request is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

@github-actions github-actions bot added the stale Issue or PR is stale label Feb 8, 2023
@github-actions

Pull request closed due to inactivity.

@github-actions github-actions bot closed this Feb 23, 2023
@jdn5126 jdn5126 reopened this Feb 23, 2023
@veshij
Contributor Author

veshij commented Feb 23, 2023

uh-oh, I somehow forgot about it.
Trying to repro the issue and fix it.

@github-actions github-actions bot removed the stale Issue or PR is stale label Feb 24, 2023
@benedikt-bartscher

Any news on this one?

@Deshke

Deshke commented Apr 17, 2023

any update on this ? @jdn5126 @veshij

@jdn5126
Contributor

jdn5126 commented Apr 17, 2023

I am going to have to defer to @veshij here. As it stands, we cannot merge this until the SGPP strict mode issue is understood

@veshij
Contributor Author

veshij commented Apr 22, 2023

ugh, totally forgot about this one.
I'll be on PTO next week, will take a look after that.

@heybronson

@veshij any updates here? Can this be merged?

@github-actions

This pull request is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

@github-actions github-actions bot added the stale Issue or PR is stale label Jul 26, 2023
@github-actions

github-actions bot commented Aug 9, 2023

Pull request closed due to inactivity.

@github-actions github-actions bot closed this Aug 9, 2023
@jayanthvn jayanthvn reopened this Aug 9, 2023
@jayanthvn
Contributor

Not stale..need to debug the SGPP failures.

@github-actions

github-actions bot commented Oct 9, 2023

This pull request is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

@github-actions github-actions bot added the stale Issue or PR is stale label Oct 9, 2023
@joshfrench

Just commenting to keep this alive, we are very interested in seeing this fixed. Thank you!

@jdn5126 jdn5126 removed the stale Issue or PR is stale label Oct 10, 2023
@jdn5126 jdn5126 removed the request for review from achevuru October 10, 2023 17:14
@OverStruck

bump

@jdn5126
Contributor

jdn5126 commented Nov 22, 2023

I picked this up again, and I still cannot determine why in SGPP strict mode, the pod is unable to ARP for its host veth:

$ sudo tcpdump -vvven -i vlan873aabf4bea
tcpdump: listening on vlan873aabf4bea, link-type EN10MB (Ethernet), capture size 262144 bytes
20:07:16.391175 06:59:cf:7d:01:cb > Broadcast, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 169.254.1.1 tell 192.168.65.161, length 28

In SGPP strict, we have a rule for pod traffic to force everything through the branch ENI:

$ ip rule show
10:     from all iif vlan873aabf4bea lookup 110
20:     from all lookup local

And that routing table is:

$ ip route show table 101
default via 192.168.64.1 dev vlan.eth.1
192.168.64.1 dev vlan.eth.1 scope link
192.168.86.180 dev vlan605fbc0e45b scope link

The host veth still has 169.254.1.1 assigned:

$ ifconfig vlan873aabf4bea
vlan873aabf4bea: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9001
        inet 169.254.1.1  netmask 255.255.255.255  broadcast 0.0.0.0
        inet6 fe80::d839:a9ff:fee7:9ade  prefixlen 64  scopeid 0x20<link>
        ether da:39:a9:e7:9a:de  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

And I don't think any sysctls are preventing this from succeeding. Still digging
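
One way to double-check the sysctl side, offered as a sketch and not as a diagnosis: dump the ARP-related per-interface settings straight from `/proc`. The helper name `show_arp_sysctls` and the choice of keys are assumptions; the optional second argument exists only so the function can be pointed at a different directory.

```shell
# Print ARP-related sysctls for one interface by reading /proc directly,
# which sidesteps sysctl's dot-vs-slash ambiguity for names like vlan.eth.1.
# Arguments: interface name, optional base directory (defaults to /proc path).
show_arp_sysctls() {
  local ifname="$1" base="${2:-/proc/sys/net/ipv4/conf}" key
  for key in arp_ignore arp_filter arp_announce rp_filter proxy_arp; do
    if [ -r "${base}/${ifname}/${key}" ]; then
      echo "${ifname}.${key}=$(cat "${base}/${ifname}/${key}")"
    fi
  done
}

# Example usage on the node (interface name taken from the capture above):
#   show_arp_sysctls vlan873aabf4bea
#   show_arp_sysctls all
# A related check is to ask the kernel how it would route the ARP target
# when the packet arrives on the host veth:
#   ip route get 169.254.1.1 from 192.168.65.161 iif vlan873aabf4bea
```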


This pull request is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

@github-actions github-actions bot added the stale Issue or PR is stale label Jan 22, 2024
@jayanthvn
Contributor

/not stale

@github-actions github-actions bot removed the stale Issue or PR is stale label Jan 23, 2024
@jdn5126 jdn5126 closed this Jan 30, 2024
@jdn5126
Contributor

jdn5126 commented Jan 30, 2024

Closing this pull request as we cannot implement this approach without breaking the Security Groups for Pods architecture. The recommendation for host operating systems is to set MACAddressPolicy=none in /usr/lib/systemd/network/99-default.link, as the official EKS AMI does for AL2023: https://github.com/awslabs/amazon-eks-ami/blob/master/scripts/install-worker.sh#L104
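
A minimal sketch of applying that recommendation on a node. The function name `set_mac_policy_none` is hypothetical, and the function assumes the file already contains a `MACAddressPolicy=` line; it refuses to touch the file otherwise. Whether a udev reload or reboot is needed afterwards is not covered here.

```shell
# Sketch: rewrite an existing MACAddressPolicy= line to "none" in a systemd
# .link file. Argument: path to the .link file (defaults to the path named
# in the recommendation above). Must run as root for the default path.
set_mac_policy_none() {
  local link_file="${1:-/usr/lib/systemd/network/99-default.link}" tmp
  if ! grep -q '^MACAddressPolicy=' "${link_file}"; then
    echo "no MACAddressPolicy= line in ${link_file}; not modifying it" >&2
    return 1
  fi
  tmp="$(mktemp)" || return 1
  # Write through a temp file, then copy back so the original file keeps
  # its ownership and permissions.
  if sed 's/^MACAddressPolicy=.*/MACAddressPolicy=none/' "${link_file}" > "${tmp}" \
     && cat "${tmp}" > "${link_file}"; then
    rm -f "${tmp}"
  else
    rm -f "${tmp}"
    return 1
  fi
}
```

The change only affects how udev handles MAC addresses going forward; it does not alter pods that are already running with a stale entry.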

Development

Successfully merging this pull request may close these issues.

Pods cannot talk to cluster IPs on Ubuntu 2204