Pods cannot talk to cluster IPs on Ubuntu 22.04 #2103
This looks similar to #1847 (comment), and the workaround in #1847 (comment) has helped there. As @achevuru suggested, Amazon Linux 2 images use iptables-legacy by default as well. We will check and update here if there is something we can do to address this scenario.
Let me try that workaround. If it works, I think it would be helpful if an iptables-nft image could be published. I imagine it wouldn't be too much work to do that.
Thanks, please let us know if it works.
Unfortunately, no luck. The workaround does remove the rules from iptables-legacy, and I now see them in nftables, but Pods still cannot talk to cluster IPs. I can also confirm that I still see nothing interesting in the logs, and Pods do get their IPs.
Any idea how to progress on this? Anywhere we should look for potential issues?
@olemarkus - Sorry for the delay. Since you mentioned pod-to-pod communication is broken: if you haven't already verified this, can you please run tcpdump on the sender pod's host-side veth, the sender node, the receiving node, and the receiving node's host-side veth? This should provide context on where the traffic is getting dropped.
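For reference, a minimal tcpdump sketch along those lines (interface names and the pod IP are placeholders, not values from this issue; `-e` prints the Ethernet header, which is useful if L2 details turn out to matter):

```sh
# On the sending node: the sender pod's host-side veth (placeholder name)
sudo tcpdump -ne -i enixxxxxxxxxxx host 10.0.12.34

# Then the sending node's primary interface, the receiving node's primary
# interface, and finally the receiving pod's host-side veth
sudo tcpdump -ne -i eth0 host 10.0.12.34
```

The hop where the pod's source IP stops appearing is where the traffic is being dropped.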
As far as I can tell, pod-to-pod communication works when Pods talk to each other directly. It's pod -> ClusterIP that does not work, except when the Pod is running in hostNetwork mode.
Were you able to track the packet via tcpdump?
Right. Running tcpdump against A's veth, I see the packets going from the pod IP to the service IP. However, tcpdump on any host interface shows no packets coming in from A's IP. This is with a custom build of aws-node using nft iptables. The DNAT rules seem to be working fine, since connecting from the host to the cluster IP works.
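One way to double-check the DNAT step (a sketch; the ClusterIP is a placeholder and conntrack-tools may need to be installed):

```sh
# If kube-proxy's DNAT matched, conntrack shows the ClusterIP rewritten to a pod IP
sudo conntrack -L -d 172.20.0.10 2>/dev/null
```

An entry whose reply tuple points at a pod IP confirms the service translation happened, so the drop is further along the path.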
So, if I understood right: a connection to a Pod via ClusterIP from a pod without hostNetwork fails, but the same works the other way around? When you say tcpdump against the Pod's veth, are you referring to the veth interface inside the pod's network namespace or the veth interface in the host network namespace? Would you be able to share the iptables output?
I have not tested from B to A. Also worth mentioning that this is very easy to reproduce with the latest kOps using, e.g., an Ubuntu 22.04 image.
Interesting. If you see the packet on the host-side veth, then we know it landed on the host network end, and the behavior should now be similar to a connection that we initiate from the node. Are there any active network policies on the node? We will check the logs/iptables output once we receive them and update here. We will see if we can reproduce with the above image as well.
Sent the iptables output. Also tried disabling rp_filter on the veth interface, but it didn't seem to have much effect. There are no network policies or anything similar on the node, other than whatever Ubuntu 22.04 may be doing by default.
Troubleshooting a similar issue on our cluster. A configuration that works on Ubuntu 20.04 doesn't work after an upgrade to 22.04.
I think I found the issue.
@veshij VPC CNI does add a static ARP entry for the default GW (169.254.1.1, pointing to the host-side veth) inside the pod network namespace, so it is essentially for the host-side veth.
Are you saying the packet is dropped at the host veth because of an L2 header discrepancy, i.e., a mismatch with the host veth's MAC? As you can see, we derive the hostVeth MAC and use it, so the veth MAC must be changing. We can see the veth MAC inside the pod network namespace and compare it against the current value.
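A quick way to compare the two (a minimal sketch; the pod PID and the host-side veth name are placeholders):

```sh
POD_PID=12345            # hypothetical PID of a process in the affected pod
HOST_VETH=enixxxxxxxxxxx # hypothetical host-side veth name

# Static ARP entry for the default gateway inside the pod's network namespace
sudo nsenter -t "$POD_PID" -n ip neigh show 169.254.1.1

# Current MAC of the host-side veth in the root namespace
ip -br link show "$HOST_VETH"
```

If the lladdr in the first output differs from the MAC in the second, the static ARP entry is stale and egress traffic from the pod is dropped at the host veth.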
Yes, that's exactly what happens on my system. I can confirm that on u22 (running a newer kernel) the MAC address of the host's veth doesn't match the static ARP record inside the pod, and moreover, this MAC address is not used on any other interface. Exactly the same CNI binary running on u20 (and an older kernel) has no issues. I'm troubleshooting it a bit further. I don't think it's a bug in the CNI code; currently I suspect either an issue with the netlink implementation/kernel netlink interface, or the MAC address changing over time on the veth interface (something similar to IPv6's privacy extensions).
The test case triggers the issue both in AWS and on-prem, on kernel 5.15.
Looks like it's udev.
https://www.freedesktop.org/software/systemd/man/systemd.link.html
u20:
u22:
We likely want to fix the implementation on the CNI side; I suppose changing the order so that the veth pair is created in the root namespace first and the device is then moved into the netns should be a reasonable workaround.
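For reference, a way to inspect which MAC address policy udev applies to host-side veths, plus the kind of `.link` drop-in commonly used to stop udev from rewriting veth MACs (a sketch only; paths follow standard systemd packaging and the interface name is a placeholder):

```sh
# Default policy shipped by systemd (persistent on newer releases)
grep MACAddressPolicy /usr/lib/systemd/network/99-default.link

# Dry-run udev's net_setup_link builtin against an existing host-side veth
sudo udevadm test-builtin net_setup_link /sys/class/net/enixxxxxxxxxxx

# Possible mitigation: keep the kernel-assigned MAC on veth devices
sudo tee /etc/systemd/network/00-veth-keep-mac.link <<'EOF'
[Match]
Driver=veth

[Link]
MACAddressPolicy=none
EOF
```

The drop-in only affects links created after it is in place, so existing pods would need their veths recreated (or the node rebooted) to pick it up.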
@jayanthvn @achevuru what do you think? More conventional approach:
Another option is to leave almost everything as is:
Unfortunately I'm not sure how to make it work without a sleep of some magic duration (it takes 100-200 ms on my system, but it can be worse if the host is heavily loaded).
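To illustrate the first approach with plain `ip` commands (a sketch of the idea only; interface and namespace names are made up, and the real change would live in the CNI's netlink code):

```sh
# Create the veth pair in the root namespace first (hypothetical names)
sudo ip link add eni-host type veth peer name eni-pod
sudo ip netns add demo-ns

# Wait for udev to finish processing the new links (MACAddressPolicy runs here),
# so the MAC we read next is the one the host side will keep
sudo udevadm settle
HOST_MAC=$(cat /sys/class/net/eni-host/address)

# Only then move the pod end into its namespace and bring both ends up
sudo ip link set eni-pod netns demo-ns
sudo ip netns exec demo-ns ip link set eni-pod up
sudo ip link set eni-host up

# The static ARP entry written inside the namespace now matches the host veth's MAC
sudo ip netns exec demo-ns ip neigh replace 169.254.1.1 lladdr "$HOST_MAC" dev eni-pod nud permanent
```

Creating both ends in the root namespace first lets udev settle the host-side MAC before it is recorded, instead of racing with the move into the netns.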
FWIW, I found this workaround during node setup helped resolve the issue:
Thanks @kwohlfahrt. The proposed PR #2118 changes the CNI to create the veth pair in the root namespace first and then move the device into the netns.
Was this resolved? |
/reopen |
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days |
This still needs a fix |
It also prevents a new kOps cluster with networking=amazonvpc from coming up healthy. In my case the core-dns-xx and ebs-csi-node pods kept crashing. For core-dns the log read: plugin/error timeout when trying to connect to the Amazon-provided DNS server. For ebs-csi-node the error was about being unable to get the Node (it was trying 100.64. - not sure why). The workaround is to use the 20.04 image instead. The error messages are so cryptic that it took me a while to figure this out.
So running in AWS with spec.networking.amazonvpc and also using awsEBSCSIDriver is broken? Trying to upgrade my test cluster from 1.25 to 1.26, and the ebs-csi-node pod's ebs-plugin container on the new masters keeps crash-looping with this log
Running amazonvpc networking with ebs-csi seems like a pretty common use case to be so broken. |
@pmankad96 @btalbot I suggest filing a support case for this so that it can be investigated further.
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days |
I haven't seen any comments or commits on this, so I presume that Ubuntu 22.04 is still broken on AWS running amazonvpc?
Not yet 🥲
@btalbot Ubuntu 22.04 works on EKS, you just have to set
Closing this as complete, since the troubleshooting doc informs people to set
This issue is now closed. Comments on closed issues are hard for our team to see. |
What happened:
After upgrading clusters to use Ubuntu 22.04 by default, the kOps e2e tests started failing for this CNI: https://testgrid.k8s.io/kops-network-plugins#kops-aws-cni-amazon-vpc
What seems to happen is that Pods do receive IPs, but they fail to talk across nodes. Calling, e.g., a ClusterIP service from the host works, but not from a Pod, so kube-proxy itself should be working just fine.
I cannot see anything wrong in any logs. What I do see is that the AWS-related rules are in legacy iptables, while kube-proxy uses nftables, so my guess is that this mismatch is the cause of this behavior; nft and legacy iptables must not be mixed anyway.
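A quick way to see which backend each component is writing to (a sketch; the AWS-* and KUBE-* chain prefixes are the usual ones created by the CNI and kube-proxy):

```sh
# Which iptables backend the node's alternatives point at
update-alternatives --display iptables 2>/dev/null | head -n 2

# Count CNI (AWS-*) and kube-proxy (KUBE-*) rules in each backend
sudo iptables-legacy-save | grep -cE '^-A (AWS|KUBE)'
sudo iptables-nft-save | grep -cE '^-A (AWS|KUBE)'
```

If the AWS-* rules land in one backend and the KUBE-* rules in the other, that is exactly the mixed legacy/nft situation suspected above.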
Attach logs
Example logs here: https://gcsweb.k8s.io/gcs/kubernetes-jenkins/logs/e2e-kops-aws-cni-amazon-vpc/1577618499142946816/artifacts/i-0d90e121da8bff687/
How to reproduce it (as minimally and precisely as possible):