KubeSpray - Cannot access local node services when using eBPF #7252
Comments
Sorry for following up late, I missed this issue. Have you made any progress? Do I understand correctly that you have a nodeport 9100 and you cannot access it via that nodeport? Or do you just have a process listening on each node on port 9100, and connecting to the same node's IP on that port from a local pod does not work? |
I ran the test again. First, I started an nginx pod on worker2 and waited until it was ready. After that, I entered the nginx pod and ran "curl http://worker2:9999". It doesn't work. |
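The check described above can be approximated without curl. This is only a minimal sketch, assuming that a plain TCP connect is enough to tell "reachable" from "dropped"; the function name `can_connect` and the 3 second timeout are illustrative and not part of Calico or Kubespray.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # "worker2" and 9999 are the values used in the comment above.
    print(can_connect("worker2", 9999))
```

Running it inside the pod once against the local node and once against another node should show whether only the local node is affected.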
Same issue here with calico version 3.25.1 eBPF (Kubernetes in amd64, Ubuntu 22.04 LTS nodes). |
@tomastigera |
@spantazi thank you, if you have troubles getting the logs, we could connect on calico users slack #ebpf and I can help out with that. |
I am so glad to hear someone has the same problem as me. My k8s cluster was also built with Kubespray v2.22 (k8s 1.26.5) and uses Calico 3.25.1. |
@tomastigera I followed your instructions to get the logs, but they are too long, so I am posting only part of them here. Logs <Part 1>
Logs <Part 2>
Is this enough for debugging? (Sorry, I don't know how to upload the complete logs here.) I tested with both Calico 3.25.1 and 3.25.2, and both have the same problem. |
Consider uploading the file as a Gist at https://gist.github.com/ and link it here? |
Sorry, I can't. My network is controlled by my company. I can't upload any file to other sites. |
Can you cut-and-paste it in? |
No, it has 15548 lines in total. It's too long. However, I see "DENY due to policy" in the log messages. Could it be related to #7707? I also upgraded to Calico v3.25.2, but the issue still exists. |
@Aslan-Liu thanks for the logs. It is a firehose, but that is going to change with 3.27 😅 You said that port 9999 is a nodeport. But from the logs it does not seem to be treated like one:
Could you share output from
We are looking for entries with port 9999, specially As for the |
@tomastigera Actually, port 9999 is not a node port. I just run an HTTP server on the host, listening on port 9999, so I think we would not see port 9999 in the output of 'kubectl exec -n calico-system calico-node-XYZ -- calico-node -bpf nat dump' either. Am I right? Or do you still need that output? Is there any other information you need? Just let me know. Thanks |
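For anyone trying to reproduce this, the host-side listener can be as simple as the following sketch. It assumes any plain HTTP server bound to the node (not a Kubernetes Service or nodeport) is enough to trigger the problem; `HelloHandler` and `start_host_server` are hypothetical names, and only port 9999 comes from the thread.

```python
import http.server
import threading


class HelloHandler(http.server.BaseHTTPRequestHandler):
    """Answer every GET with a fixed plain-text body."""

    def do_GET(self):
        body = b"hello from the host\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging


def start_host_server(port: int = 9999, bind: str = "0.0.0.0"):
    """Start the server in a background thread and return it."""
    server = http.server.ThreadingHTTPServer((bind, port), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    srv = start_host_server()
    try:
        threading.Event().wait()  # block until interrupted
    except KeyboardInterrupt:
        srv.shutdown()
```

Run it directly on the node, then request `http://<node IP>:9999/` from a pod scheduled on that same node.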
Ohhh, sorry for misunderstanding - service is a super overloaded term 🤷♂️ In that case, the nat dump is pointless. What you can do though is to dump the policy on that interface using |
@tomastigera Here is the output from my environment
The information above was dumped from the calico-node pod running on host 172.21.147.156. Can you see anything wrong here? |
Oops I meant Nevertheless, it does not matter, there is basically no policy, right? 🤔 |
Yes, sir. This is a new cluster. I did not add any network policies there. |
What if you set config option |
I modified it and it still does not work. Here is my Felix config
|
@Aslan-Liu I have a cluster with Calico 3.25.2. I created an nginx nodeport service. I then created a test pod (ubuntu) on one of the nodes and tried to access the service using nodeIP:nodeport, where nodeIP is the IP of the node on which the test pod is running. I can see the connectivity working fine. Am I missing something? |
|
I have
|
@sridhartigera Thanks. Yes, the test you did is correct. But is the dataplane you used eBPF? Also, my OS is Ubuntu 22.04. I am not sure whether that could be a problem. |
@Aslan-Liu Yes. Dataplane is eBPF. This is the felix config
I can give it a try with Ubuntu 22.04. |
@sridhartigera I compared my FelixConfiguration with yours and found two differences. Could you help me check whether these differences could cause this issue? First, my ipipEnabled is false, but yours is true. Which one may cause this issue? Here is my FelixConfiguration
|
@Aslan-Liu Can you please try setting |
@tomastigera Thanks. At least now I know I have to wait until these bugs are fixed. I will keep watching this issue until then. Thank you. |
In eBPF mode I can't ping the (any) node IP from within a POD. I can ping other PODs (both ipv4 and ipv6). Is that caused by the same problem as in this issue? Calico v1.27.0, install with UpdateI disabled |
I can ping nodes from pods, except the local node when the pod is in a nat-outgoing pool, which is the same issue as here and fixed by #8380
I do not see how kube-proxy rules would be related unless you ping a service. This is likely a different issue. |
…ndpoint If there is no wildcard HEP, there is no policy that should be applied. But without skipping, empty list of profiles would create a default deny rule if none of the non-existent profiles matches. That is obviously always hit and traffic toward the host is dropped if defaultEndpointToHostAction is set to RETURN. fixes projectcalico#7252
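The reasoning in this commit message can be illustrated with a toy model. This is not Felix's actual code; the function names, the `skip_when_no_hep` flag, and the "allow"/"deny"/"no-policy" strings are all assumptions made up for the sketch.

```python
from typing import Callable, List

# A profile is modelled as a predicate: True means it allows the packet.
Profile = Callable[[dict], bool]


def evaluate_profiles(packet: dict, profiles: List[Profile]) -> str:
    """Allow if any profile matches, otherwise fall through to default deny."""
    for profile in profiles:
        if profile(packet):
            return "allow"
    # With an empty profile list this line is always reached.
    return "deny"


def host_endpoint_verdict(
    packet: dict,
    profiles: List[Profile],
    has_wildcard_hep: bool,
    skip_when_no_hep: bool,
) -> str:
    """Hypothetical model of the behaviour before and after the fix."""
    if not has_wildcard_hep and skip_when_no_hep:
        # Fixed behaviour: no wildcard HEP means no policy is applied.
        return "no-policy"
    return evaluate_profiles(packet, profiles)


if __name__ == "__main__":
    pkt = {"dst_port": 9999}
    print(host_endpoint_verdict(pkt, [], False, skip_when_no_hep=False))  # deny
    print(host_endpoint_verdict(pkt, [], False, skip_when_no_hep=True))   # no-policy
```

In this model the unfixed path denies every packet toward the host as soon as the profile list is empty, which matches the "DENY due to policy" messages reported earlier in the thread.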
It's because And you can't ping a service, at least not with But, you are right, my problem is likely a different issue. |
When a pod is accessing a local host, it should not get SNATed as the host when it is in a nat-outgoing ippool. (a) it is unnecessary as the local node can be accessed and (b) there is no way to return the traffic as it would return to the host itself. refs projectcalico#7252
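The rule described in this commit message can be sketched as a small decision function. This is an illustrative model only, not the eBPF dataplane code; it assumes nat-outgoing applies to traffic from a nat-outgoing pool to destinations outside all Calico IP pools, and the name `should_snat` and the example CIDR are hypothetical.

```python
import ipaddress
from typing import Iterable


def _in_any(ip, cidrs: Iterable[str]) -> bool:
    """True if ip falls inside any of the given CIDR strings."""
    return any(ip in ipaddress.ip_network(c) for c in cidrs)


def should_snat(
    src: str,
    dst: str,
    nat_outgoing_pools: Iterable[str],
    all_pools: Iterable[str],
    local_host_ips: Iterable[str],
) -> bool:
    """Decide whether a pod's outgoing packet should be SNATed to the node IP."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    if not _in_any(src_ip, nat_outgoing_pools):
        return False  # source pool does not request nat-outgoing
    if _in_any(dst_ip, all_pools):
        return False  # pod-to-pod traffic keeps the pod IP
    if dst_ip in {ipaddress.ip_address(h) for h in local_host_ips}:
        return False  # the fix: never SNAT traffic to the local host
    return True


if __name__ == "__main__":
    pools = ["10.233.64.0/18"]   # example pod CIDR, an assumption
    local = ["172.21.147.156"]   # host IP mentioned earlier in the thread
    print(should_snat("10.233.64.5", "172.21.147.156", pools, pools, local))
    print(should_snat("10.233.64.5", "172.21.147.157", pools, pools, local))
```

The third check is the part the commit adds: without it, a pod talking to its own node would be rewritten to the node's address, and the reply would have nowhere to go but the host itself.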
Wait, there isn't a release yet that includes this code change >:| Last release was Dec 15 2023 which is definitely more than 2 weeks ago. How long before this is baked into a release? |
It will be part of the upcoming 3.27.1 - soon! |
@tomastigera yay! |
@BloodyIron the fix is released, it is actually 3.27.2 |
@tomastigera sorry for the delay in my response, life stuff. I genuinely appreciate you directly tagging me, as on my end my need for eBPF via Calico is particularly important. I'm trying to solve a SourceIP problem, and I'm hoping this does the trick. Anyways, just letting you know I'm now trying to make the time for this topic, and will aspire to get back to you with my results. Again, appreciate your help on this ❤️ |
@tomastigera due to certain reasons I may need to try addressing my SourceIP need with calico-[kube-controllers|node] v3.22.5 and not the much newer v3.27.2. Namely because I need to upgrade multiple aspects to even reach v3.27.2 capabilities (I THINK, could be wrong), and I'm trying to find a solution that's "good enough" for now to get SourceIP while kube-proxy is active (RKE1) in such a way that I do not see a way to disable kube-proxy fully. I'm currently going to try and get SourceIP fixed with eBPF with Calico-stuff v3.22.5, and that might be a mistake, but I'm going to find out. I have a pretty substantial technical debt on my end which makes me very reluctant to address these upgrades that are blocking at this time, but I know I will need to overcome them in the near future. Just wanted to share. |
Okay so the original thing that brought me to this particular GitHub Issue thread is when I try to enable eBPF I get a failure that looks to be related to the original issue in this thread: (this is log output from calico-node pods after enabling eBPF)
And a few days ago I found that maybe there's a solution to this that's been rolled out in Calico v3.23.1 : #6056 So I'm not sure if v3.27.2 is necessarily relevant to my scenario. As such I've been trying to upgrade my calico to v3.23.3, and I'm probably doing something wrong in that process (failing along the way), but that's what I'm working on, and the context of why. Maybe it'll help someone to know. I'll try to report back if I succeed in the upgrade, and if it solves my SourceIP problem. Appreciate the help thus-far though, thanks! :) |
So in one of my dev k8s clusters I've switched to RKE2... guess which version of Calico it's feeding me? ;P v3.27.2! So I'm likely to try Calico in eBPF mode again, but with RKE2 I can actually properly disable kube-proxy for it. |
I have one cluster with Prometheus installed in it, so each node has a service (HostIP:9100) that exports node information. However, if I run a Pod on Node1 (Node1 host IP: 172.21.149.119), I cannot access 172.21.149.119:9100 from the Pod. I can still access the same service on the other nodes, such as 172.21.149.xx:9100.
Expected Behavior
All local services on every node can be accessed from the Pod.
Current Behavior
Currently, the Pod can only reach host services on nodes other than the one it is running on.
Possible Solution
#6065
Steps to Reproduce (for bugs)
Context
I can also see the following log messages in the calico-node Pod.
I think this problem is very similar to #6065. However, the problem still exists after upgrading Calico to v3.23.1.
Your Environment
Can someone help me?