K8s service calls timing out, possibly due to suspicious FLANNEL-POSTRTG rule #1703
Comments
If you do
From your output, I don't see the mark match on the first rule; maybe the issue is related to that.
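A rough sketch of how one might check for that mark match, assuming the rules are dumped in rule-spec form with iptables -S rather than -L. The helper name first_rule_has_mark is hypothetical and not part of Flannel or iptables.

```shell
# Hypothetical helper: read `iptables -S`-style text on stdin, print the first
# rule appended to the given chain, and report whether it has a mark match.
# Returns 0 if the first rule contains "-m mark", 1 if it does not,
# and 2 if the chain has no rules in the input.
first_rule_has_mark() {
  local chain=$1 first
  first=$(grep -m1 -- "^-A $chain ") || {
    echo "no rules found for $chain" >&2
    return 2
  }
  printf '%s\n' "$first"
  case "$first" in
    *" -m mark "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage on a node (needs root; assumes the chain is in the nat table):
#   iptables -t nat -S FLANNEL-POSTRTG | first_rule_has_mark FLANNEL-POSTRTG
```

If the function returns 1, the first rule has no mark match, which is the situation described in this comment.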
An easy way I found to stop flannel from undoing any iptables changes you make while testing is to send the flanneld process on the host a SIGSTOP (kill -19), then make your iptables changes. Don't forget to send a SIGCONT (kill -18) to resume.
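The pause-and-resume approach above could be wrapped so that the SIGCONT is never forgotten. This is only a sketch: the function name with_paused is made up for illustration, and it assumes flanneld is visible as a process on the host and that you are running as root.

```shell
# Hypothetical wrapper: stop a process, run one command, then always resume it.
# Usage: with_paused <pid> <command> [args...]
with_paused() {
  local pid=$1 rc
  shift
  kill -STOP "$pid" || return 1   # signal 19 on most Linux platforms
  "$@"                            # e.g. the iptables change being tested
  rc=$?
  kill -CONT "$pid"               # signal 18 on most Linux platforms
  return "$rc"
}

# Example (assumption: pgrep finds the flanneld process on the host):
#   with_paused "$(pgrep -o flanneld)" iptables -t nat -vnL FLANNEL-POSTRTG
```

Using the signal names instead of the numbers avoids depending on platform-specific signal numbering.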
That's correct, @rbrtbnfgl. I've never seen
It's possible but unlikely
I just noticed that each of my three K8s worker nodes has a
Each of the four servers (10.244.x.0) is listed. But I DON'T see these routes on my K8s control plane node, which is where I've been running the
I could try attaching my control plane node as a fourth worker node to see whether this fixes the routing table. But is there another approach? (Or is it possibly correct that the control plane node lacks a
This reminds me that days ago I read about a user with a similar problem that seemed to resolve itself after they added the control plane node to the worker node pool. I don't remember where I read that or even whether it was a Flannel user, but my control plane routing table lacks the
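The routing-table comparison described in this comment could be scripted roughly as follows. This is a sketch under assumptions: the function name missing_pod_routes is hypothetical, and the /24 per-node subnets in the usage line are a guess based on the 10.244.x.0 addresses mentioned above.

```shell
# Hypothetical helper: read `ip route` output on stdin and print every
# expected subnet (given as arguments) that has no route entry.
# Returns 0 if all subnets are present, 1 if any is missing.
missing_pod_routes() {
  local routes subnet missing=0
  routes=$(cat)
  for subnet in "$@"; do
    # A route line starts with its destination, so compare the first field.
    if ! printf '%s\n' "$routes" |
        awk -v s="$subnet" '$1 == s { found = 1 } END { exit !found }'; then
      echo "missing route: $subnet"
      missing=1
    fi
  done
  return "$missing"
}

# Usage on each node (subnets are assumed, adjust to your cluster):
#   ip route show | missing_pod_routes 10.244.0.0/24 10.244.1.0/24 10.244.2.0/24 10.244.3.0/24
```

Running the same command on the control plane node and on a worker node would show whether the control plane is the only node without the pod-network routes.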
I decided to try to refresh everything to clear out possibly stale stuff by running
And then it worked!!!
Thanks for your comments and help, @oe-hbk and @rbrtbnfgl. Looks like recreating all my pods fixed my problem.
Good to hear that. Sorry I couldn't reply yesterday.
Flannel seems to handle pod-to-pod communication fine but calls to K8s services are timing out, though the services are up and running and firewalls are not blocking the service ports.
Expected Behavior
I built a Kubernetes cluster from four Ubuntu 22.04 servers. Most stuff is working fine, but a few things that seem to involve leaving or entering the Flannel pod network are messed up. Two specifics:
After installing metrics-server, I expected it to work.
When I run kubectl create -f my-cnpg-cluster.yaml after successfully installing CloudNativePG using this manifest, I expect to create a PostgreSQL cluster or see an error relating to CloudNativePG.
Current Behavior
metrics-server didn't work properly until I ran it with the non-default hostNetwork: true setting, as suggested here. That reported issue (which I believe was identical to mine) involved a timeout trying to reach a K8s service running on port 443:
Though I hacked a workaround by switching to hostNetwork: true, I believe I should have been able to communicate with the K8s service without switching network modes.
kubectl create -f my-cnpg-cluster.yaml times out attempting to reach the CloudNativePG K8s webhook service:
I know the service is running because I can port-forward it:
Possible Solution
I posted in a different flannel issue my reasons for suspecting the problem is the first rule in the FLANNEL-POSTRTG chain:
I have twice tried adding the rule at the end and removing it from the beginning, but something (Flannel, I suppose) keeps recreating it at the top. I do this:
Then I somehow wind up with this:
Per @rbrtbnfgl's suggestion, I used -vL right before and right after running kubectl create -f my-cnpg-cluster.yaml, then diffed the output, which should show which rules my packets are hitting:
The FLANNEL-POSTRTG diff seems to show that ALL packets are matching that first FLANNEL-POSTRTG rule that I suspected was matching all packets, so they're not getting masqueraded. This data is consistent with my theory. I wish I could figure out how to delete that rule without it getting recreated.
Steps to Reproduce (for bugs)
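For anyone repeating the before/after counter comparison described under Possible Solution, a minimal sketch follows. The function name diff_counters is hypothetical. The -n and -x flags in the usage line (numeric output, exact counters) are additions not used in the original report, and the usage line assumes the chain is in the nat table.

```shell
# Hypothetical helper: snapshot a counter listing, run a command, snapshot
# again, and print the lines that changed.
# $1:   command that prints the counters (split on whitespace, no quoting)
# rest: the command to run between the two snapshots
# Returns diff's exit status: 0 if nothing changed, 1 if counters differ.
diff_counters() {
  local lister=$1 before after rc
  shift
  before=$(mktemp)
  after=$(mktemp)
  $lister >"$before"
  "$@" >/dev/null 2>&1        # output of the command under test is discarded
  $lister >"$after"
  diff "$before" "$after"
  rc=$?
  rm -f "$before" "$after"
  return "$rc"
}

# Usage (needs root):
#   diff_counters "iptables -t nat -vnxL FLANNEL-POSTRTG" \
#     kubectl create -f my-cnpg-cluster.yaml
```

On a busy node other traffic also increments the counters, so the diff shows every rule that was hit during the window, not only rules hit by the command under test.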
Too late tonight to complete this section. After @rbrtbnfgl asked me to create a new issue for this, I said I would do so tonight. Submitting what I've had time to write up. Will try to provide more details tomorrow night.
THANK YOU in advance to anyone who spends any time investigating this!
Context
Your Environment