
K8s service calls timing out, possibly due to suspicious FLANNEL-POSTRTG rule #1703

Closed
JamesLavin opened this issue Jan 4, 2023 · 7 comments

Comments


JamesLavin commented Jan 4, 2023

Flannel seems to handle pod-to-pod communication fine, but calls to K8s services are timing out, even though the services are up and running and firewalls are not blocking the service ports.

Expected Behavior

I built a Kubernetes cluster from four Ubuntu 22.04 servers. Most stuff is working fine, but a few things that seem to involve leaving or entering the Flannel pod network are messed up. Two specifics:

  1. After installing metrics-server, I expected it to work.

  2. When I run kubectl create -f my-cnpg-cluster.yaml after successfully installing CloudNativePG using this manifest, I expect to create a PostgreSQL cluster or see an error relating to CloudNativePG.

Current Behavior

  1. metrics-server didn't work properly until I ran it with the non-default hostNetwork: true setting, as suggested here. That reported issue (which I believe was identical to mine) involved a timeout trying to reach a K8s service running on port 443:
Error: Get https://10.96.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 10.96.0.1:443: i/o timeout

Though I hacked a workaround by switching to hostNetwork: true, I believe I should have been able to communicate with the K8s service without switching network modes.

  2. Instead of creating a cluster (or returning a CloudNativePG-related error), kubectl create -f my-cnpg-cluster.yaml times out attempting to reach the CloudNativePG K8s webhook service:
Error from server (InternalError): error when creating "my-cnpg-cluster.yaml": Internal error occurred: failed calling webhook "mcluster.kb.io": failed to call webhook: Post "https://cnpg-webhook-service.cnpg-system.svc:443/mutate-postgresql-cnpg-io-v1-cluster?timeout=10s": context deadline exceeded

I know the service is running because I can port-forward it:

user@server:~$ k port-forward service/cnpg-webhook-service -n cnpg-system 9443:443
Forwarding from 127.0.0.1:9443 -> 9443
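
As a quick extra check (just a sketch; the curl image and flags here are my own choice, nothing CloudNativePG-specific), the webhook could also be probed from inside the pod network with a throwaway pod:

kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl -k -s -m 5 https://cnpg-webhook-service.cnpg-system.svc:443/ -o /dev/null -w '%{http_code}\n'

If it responds from a pod while the apiserver's call still times out, that would point at the host-to-service path rather than the webhook itself.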

Possible Solution

In a different flannel issue, I posted my reasons for suspecting that the problem is the first rule in the FLANNEL-POSTRTG chain:

$ sudo iptables -t nat -L
...
Chain FLANNEL-POSTRTG (1 references)
target     prot opt source               destination
RETURN     all  --  anywhere             anywhere             /* flanneld masq */
RETURN     all  --  10.244.0.0/16        10.244.0.0/16        /* flanneld masq */
MASQUERADE  all  --  10.244.0.0/16       !base-address.mcast.net/4  /* flanneld masq */ random-fully
RETURN     all  -- !10.244.0.0/16        williams/24          /* flanneld masq */
MASQUERADE  all  -- !10.244.0.0/16        10.244.0.0/16        /* flanneld masq */ random-fully

I have twice tried adding the rule at the end and removing it from the beginning, but something (Flannel, I suppose) keeps recreating it at the top. I do this:

sudo iptables -t nat -I FLANNEL-POSTRTG 6 -s 0.0.0.0/0 -d 0.0.0.0/0 -j RETURN -m comment --comment "flanneld masq"
sudo iptables -t nat -D FLANNEL-POSTRTG 1

Then I somehow wind up with this:

Chain FLANNEL-POSTRTG (1 references)
num  target     prot opt source               destination
1    RETURN     all  --  0.0.0.0/0            0.0.0.0/0            /* flanneld masq */
2    RETURN     all  --  0.0.0.0/0            0.0.0.0/0            /* flanneld masq */
3    RETURN     all  --  10.244.0.0/16        10.244.0.0/16        /* flanneld masq */
4    MASQUERADE  all  --  10.244.0.0/16       !224.0.0.0/4          /* flanneld masq */ random-fully
5    RETURN     all  -- !10.244.0.0/16        10.244.0.0/24        /* flanneld masq */
6    MASQUERADE  all  -- !10.244.0.0/16        10.244.0.0/16        /* flanneld masq */ random-fully
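
For reference, the numbered listing above comes from adding --line-numbers, e.g.:

sudo iptables -t nat -L FLANNEL-POSTRTG --line-numbers

in case anyone wants to reproduce the rule positions used in the -I/-D commands above.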

Per @rbrtbnfgl's suggestion, I ran sudo iptables -t nat -vL right before and right after running kubectl create -f my-cnpg-cluster.yaml, then diff-ed the output, which should show which rules my packets are hitting:

3c3
< 1     208K   28M KUBE-SERVICES  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
---
> 1     209K   29M KUBE-SERVICES  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
14c14
< 1      413 25422 FLANNEL-POSTRTG  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* flanneld masq */
---
> 1      672 41260 FLANNEL-POSTRTG  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* flanneld masq */
20c20
< 2       87  5220 RETURN     all  --  *      *       10.244.0.0/16        10.244.0.0/16        /* flanneld masq */
---
> 2      141  8460 RETURN     all  --  *      *       10.244.0.0/16        10.244.0.0/16        /* flanneld masq */
23c23
< 5        4   240 MASQUERADE  all  --  *      *      !10.244.0.0/16        10.244.0.0/16        /* flanneld masq */ random-fully
---
> 5        7   420 MASQUERADE  all  --  *      *      !10.244.0.0/16        10.244.0.0/16        /* flanneld masq */ random-fully
41c41
< 1     2902  181K RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000/0x4000
---
> 1     3158  196K RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000/0x4000
197c197
< 23    3298  277K KUBE-NODEPORTS  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL
---
> 23    3594  302K KUBE-NODEPORTS  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL
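
For anyone following along, that before/after capture amounts to roughly the following (the file names are arbitrary):

sudo iptables -t nat -nvL > /tmp/nat-before.txt
kubectl create -f my-cnpg-cluster.yaml
sudo iptables -t nat -nvL > /tmp/nat-after.txt
diff /tmp/nat-before.txt /tmp/nat-after.txt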

The FLANNEL-POSTRTG diff seems to show that ALL packets are matching that first rule, the one I suspected was catching everything, so they RETURN early and never get masqueraded. This data is consistent with my theory. I wish I could figure out how to delete that rule without it getting recreated.

Steps to Reproduce (for bugs)

Too late tonight to complete this section. After @rbrtbnfgl asked me to create a new issue for this, I said I would do so tonight. Submitting what I've had time to write up. Will try to provide more details tomorrow night.

THANK YOU in advance to anyone who spends any time investigating this!

Context

Your Environment

  • Flannel version: v0.20.2
  • Backend used (e.g. vxlan or udp): vxlan
  • Etcd version: 3.3 (I believe)
  • Kubernetes version (if used): v1.26.0
  • Operating System and version: Ubuntu 22.04 (kernel version 5.15.0-56-generic)
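
If the exact versions matter, they can be read off the pod images (a sketch, assuming a kubeadm-style cluster and the stock kube-flannel manifest):

kubectl -n kube-system get pod -l component=etcd -o jsonpath='{.items[0].spec.containers[0].image}'
kubectl -n kube-flannel get ds kube-flannel-ds -o jsonpath='{.spec.template.spec.containers[0].image}'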

rbrtbnfgl commented Jan 4, 2023

If you do sudo iptables -t nat -n -vL, what is the output for the FLANNEL-POSTRTG chain?
It should be something like this:

Chain FLANNEL-POSTRTG (1 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            mark match 0x4000/0x4000 /* flanneld masq */
  102  6120 RETURN     all  --  *      *       10.233.64.0/18       10.233.64.0/18       /* flanneld masq */
    2   192 MASQUERADE  all  --  *      *       10.233.64.0/18      !224.0.0.0/4          /* flanneld masq */ random-fully
    0     0 RETURN     all  --  *      *      !10.233.64.0/18       10.233.64.0/24       /* flanneld masq */
    0     0 MASQUERADE  all  --  *      *      !10.233.64.0/18       10.233.64.0/18       /* flanneld masq */ random-fully

From your output I don't see the mark match on the first rule; maybe the issue is related to that.
Could you check the output of iptables-save?
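
A quick way to pull just that chain out of iptables-save, for example:

sudo iptables-save -t nat | grep FLANNEL-POSTRTG

On a working node the first rule should show up with the mark match, something like:

-A FLANNEL-POSTRTG -m mark --mark 0x4000/0x4000 -m comment --comment "flanneld masq" -j RETURN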


oe-hbk commented Jan 4, 2023

An easy way I found to stop flannel from undoing any iptables changes you make while testing is to send the flanneld process on the host a SIGSTOP (kill -19), then make your iptables changes. Don't forget to send a SIGCONT (kill -18) afterwards to resume it.
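
In shell terms (assuming flanneld is visible in the host's process table) that is roughly:

sudo kill -STOP $(pgrep flanneld)
# ...make and test your iptables changes...
sudo kill -CONT $(pgrep flanneld)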

JamesLavin (Author) commented:

That's correct, @rbrtbnfgl. I've never seen mark match ... on that first rule. This is what I currently see:

Chain FLANNEL-POSTRTG (1 references)
num   pkts bytes target     prot opt in     out     source               destination
1        0     0 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* flanneld masq */
2    32298 1938K RETURN     all  --  *      *       10.244.0.0/16        10.244.0.0/16        /* flanneld masq */
3        0     0 MASQUERADE  all  --  *      *       10.244.0.0/16       !224.0.0.0/4          /* flanneld masq */ random-fully
4        0     0 RETURN     all  --  *      *      !10.244.0.0/16        10.244.0.0/24        /* flanneld masq */
5     1348 80880 MASQUERADE  all  --  *      *      !10.244.0.0/16        10.244.0.0/16        /* flanneld masq */ random-fully

It's possible, but unlikely, that mark match ... was present before I tried creating the rule at the end of the chain and deleting it from the start; if it was, I never noticed it. I recorded the following on Monday, before I started mucking with the chain:

Chain FLANNEL-POSTRTG (1 references)
num  target     prot opt source               destination
1    RETURN     all  --  0.0.0.0/0            0.0.0.0/0            /* flanneld masq */
2    RETURN     all  --  10.244.0.0/16        10.244.0.0/16        /* flanneld masq */
3    MASQUERADE  all  --  10.244.0.0/16       !224.0.0.0/4          /* flanneld masq */ random-fully
4    RETURN     all  -- !10.244.0.0/16        10.244.0.0/24        /* flanneld masq */
5    MASQUERADE  all  -- !10.244.0.0/16        10.244.0.0/16        /* flanneld masq */ random-fully

JamesLavin (Author) commented:

I just noticed that each of my three K8s worker nodes has routes for every pod subnet in its routing table, reaching its own pods over the cni0 bridge and the other nodes' pods via flannel.1:

$ ip route
default via 192.168.2.1 dev enp3s0 proto static metric 100
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink
10.244.1.0/24 dev cni0 proto kernel scope link src 10.244.1.1
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink
10.244.3.0/24 via 10.244.3.0 dev flannel.1 onlink
169.254.0.0/16 dev enp3s0 scope link metric 1000
192.168.2.0/24 dev enp3s0 proto kernel scope link src 192.168.2.101 metric 100

Each of the four servers' pod subnets (10.244.x.0/24) is listed.
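
The VXLAN peers behind those flannel.1 routes can also be inspected with standard iproute2 commands, e.g.:

bridge fdb show dev flannel.1
ip -d link show flannel.1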

But I DON'T see these routes on my K8s control plane node, which is where I've been running the kubectl commands that have been timing out:

....$ ip route
default via 192.168.2.1 dev enp2s0 proto static metric 100
10.244.0.0/24 dev cni0 proto kernel scope link src 10.244.0.1
169.254.0.0/16 dev enp2s0 scope link metric 1000
192.168.2.0/24 dev enp2s0 proto kernel scope link src 192.168.2.107 metric 100

I could try attaching my control plane node as a fourth worker node to see whether this fixes the routing table. But is there another approach? (Or is it possibly correct that the control plane node lacks a flannel.1 route???)

This reminds me that days ago I read about a user with a similar problem that seemed to resolve itself after (s)he added the control plane node to the worker node pool. I don't remember where I read that, or even whether it was a Flannel user, but my control plane routing table lacks the flannel.1 routes, which could explain why that user's problems disappeared after joining the control plane node as an extra worker node.
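
One way to check whether flannel is actually running on, and has annotated, the control-plane node (a sketch; the kube-flannel namespace assumes the stock manifest, and the node name is a placeholder):

kubectl -n kube-flannel get pods -o wide
kubectl get node <control-plane-node> -o jsonpath='{.metadata.annotations.flannel\.alpha\.coreos\.com/public-ip}'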


JamesLavin commented Jan 5, 2023

I decided to try to refresh everything and clear out possibly stale state by running kubectl delete pod --all --all-namespaces, and now my control plane node's routing table has flannel.1 routes to the other nodes' pod subnets:

$ ip route
default via 192.168.2.1 dev enp2s0 proto static metric 100
10.244.0.0/24 dev cni0 proto kernel scope link src 10.244.0.1 linkdown
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink
10.244.3.0/24 via 10.244.3.0 dev flannel.1 onlink
169.254.0.0/16 dev enp2s0 scope link metric 1000
192.168.2.0/24 dev enp2s0 proto kernel scope link src 192.168.2.107 metric 100

And then it worked!!!

$ k apply -f mypg-cnpg-cluster.yaml
cluster.postgresql.cnpg.io/mypg created

JamesLavin (Author) commented:

Thanks for your comments and help, @oe-hbk and @rbrtbnfgl. Looks like recreating all my pods fixed my problem.
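
(For anyone landing here later: rather than deleting every pod, restarting just the flannel DaemonSet may be enough; the namespace and name below assume the stock kube-flannel manifest.)

kubectl -n kube-flannel rollout restart daemonset kube-flannel-ds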


rbrtbnfgl commented Jan 5, 2023

Good to hear that. Sorry I couldn't reply yesterday.
One thing you could still check: which version of containerd are you using?
I don't know if it's related, but there is a bug with newer versions (from 1.6.9) affecting pods with a host IP that shows up when you restart the cluster.
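
Checking is quick, e.g.:

containerd --version
kubectl get nodes -o wide   # the CONTAINER-RUNTIME column shows the runtime and its version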
