
IPv6 masquerade NAT rules missing in dual-stack installation #4683

Closed
meis4h opened this issue Dec 8, 2021 · 8 comments

meis4h commented Dec 8, 2021

Environmental Info:
K3s Version:

k3s version v1.22.4+k3s1 (bec170bc)
go version go1.16.10

Node(s) CPU architecture, OS, and Version:
Linux me-k3sv6 4.18.0-348.2.1.el8_5.x86_64 #1 SMP Mon Nov 15 20:49:28 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
RockyLinux 8.5 with a single network adapter with an IPv4 and an IPv6 address assigned.
I have verified the same behavior on Debian 11 as well.

Cluster Configuration:
Single node installation without any additional agents.

Describe the bug:
When creating a cluster using the dual-stack options described in the docs, outgoing pod traffic leaves the node with the pod's IPv6 address instead of the address of the node's interface.
It looks like the NAT rules in nftables that are responsible for translating the pod addresses are only created for IPv4 addresses and not for IPv6.
Replicating the existing IPv4 rules for IPv6 by hand fixes the issue.

I'm not sure if this is a bug or by design, but I would expect outgoing traffic to work the same for IPv4 and IPv6 by default.
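
For quick confirmation, the two POSTROUTING chains can be compared directly (a minimal check, assuming nft is available; the full ruleset is shown in the additional context below):

nft list chain ip nat POSTROUTING   # masquerade rules for 10.42.0.0/16 are present
nft list chain ip6 nat POSTROUTING  # no equivalent rules for fc15:1::/56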

Steps To Reproduce:

  • Install K3s with dual-stack options:
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.22 sh -s - \
--disable-network-policy \
--node-ip="172.30.80.172,2001:XXX:XXX:80::172" \
--cluster-cidr="10.42.0.0/16,fc15:1::/56" \
--service-cidr="10.43.0.0/16,fc15:2::/112"
  • Create a pod for testing and ping6 a neighboring host that is known to be reachable via IPv6 from the host directly:
[root@me-k3sv6 ~]# kubectl run --rm -it --image=centos bash
If you don't see a command prompt, try pressing enter.
[root@bash /]# ping6 -c 10 2001:XXX:XXX:80::170
PING 2001:XXX:XXX:80::170(2001:XXX:XXX:80::170) 56 data bytes

--- 2001:XXX:XXX:80::170 ping statistics ---
10 packets transmitted, 0 received, 100% packet loss, time 9230ms
  • Verify with tcpdump that the packets arrive on the target host with the pod address:
[root@target-host ~]# tcpdump -n -i ens192 ip6
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens192, link-type EN10MB (Ethernet), capture size 262144 bytes
16:03:24.789473 IP6 fc15:1::b > 2001:XXX:XXX:80::170: ICMP6, echo request, seq 1, length 64
16:03:24.789542 IP6 2001:XXX:XXX:80::170 > fc15:1::b: ICMP6, echo reply, seq 1, length 64
16:03:25.810603 IP6 fc15:1::b > 2001:XXX:XXX:80::170: ICMP6, echo request, seq 2, length 64
16:03:25.810663 IP6 2001:XXX:XXX:80::170 > fc15:1::b: ICMP6, echo reply, seq 2, length 64
...
  • Apply the missing NAT rules (an ip6tables equivalent is sketched after this list):
nft add rule ip6 nat POSTROUTING ip6 saddr fc15:1::/56 ip6 daddr fc15:1::/56 counter packets 81 bytes 4948 return
nft add rule ip6 nat POSTROUTING ip6 saddr fc15:1::/56 ip6 daddr != ff00::/8 counter packets 1 bytes 96 masquerade
nft add rule ip6 nat POSTROUTING ip6 saddr != fc15:1::/56 ip6 daddr fc15:1::/64 counter packets 0 bytes 0 return
nft add rule ip6 nat POSTROUTING ip6 saddr != fc15:1::/56 ip6 daddr fc15:1::/56 counter packets 0 bytes 0 masquerade
  • Verify that the IPv6 connection is working now:
[root@bash /]# ping6 -c 3 2001:XXX:XXX:80::170
PING 2001:XXX:XXX:80::170(2001:XXX:XXX:80::170) 56 data bytes
64 bytes from 2001:XXX:XXX:80::170: icmp_seq=1 ttl=63 time=0.234 ms
64 bytes from 2001:XXX:XXX:80::170: icmp_seq=2 ttl=63 time=0.289 ms
64 bytes from 2001:XXX:XXX:80::170: icmp_seq=3 ttl=63 time=0.360 ms

--- 2001:XXX:XXX:80::170 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2082ms
rtt min/avg/max/mdev = 0.234/0.294/0.360/0.053 ms
  • Verify address again on the target host:
[root@target-host ~]# tcpdump -n -i ens192 ip6
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens192, link-type EN10MB (Ethernet), capture size 262144 bytes
16:12:13.305330 IP6 2001:XXX:XXX:80::172 > 2001:XXX:XXX:80::170: ICMP6, echo request, seq 1, length 64
16:12:13.305429 IP6 2001:XXX:XXX:80::170 > 2001:XXX:XXX:80::172: ICMP6, echo reply, seq 1, length 64
16:12:14.322706 IP6 2001:XXX:XXX:80::172 > 2001:XXX:XXX:80::170: ICMP6, echo request, seq 2, length 64
16:12:14.322795 IP6 2001:XXX:XXX:80::170 > 2001:XXX:XXX:80::172: ICMP6, echo reply, seq 2, length 64
...
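
For completeness, a rough ip6tables equivalent of the nft rules added above (a sketch only, assuming the default ip6tables nat table; the CIDRs mirror the cluster-cidr from this setup):

ip6tables -t nat -A POSTROUTING -s fc15:1::/56 -d fc15:1::/56 -j RETURN
ip6tables -t nat -A POSTROUTING -s fc15:1::/56 ! -d ff00::/8 -j MASQUERADE
ip6tables -t nat -A POSTROUTING ! -s fc15:1::/56 -d fc15:1::/64 -j RETURN
ip6tables -t nat -A POSTROUTING ! -s fc15:1::/56 -d fc15:1::/56 -j MASQUERADE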

Expected behavior:
The pod's IPv6 address being NATed like the IPv4 address.

Actual behavior:
Only the IPv4 address is NATed, while the IPv6 address is left as is.

Additional context / logs:
Relevant IPv4 and IPv6 firewall rules before adding the IPv6 rules by hand:

[root@me-k3sv6 ~]# nft list ruleset
table ip nat {
...
        chain POSTROUTING {
                type nat hook postrouting priority srcnat; policy accept;
                 counter packets 117 bytes 11300 jump CNI-HOSTPORT-MASQ
                 counter packets 133 bytes 12536 jump KUBE-POSTROUTING
                ip saddr 10.42.0.0/16 ip daddr 10.42.0.0/16 counter packets 81 bytes 4948 return
                ip saddr 10.42.0.0/16 ip daddr != 224.0.0.0/4 counter packets 1 bytes 96 masquerade
                ip saddr != 10.42.0.0/16 ip daddr 10.42.0.0/24 counter packets 0 bytes 0 return
                ip saddr != 10.42.0.0/16 ip daddr 10.42.0.0/16 counter packets 0 bytes 0 masquerade
        }
...
}

table ip6 nat {
...
        chain POSTROUTING {
                type nat hook postrouting priority srcnat; policy accept;
                 counter packets 1 bytes 96 jump CNI-HOSTPORT-MASQ
                 counter packets 1 bytes 96 jump KUBE-POSTROUTING
        }
...
}

brandond commented Dec 8, 2021

@manuelbuil would you mind taking a look at this?

manuelbuil (Contributor) commented:

Yes, this is a known behavior in flannel dual-stack. I guess the person who implemented dual-stack thought people using IPv6 would not want SNAT. Could you open a similar issue in flannel referring to this one? This feature should be implemented there.


brandond commented Dec 9, 2021

I will say that NAT with IPv6 is somewhat unusual; I think the expectation is usually that there are enough addresses available that you can have a unique address for everything and avoid NATing traffic entirely.


meis4h commented Dec 10, 2021

I guess it makes sense why they implemented it like this, but for the sake of consistency an option to enable SNAT for IPv6 would be nice.
Thanks for your help, I'll open an issue in flannel to see what they think.

sjoerdsimons (Contributor) commented:

While IPv6 NAT is uncommon in "normal" setups, I don't think using ULA IPv6 addresses (in other words, private addresses) will be that uncommon inside k3s clusters. While every pod having a global address is lovely, it does require a suitable IPv6 subnet to be routed/delegated to at least the master node, which I wouldn't necessarily expect to be that common, especially for more edge users of IPv6 (and I'm not sure how the cloud providers handle IPv6 subnet delegation, tbh).

So making things just work seems sensible :)

SchoolGuy commented:

Today I hit this issue too, if my investigation is correct. Sadly, I did not have access to an IPv6 neighbor host to fully confirm with tcpdump that this is exactly the same thing.

Setup:

  • Homeserver with private IPv6 address range for the pods. Server has a single globally routable IPv6 address.
  • Single Node k3s cluster
  • My ISP seems to assign me a dynamic IPv6 block which might change over time.
  • The node itself has a public IPv6 address, but the pod does not.

What does work?

  • DNS with IPv6 resolves correctly.
  • Pinging the Pod via IPv6 from the host.

What does not work?

  • Pinging IPv6 domains and IPs is unsuccessful.
  • Curl calls time out.

Advantage if this is allowed: the node itself has a public IPv6 address, but the pod does not. The pod also doesn't need one, as it is just a ddclient pod that updates my AAAA record via DynDNS.
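
A quick way to check for the same root cause on an affected node (a sketch, assuming nft and/or ip6tables are installed) is to inspect the IPv6 POSTROUTING chain; on affected versions it only contains the CNI-HOSTPORT-MASQ and KUBE-POSTROUTING jumps and no masquerade rule for the pod CIDR:

nft list chain ip6 nat POSTROUTING
ip6tables -t nat -L POSTROUTING -n -v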

ShylajaDevadiga (Contributor) commented:

Reproduced the issue in k3s v1.23.1+k3s1

  • Steps followed as mentioned above
  • Install k3s in dual-stack mode
  • Deploy a testing pod
  • Ping any target host using both IPv4 and IPv6
  • Verify that the packets arrive on the target host

Ping an IPv4 address from within the testing pod:

# ping -c 5 192.168.24.46
PING 192.168.24.46 (192.168.24.46) 56(84) bytes of data.
64 bytes from 192.168.24.46: icmp_seq=1 ttl=63 time=1.72 ms
64 bytes from 192.168.24.46: icmp_seq=2 ttl=63 time=0.475 ms
64 bytes from 192.168.24.46: icmp_seq=3 ttl=63 time=0.529 ms
64 bytes from 192.168.24.46: icmp_seq=4 ttl=63 time=0.457 ms
64 bytes from 192.168.24.46: icmp_seq=5 ttl=63 time=0.967 ms

Verify on the target host using tcpdump that the pod address is NATed:

20:00:33.704341 IP 192.168.28.197 > 192.168.24.46: ICMP echo request, id 42707, seq 2, length 64
20:00:33.704370 IP 192.168.24.46 > 192.168.28.197: ICMP echo reply, id 42707, seq 2, length 64

Ping an IPv6 address from within the testing pod:

# ping6 -c 5  2600:<REDACTED>8:bd36
PING 2600:<REDACTED>bd36(2600:<REDACTED>:bd36) 56 data bytes

--- 2600:<REDACTED>:bd36 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 4093ms

Validated fix on k3s version v1.23.2-rc1+k3s1

Ping IPv6:

# ping -c 5  2600:<REDACTED>:bd36
PING 2600:<REDACTED>:bd36(2600:<REDACTED>:bd36) 56 data bytes
64 bytes from 2600:<REDACTED>:bd36: icmp_seq=1 ttl=63 time=1.17 ms
...

Verify on the target host using tcpdump that the pod IPv6 address is NATed:

20:14:42.782319 IP6 2600:<REDACTED>:78f9 > 2600:<REDACTED>:bd36: ICMP6, echo request, seq 2, length 64
20:14:42.782377 IP6 2600:<REDACTED>:bd36 > 2600:<REDACTED>:78f9: ICMP6, echo reply, seq 2, length 64


egandro commented Mar 15, 2023

This did the trick on my setup:

# /etc/network/interfaces.d/internal-bridge (I use Proxmox with a real IPv4/IPv6 address on eno1)
iface vmbr0 inet6 static
        address fd3c:ba45:c3ea:26e5::/64
        post-up   ip6tables -t nat -A POSTROUTING -o eno1 -j MASQUERADE
        post-up   ip6tables -A FORWARD -i vmbr0 -o eno1 -j ACCEPT
# create VMs with the following qm options
qm set 1001 --ipconfig0 ip=10.10.7.1/21,gw=10.10.1.1,ip6=fd3c:ba45:c3ea:26e5::1/64,gw6=fd3c:ba45:c3ea:26e5::
qm set 1002 --ipconfig0 ip=10.10.7.2/21,gw=10.10.1.1,ip6=fd3c:ba45:c3ea:26e5::2/64,gw6=fd3c:ba45:c3ea:26e5::
qm set 1003 --ipconfig0 ip=10.10.7.3/21,gw=10.10.1.1,ip6=fd3c:ba45:c3ea:26e5::3/64,gw6=fd3c:ba45:c3ea:26e5::
# k3s_ansible / in this report the --flannel-ipv6-masq flag was missing for the server
extra_server_args: >-
  --disable-network-policy
  --cluster-cidr=10.42.0.0/16,2001:cafe:42:0::/56
  --service-cidr=10.43.0.0/16,2001:cafe:42:1::/112
  --flannel-ipv6-masq
# test
kubectl run my-shell --rm -i --tty --image-pull-policy Always --image ubuntu -- bash
apt-get update && apt-get install -y iputils-ping
ping6 dl-cdn.alpinelinux.org
root@my-shell:/# ping6 dl-cdn.alpinelinux.org
PING dl-cdn.alpinelinux.org(2a04:4e42:8e::645 (2a04:4e42:8e::645)) 56 data bytes
64 bytes from 2a04:4e42:8e::645 (2a04:4e42:8e::645): icmp_seq=1 ttl=58 time=5.00 m
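
For reference, the same flag can also be set via the K3s config file instead of CLI arguments (a sketch using the standard /etc/rancher/k3s/config.yaml location and the CIDRs from above):

# /etc/rancher/k3s/config.yaml
flannel-ipv6-masq: true
disable-network-policy: true
cluster-cidr: "10.42.0.0/16,2001:cafe:42:0::/56"
service-cidr: "10.43.0.0/16,2001:cafe:42:1::/112"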
