
IPv6 masquerade NAT rules missing in dual-stack installation #4683

Closed
meis4h opened this issue Dec 8, 2021 · 8 comments

meis4h commented Dec 8, 2021

Environmental Info:
K3s Version:

k3s version v1.22.4+k3s1 (bec170bc)
go version go1.16.10

Node(s) CPU architecture, OS, and Version:
Linux me-k3sv6 4.18.0-348.2.1.el8_5.x86_64 #1 SMP Mon Nov 15 20:49:28 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
RockyLinux 8.5 with a single network adapter with an IPv4 and an IPv6 address assigned.
I have verified the same behavior on Debian 11 as well.

Cluster Configuration:
Single node installation without any additional agents.

Describe the bug:
When creating a cluster using the dual-stack options described in the docs, outgoing pod traffic leaves the node with the pod's IPv6 address instead of the address of the node's interface.
It looks like the NAT rules in nftables that are responsible for translating the pod addresses are only created for IPv4 addresses and not for IPv6.
Replicating the existing IPv4 rules for IPv6 by hand fixes the issue.

I'm not sure if this is a bug or by design, but I would expect outgoing traffic to work the same for IPv4 and IPv6 by default.
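
For quick confirmation, the two POSTROUTING chains can be compared directly (a minimal check, assuming nft is available; the full ruleset is shown in the additional context below):

nft list chain ip nat POSTROUTING   # masquerade rules for 10.42.0.0/16 are present
nft list chain ip6 nat POSTROUTING  # no equivalent rules for fc15:1::/56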

Steps To Reproduce:

  • Install K3s with dual-stack options:
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.22 sh -s - \
--disable-network-policy \
--node-ip="172.30.80.172,2001:XXX:XXX:80::172" \
--cluster-cidr="10.42.0.0/16,fc15:1::/56" \
--service-cidr="10.43.0.0/16,fc15:2::/112"
  • Create a pod for testing and ping6 a neighboring host that is known to be reachable via IPv6 from the host directly:
[root@me-k3sv6 ~]# kubectl run --rm -it --image=centos bash
If you don't see a command prompt, try pressing enter.
[root@bash /]# ping6 -c 10 2001:XXX:XXX:80::170
PING 2001:XXX:XXX:80::170(2001:XXX:XXX:80::170) 56 data bytes

--- 2001:XXX:XXX:80::170 ping statistics ---
10 packets transmitted, 0 received, 100% packet loss, time 9230ms
  • Verify with tcpdump that the packets arrive on the target host with the pod address:
[root@target-host ~]# tcpdump -n -i ens192 ip6
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens192, link-type EN10MB (Ethernet), capture size 262144 bytes
16:03:24.789473 IP6 fc15:1::b > 2001:XXX:XXX:80::170: ICMP6, echo request, seq 1, length 64
16:03:24.789542 IP6 2001:XXX:XXX:80::170 > fc15:1::b: ICMP6, echo reply, seq 1, length 64
16:03:25.810603 IP6 fc15:1::b > 2001:XXX:XXX:80::170: ICMP6, echo request, seq 2, length 64
16:03:25.810663 IP6 2001:XXX:XXX:80::170 > fc15:1::b: ICMP6, echo reply, seq 2, length 64
...
  • Apply the missing NAT rules (an ip6tables equivalent is sketched after this list):
nft add rule ip6 nat POSTROUTING ip6 saddr fc15:1::/56 ip6 daddr fc15:1::/56 counter packets 81 bytes 4948 return
nft add rule ip6 nat POSTROUTING ip6 saddr fc15:1::/56 ip6 daddr != ff00::/8 counter packets 1 bytes 96 masquerade
nft add rule ip6 nat POSTROUTING ip6 saddr != fc15:1::/56 ip6 daddr fc15:1::/64 counter packets 0 bytes 0 return
nft add rule ip6 nat POSTROUTING ip6 saddr != fc15:1::/56 ip6 daddr fc15:1::/56 counter packets 0 bytes 0 masquerade
  • Verify that the IPv6 connection is working now:
[root@bash /]# ping6 -c 3 2001:XXX:XXX:80::170
PING 2001:XXX:XXX:80::170(2001:XXX:XXX:80::170) 56 data bytes
64 bytes from 2001:XXX:XXX:80::170: icmp_seq=1 ttl=63 time=0.234 ms
64 bytes from 2001:XXX:XXX:80::170: icmp_seq=2 ttl=63 time=0.289 ms
64 bytes from 2001:XXX:XXX:80::170: icmp_seq=3 ttl=63 time=0.360 ms

--- 2001:XXX:XXX:80::170 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2082ms
rtt min/avg/max/mdev = 0.234/0.294/0.360/0.053 ms
  • Verify address again on the target host:
[root@target-host ~]# tcpdump -n -i ens192 ip6
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens192, link-type EN10MB (Ethernet), capture size 262144 bytes
16:12:13.305330 IP6 2001:XXX:XXX:80::172 > 2001:XXX:XXX:80::170: ICMP6, echo request, seq 1, length 64
16:12:13.305429 IP6 2001:XXX:XXX:80::170 > 2001:XXX:XXX:80::172: ICMP6, echo reply, seq 1, length 64
16:12:14.322706 IP6 2001:XXX:XXX:80::172 > 2001:XXX:XXX:80::170: ICMP6, echo request, seq 2, length 64
16:12:14.322795 IP6 2001:XXX:XXX:80::170 > 2001:XXX:XXX:80::172: ICMP6, echo reply, seq 2, length 64
...
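
For completeness, a rough ip6tables equivalent of the nft rules added above (a sketch only, assuming the default ip6tables nat table; the CIDRs mirror the cluster-cidr from this setup):

ip6tables -t nat -A POSTROUTING -s fc15:1::/56 -d fc15:1::/56 -j RETURN
ip6tables -t nat -A POSTROUTING -s fc15:1::/56 ! -d ff00::/8 -j MASQUERADE
ip6tables -t nat -A POSTROUTING ! -s fc15:1::/56 -d fc15:1::/64 -j RETURN
ip6tables -t nat -A POSTROUTING ! -s fc15:1::/56 -d fc15:1::/56 -j MASQUERADE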

Expected behavior:
The pod's IPv6 address being NATed like the IPv4 address.

Actual behavior:
Only the IPv4 address is NATed, while the IPv6 address is left as is.

Additional context / logs:
Relevant IPv4 and IPv6 firewall rules before adding the IPv6 rules by hand:

[root@me-k3sv6 ~]# nft list ruleset
table ip nat {
...
        chain POSTROUTING {
                type nat hook postrouting priority srcnat; policy accept;
                 counter packets 117 bytes 11300 jump CNI-HOSTPORT-MASQ
                 counter packets 133 bytes 12536 jump KUBE-POSTROUTING
                ip saddr 10.42.0.0/16 ip daddr 10.42.0.0/16 counter packets 81 bytes 4948 return
                ip saddr 10.42.0.0/16 ip daddr != 224.0.0.0/4 counter packets 1 bytes 96 masquerade
                ip saddr != 10.42.0.0/16 ip daddr 10.42.0.0/24 counter packets 0 bytes 0 return
                ip saddr != 10.42.0.0/16 ip daddr 10.42.0.0/16 counter packets 0 bytes 0 masquerade
        }
...
}

table ip6 nat {
...
        chain POSTROUTING {
                type nat hook postrouting priority srcnat; policy accept;
                 counter packets 1 bytes 96 jump CNI-HOSTPORT-MASQ
                 counter packets 1 bytes 96 jump KUBE-POSTROUTING
        }
...
}

brandond commented Dec 8, 2021

@manuelbuil would you mind taking a look at this?

manuelbuil (Contributor) commented:

Yes, this is a known behavior in flannel dual-stack. I guess the person who implemented dual-stack thought people using IPv6 would not want SNAT. Could you open a similar issue in flannel referring to this one? This feature should be implemented there.


brandond commented Dec 9, 2021

I will say that NAT with IPv6 is somewhat unusual; I think the expectation is usually that there are enough addresses available that you can have a unique address for everything and avoid NATing traffic entirely.


meis4h commented Dec 10, 2021

I guess it makes sense why they implemented it like this, but for the sake of consistency an option to enable SNAT for IPv6 would be nice.
Thanks for your help, I'll open an issue in flannel to see what they think.

sjoerdsimons (Contributor) commented:

While IPv6 NAT is uncommon in "normal" setups, I don't think using ULA IPv6 addresses (in other words, private addresses) will be that uncommon inside k3s clusters. While every pod having a global address is lovely, it does require a suitable IPv6 subnet to be routed/delegated to at least the master node, which I wouldn't necessarily expect to be that common, especially for more edge users of IPv6 (and I'm not sure how the cloud providers handle IPv6 subnet delegation, tbh).

So making things just work seems sensible :)

SchoolGuy commented:

Today I hit this issue too, if my investigation is correct. Sadly, I did not have access to an IPv6 neighbor host to fully confirm with tcpdump that this is exactly the same thing.

Setup:

  • Homeserver with private IPv6 address range for the pods. Server has a single globally routable IPv6 address.
  • Single Node k3s cluster
  • My ISP seems to assign me a dynamic IPv6 block which might change over time.
  • The node itself has a public IPv6 address, but the pod does not.

What does work?

  • DNS with IPv6 resolves correctly.
  • Pinging the Pod via IPv6 from the host.

What does not work?

  • Pinging IPv6 domains and IPs is unsuccessful.
  • Curl calls time out.

Advantage if this is allowed: the node itself has a public IPv6 address, but the pod does not. The pod also doesn't need one, as it is just a ddclient pod that updates my AAAA record via DynDNS.
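
A quick way to check for the same root cause on an affected node (a sketch, assuming nft and/or ip6tables are installed) is to inspect the IPv6 POSTROUTING chain; on affected versions it only contains the CNI-HOSTPORT-MASQ and KUBE-POSTROUTING jumps and no masquerade rule for the pod CIDR:

nft list chain ip6 nat POSTROUTING
ip6tables -t nat -L POSTROUTING -n -v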

ShylajaDevadiga (Contributor) commented:

Reproduced the issue in k3s v1.23.1+k3s1

  • Steps followed as mentioned above
  • Install k3s in dual-stack mode
  • Deploy a testing pod
  • Ping any target host using both IPv4 and IPv6
  • Verify that the packets arrive on the target host

Ping an IPv4 address from within the testing pod:

# ping -c 5 192.168.24.46
PING 192.168.24.46 (192.168.24.46) 56(84) bytes of data.
64 bytes from 192.168.24.46: icmp_seq=1 ttl=63 time=1.72 ms
64 bytes from 192.168.24.46: icmp_seq=2 ttl=63 time=0.475 ms
64 bytes from 192.168.24.46: icmp_seq=3 ttl=63 time=0.529 ms
64 bytes from 192.168.24.46: icmp_seq=4 ttl=63 time=0.457 ms
64 bytes from 192.168.24.46: icmp_seq=5 ttl=63 time=0.967 ms

Verify on the target host using tcpdump that the pod address is NATed:

20:00:33.704341 IP 192.168.28.197 > 192.168.24.46: ICMP echo request, id 42707, seq 2, length 64
20:00:33.704370 IP 192.168.24.46 > 192.168.28.197: ICMP echo reply, id 42707, seq 2, length 64

Ping an IPv6 address from within the testing pod:

# ping6 -c 5  2600:<REDACTED>8:bd36
PING 2600:<REDACTED>bd36(2600:<REDACTED>:bd36) 56 data bytes

--- 2600:<REDACTED>:bd36 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 4093ms

Validated fix on k3s version v1.23.2-rc1+k3s1

Ping IPv6:

# ping -c 5  2600:<REDACTED>:bd36
PING 2600:<REDACTED>:bd36(2600:<REDACTED>:bd36) 56 data bytes
64 bytes from 2600:<REDACTED>:bd36: icmp_seq=1 ttl=63 time=1.17 ms
...

Verify on the target host using tcpdump that the pod IPv6 address is NATed:

20:14:42.782319 IP6 2600:<REDACTED>:78f9 > 2600:<REDACTED>:bd36: ICMP6, echo request, seq 2, length 64
20:14:42.782377 IP6 2600:<REDACTED>:bd36 > 2600:<REDACTED>:78f9: ICMP6, echo reply, seq 2, length 64


egandro commented Mar 15, 2023

This did the trick on my setup:

# /etc/network/interfaces.d/internal-bridge (I use Proxmox with a real IPv4/IPv6 address on eno1)
iface vmbr0 inet6 static
        address fd3c:ba45:c3ea:26e5::/64
        post-up   ip6tables -t nat -A POSTROUTING -o eno1 -j MASQUERADE
        post-up   ip6tables -A FORWARD -i vmbr0 -o eno1 -j ACCEPT
# create VMs with the following qm options
qm set 1001 --ipconfig0 ip=10.10.7.1/21,gw=10.10.1.1,ip6=fd3c:ba45:c3ea:26e5::1/64,gw6=fd3c:ba45:c3ea:26e5::
qm set 1002 --ipconfig0 ip=10.10.7.2/21,gw=10.10.1.1,ip6=fd3c:ba45:c3ea:26e5::2/64,gw6=fd3c:ba45:c3ea:26e5::
qm set 1003 --ipconfig0 ip=10.10.7.3/21,gw=10.10.1.1,ip6=fd3c:ba45:c3ea:26e5::3/64,gw6=fd3c:ba45:c3ea:26e5::
# k3s_ansible / in this report the --flannel-ipv6-masq flag was missing for the server
extra_server_args: >-
  --disable-network-policy
  --cluster-cidr=10.42.0.0/16,2001:cafe:42:0::/56
  --service-cidr=10.43.0.0/16,2001:cafe:42:1::/112
  --flannel-ipv6-masq
# test
kubectl run my-shell --rm -i --tty --image-pull-policy Always --image ubuntu -- bash
apt-get update && apt-get install -y iputils-ping
ping6 dl-cdn.alpinelinux.org
root@my-shell:/# ping6 dl-cdn.alpinelinux.org
PING dl-cdn.alpinelinux.org(2a04:4e42:8e::645 (2a04:4e42:8e::645)) 56 data bytes
64 bytes from 2a04:4e42:8e::645 (2a04:4e42:8e::645): icmp_seq=1 ttl=58 time=5.00 m
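
For reference, the same flag can also be set via the K3s config file instead of CLI arguments (a sketch using the standard /etc/rancher/k3s/config.yaml location and the CIDRs from above):

# /etc/rancher/k3s/config.yaml
flannel-ipv6-masq: true
disable-network-policy: true
cluster-cidr: "10.42.0.0/16,2001:cafe:42:0::/56"
service-cidr: "10.43.0.0/16,2001:cafe:42:1::/112"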
