LB; improve PMTUd support for external clients #455

Merged: 2 commits into master from lb-fe-ext-pmtud on Oct 3, 2023

zolug (Collaborator) commented Sep 13, 2023

Description

In the LB-FE, make sure that ICMP replies to an external party are sent from the VIP the offending packet was addressed to.

This is achieved with nftables rules that apply stateless SNAT to ICMP replies generated by the LB-FE.
Only replies triggered by incoming packets with a VIP dst are of interest. (In that case an MTU-related ICMP reply can be triggered while the original packet is being forwarded.)

The solution requires the fwmark_reflect sysctl to be enabled for both IPv4 and IPv6.
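
The sysctls are not spelled out here; presumably they are the kernel's `net.ipv4.fwmark_reflect` and `net.ipv6.fwmark_reflect`, which make kernel-generated replies (such as ICMP errors) inherit the fwmark of the packet that triggered them. A minimal sketch of enabling them by hand, assuming it is run as root inside the LB-FE's network namespace (how Meridio itself sets them is not shown in this description):

```shell
# Assumption: executed as root in the network namespace of the LB-FE.
# Let kernel-generated IPv4 replies carry the fwmark of the triggering packet.
sysctl -w net.ipv4.fwmark_reflect=1
# Same for IPv6.
sysctl -w net.ipv6.fwmark_reflect=1
```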

If using the fwmark_reflect sysctls is not desirable:

If it could be guaranteed that "plain routes" (even ones pointing to the wrong network, e.g. default routes for both IPv4 and IPv6) exist for every remote address to which an ICMP Frag Needed/Packet Too Big should be returned, then the nft rules could check the addresses of the encapsulated original packet instead of relying on the fwmark value. (The original dst must be a VIP, and the original src must not be.)

(For IPv6, adding a src route that matches packets from the IPv6 address of the external interface and points them, e.g., towards the BIRD-maintained routing table also works.)
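
The alternative described above (matching on the addresses of the encapsulated original packet rather than on the fwmark) could be sketched for IPv4 roughly as follows. This is not the rule set from the PR: the table and chain names are hypothetical, the VIP `20.0.0.1` is borrowed from the test section, and one such rule would be needed per VIP. It relies on nftables raw payload matching (`@th,offset,length`) and on payload mangling (`ip saddr set`) for the stateless SNAT.

```shell
# Hypothetical sketch, IPv4 only. Names and the VIP are assumptions.
VIP=20.0.0.1
VIP_HEX=0x14000001   # 20.0.0.1 as a 32-bit value for raw payload matching

nft add table ip pmtud_sketch
nft add chain ip pmtud_sketch postrouting \
    '{ type filter hook postrouting priority 0 ; policy accept ; }'

# ICMP Destination Unreachable (type 3) with code 4 is "Fragmentation Needed".
# The ICMP header is 8 bytes (64 bits), so the embedded original IPv4 header
# starts at bit 64 of the transport header: its src address sits at bit
# 64+96=160 and its dst address at bit 64+128=192.
# Match: original dst is the VIP, original src is not the VIP.
# Action: rewrite the source address of the ICMP reply to the VIP.
nft add rule ip pmtud_sketch postrouting \
    icmp type destination-unreachable icmp code 4 \
    @th,192,32 "$VIP_HEX" @th,160,32 != "$VIP_HEX" \
    ip saddr set "$VIP"
```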

Test:

  • Lower the Meridio internal MTU by setting the NSM_MTU env variable in the Proxy (e.g. to 1280)
  • Set up a Kind cluster with Meridio and an external-host. Running 1 LB-FE is sufficient
  • Run tcpdump in both the LB-FE and the external-host, capturing ICMP Frag Needed and ICMPv6 Packet Too Big
  • On the external-host, use ping to send big packets (use "-M do" for IPv4 to force the IP DF bit):
    ping 20.0.0.1 -s 1380 -M do
    ping 2000::1 -s 1380 -M do
    Additionally, it is advisable to configure new IPv4 and IPv6 "TG" addresses on the external-host's vlan0 interface and send the pings from those (so as not to rely on existing device routes in the LB-FE):
    ping -I 5.5.5.5 20.0.0.1 -s 1380 -M do
    ping -I 5000::1 2000::1 -s 1380 -M do
  • The LB-FE must send the ICMP replies using as source the VIP address the offending packet was destined to.
    Try without the PR as well: depending on the src address of the original packet, the ICMP reply is then sent either to the primary network, or to the external network using an address configured on the external interface.
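
As a rough aid for the steps above, the sketch below shows why a 1380-byte ping payload exceeds a 1280-byte MTU, and one possible pair of tcpdump filters for the two ICMP error types. The function and variable names are made up for illustration, the IPv4 size assumes a header without options, and the IPv6 filter assumes the ICMPv6 header directly follows the fixed 40-byte IPv6 header (no extension headers).

```shell
# On-wire IP packet size for a ping with payload size $1 (the -s value).
# ICMP/ICMPv6 echo header: 8 bytes. IPv4 header: 20 bytes. IPv6 header: 40 bytes.
ping4_packet_size() { echo $(( $1 + 8 + 20 )); }
ping6_packet_size() { echo $(( $1 + 8 + 40 )); }

# ICMP Destination Unreachable with code 4 (Fragmentation Needed).
FRAG_NEEDED_FILTER='icmp[icmptype] == icmp-unreach and icmp[icmpcode] == 4'
# ICMPv6 type 2 (Packet Too Big); byte 40 is the ICMPv6 type field when
# no extension headers are present.
PACKET_TOO_BIG_FILTER='icmp6 and ip6[40] == 2'

# Capture helper (needs root); $1 is the interface, defaulting to "any".
capture_pmtud() {
    tcpdump -ni "${1:-any}" "($FRAG_NEEDED_FILTER) or ($PACKET_TOO_BIG_FILTER)"
}

echo "IPv4: $(ping4_packet_size 1380) bytes, IPv6: $(ping6_packet_size 1380) bytes"
# prints "IPv4: 1408 bytes, IPv6: 1428 bytes"
```

Both sizes are above 1280, so either ping should trigger an MTU-related ICMP reply. With the definitions loaded, `capture_pmtud vlan0` on the external-host would show only those replies.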

Issue link

#452

Checklist

  • Purpose
    • Bug fix
    • New functionality
    • Documentation
    • Refactoring
    • CI
  • Test
    • Unit test
    • E2E Test
    • Tested manually
  • Introduce a breaking change
    • Yes (description required)
    • No

Make sure to return ICMP Frag Needed/Packet Too Big replies using
the VIP address as src to which the offending packet was sent.

Solution requires sysctl fwmark_reflect enabled for both ipv4 and ipv6.

zolug (Collaborator, Author) commented Sep 20, 2023

/reverify

@zolug zolug requested a review from LionelJouin September 20, 2023 16:10
New operator environment variable named CONDUIT_MTU allows
for setting MTU value with which Conduits are going to be deployed
by default.
@zolug zolug merged commit 9ba41a2 into master Oct 3, 2023
@zolug zolug deleted the lb-fe-ext-pmtud branch November 6, 2023 11:03