LB; improve PMTUd support for external clients #455
Merged
Description
In the LB-FE, make sure that an ICMP reply to an external party is sent from the VIP that the offending packet was addressed to.
This is achieved with nftables rules that apply stateless SNAT to ICMP replies generated by the LB-FE.
Only replies triggered by incoming packets with a VIP dst are of interest. (In that case an MTU-related ICMP reply can be triggered while the original packet is being forwarded.)
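A rough sketch of what such rules could look like follows. It is not the rule set from this PR. The table and chain names, the mark value 0x1d, and the use of 20.0.0.1 and 2000::1 as VIPs (borrowed from the test commands in this description) are all assumptions made for illustration. The sketch also assumes fwmark_reflect is enabled, so that a kernel-generated ICMP error inherits the mark of the packet that triggered it.

```shell
#!/bin/sh
# Hypothetical example: prints an nftables ruleset to stdout, changes nothing.
pmtud_ruleset() {
    cat <<'EOF'
table inet pmtud_sketch {
    chain mark_vip {
        type filter hook prerouting priority 0; policy accept;
        # Mark incoming packets addressed to a VIP (assumed VIPs, assumed mark).
        ip daddr 20.0.0.1 meta mark set 0x1d
        ip6 daddr 2000::1 meta mark set 0x1d
    }
    chain icmp_snat {
        type filter hook postrouting priority 0; policy accept;
        # Stateless SNAT: rewrite the source of Frag Needed / Packet Too Big
        # replies that carry the reflected mark.
        meta mark 0x1d icmp type destination-unreachable icmp code 4 ip saddr set 20.0.0.1
        meta mark 0x1d icmpv6 type packet-too-big ip6 saddr set 2000::1
    }
}
EOF
}

pmtud_ruleset
```

After review, the output could be loaded with `pmtud_ruleset | nft -f -` (needs root). A single hard-coded VIP per address family keeps the sketch short; a real deployment with several VIPs would need a way to pick the VIP matching the original packet.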
The solution requires the fwmark_reflect sysctl to be enabled for both IPv4 and IPv6:
https://elixir.bootlin.com/linux/v5.10/source/net/ipv6/icmp.c#L598
https://elixir.bootlin.com/linux/v5.10.194/source/net/ipv4/icmp.c#L744
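For reference, the two sysctls are `net.ipv4.fwmark_reflect` and `net.ipv6.fwmark_reflect`. A minimal sketch for enabling them by hand is shown below; the helper name and the `APPLY` switch are made up for this example, and how the LB-FE container actually sets them is not covered here.

```shell
#!/bin/sh
# Hypothetical helper: prints the sysctl commands by default,
# and runs them only when APPLY=1 is set (needs root).
enable_fwmark_reflect() {
    for key in net.ipv4.fwmark_reflect net.ipv6.fwmark_reflect; do
        if [ "${APPLY:-0}" = "1" ]; then
            sysctl -w "$key=1"
        else
            echo "sysctl -w $key=1"
        fi
    done
}

enable_fwmark_reflect
```

With the sysctls set to 1, kernel-generated replies such as ICMP errors carry the fwmark of the packet they reply to, which is what lets nft rules match them by mark.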
If using the fwmark_reflect sysctls is not desirable:
If it could be guaranteed that "plain routes", even ones pointing to the wrong network (e.g. default routes for both IPv4 and IPv6), exist for every remote address to which an ICMP Frag Needed/Packet Too Big should be returned, then the nft rules could be replaced with rules that check the addresses of the encapsulated original packet instead of relying on the fwmark value. (The original dst must be a VIP, but the original src must not be.)
(For IPv6, adding a src route that matches packets from the IPv6 address of the external interface and points them e.g. towards the BIRD-maintained routing table also works.)
Test:
ping 20.0.0.1 -s 1380 -M do
ping 2000::1 -s 1380 -M do
Additionally, it is advisable to configure new IPv4 and IPv6 "TG" addresses on the external host's vlan0 interface and to send the pings from those (so as not to rely on existing device routes in the LB-FE):
ping -I 5.5.5.5 20.0.0.1 -s 1380 -M do
ping -I 5000::1 2000::1 -s 1380 -M do
Try without the PR as well. In that case, depending on the src address of the original packet, the ICMP reply is sent out either to the primary network or to the external network using an address configured on the external interface.
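When choosing the `-s` value, note that it sets only the ICMP payload size; the packet on the wire is larger by the ICMP echo header (8 bytes) and the IP header (20 bytes for IPv4 without options, 40 bytes for IPv6). The ping triggers Frag Needed/Packet Too Big only if that total exceeds some MTU on the path. A small helper for the arithmetic (the function name is made up for this example):

```shell
#!/bin/sh
# Hypothetical helper: IP packet size for a given ping payload size.
# Usage: ip_packet_size <payload bytes> <4|6>
ip_packet_size() {
    payload=$1
    family=$2
    case "$family" in
        4) ip_header=20 ;;   # IPv4 header without options
        6) ip_header=40 ;;   # fixed IPv6 header
        *) echo "family must be 4 or 6" >&2; return 1 ;;
    esac
    # 8 bytes of ICMP / ICMPv6 echo header precede the payload.
    echo $((payload + 8 + ip_header))
}

ip_packet_size 1380 4   # prints 1408
ip_packet_size 1380 6   # prints 1428
```

So the pings above produce 1408-byte IPv4 and 1428-byte IPv6 packets. To check which source address the ICMP reply carries, a capture on the external host such as `tcpdump -ni vlan0 'icmp or icmp6'` is one option.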
Issue link
#452
Checklist