Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: batman-adv: directed multicast support #1357

Closed
wants to merge 3 commits into from

Conversation

T-X
Copy link
Contributor

@T-X T-X commented Apr 3, 2018

The following changes and patches add multicast-to-multi-unicast support between nodes. It supports IPv6 link-local multicast with up to 16 target nodes and if exceeded, either falls back to flooding (e.g. for important ones like ICMPv6 Neighbor Solicitations) or drops them if the gluon-ebtables-filter-multicast is installed.

This should fix our current issue of larger mesh networks having a significant amount of and overhead through ICMPv6 Neighbor Solicitations.

Recap of what we already have:

  • Multicast listener awareness: A node knows which client device is interested in which multicast transmissions (bridge IGMP/MLD snooping, IGMP/MLD querier per node, multicast-capable batman-adv TT). -> Since current Gluon master
  • Multicast-to-unicast from node to client: A node delivers multicast packets via individual unicast transmissions (only) to clients interested in it instead of broadcasting. -> Since Gluon v2016.2

This adds the missing piece to have directed, brodcast-less multicast for the whole path from one client to another.

For batman-adv:

  • It adds some patches to batman-adv to allow not just multicast-to-single-unicast but also multicast-to-multi-unicast forwarding for one thing.
  • For another, adds a patch to batman-adv to allow it to forward a frame via multicast-to-unicast but not via flooding. This is indicated to batman-adv via a "noflood" packet mark.

For gluon-ebtables-filter-multicast:

  • Relaxes the filtering rules to allow a packet which would previously be dropped to pass if it were transsmitted via up to 16 unicast transmissions (so to up to 16 nodes in the mesh).

For gluon-mesh-batman-adv:

  • Reenables the multicast_mode
  • Sets multicast_fanout limit to 16.
  • Assigns the marking number for the batman-adv noflood mark

This needs a good amount of (more) testing from other people, therefore sending it as RFC.

Once your community has fully migrated to a Gluon v2018.1 you can test this patchset: v2018.1 has the whole listener side ready. This patch only touches the multicast sender side, so you can add this patchset to dedicated nodes which have the multicast sending devices while receiving on any Gluon >=v2018.1 node.

@T-X
Copy link
Contributor Author

T-X commented Apr 3, 2018

So far I tested this patchset in the following two scenarios:

Test 1: Virtual environment

Scenario:

mcast-to-ucast-virtsetup

  • 6 nodes (6 KVM instances)
  • line topology

Tested:

  • used ping6 to multicast addresses
  • verified via tcpdump that multicast packets are transmitted correctly via individual batman-adv unicast packets
  • verified that multicast_fanout works, that is that batman-adv correctly falls back to flooding if exceeded
  • verified that the "noflood" mark works, that is that if a packet were to exceed multicast_fanout that such packets are correctly dropped instead of flooded.

Test 2: Video Streaming (non-virtual test setup)

Scenario:

mcast-to-ucast-setup

  • 3 nodes (top: TP-Link WDR4300, middle+bottom: WDR3600; meshed on 2.4GHz only as thick walls give no chance for 5GHz)
  • 6 laptops (1x sender + 1x receiver in the cellar, 4x receivers at the top)
  • Sender in the cellar transmitts three 640x320px videos simultaneously in a loop, using a dedicated IPv6 link-local multicast address for each video
  • Receivers sign up for all three multicast addresses

2018-04-02--16-06-14
Screenshot from a client device, running 3x VLC and H.O.R.S.T.

Commands:

  • Sender:
$ vlc --loop ./video1.mp4 --sout="#rtp{dst=[ff12::23%enp0s25],port=5023,mux=ts,sap,name=foobar}" --sout-keep
$ vlc --loop ./video2.mp4 --sout="#rtp{dst=[ff12::42%enp0s25],port=5042,mux=ts,sap,name=foobar}" --sout-keep
$ vlc --loop ./video3.mp4 --sout="#rtp{dst=[ff12::123%enp0s25],port=5123,mux=ts,sap,name=foobar}" --sout-keep
  • Receiver:
$ vlc "rtp://@[ff12::23%wlan0]:5023"
$ vlc "rtp://@[ff12::42%wlan0]:5042"
$ vlc "rtp://@[ff12::123%wlan0]:5123"

Observations

  • Overall, videos were played smoothly during several hours of runtime.
  • During midday and early afternoon videos would hang occasionally. "batctl ping" from the top to the bottom node only worked sometimes, batman-adv TQ values dropped severly. After about a minute, things would recover as if nothing had happened. Seemed like some wifi (chip? driver?) issues. Middle node usually had a signal strength of -70 to -80 dBm to the top and cellar one. But during that time the reported signal strength would jump wildly, even to values as good as -55 dBm which given the challenging topology did not seem to make sense at all
  • Later only sporadic frame losses
  • An "apt-get install" on a laptop 0.5m away from a multicast receving laptop using an "unrelated" wifi access point, even on a different 5GHz channel, would kill the multicasted video playback. Looks like channel 36 and 44 are still causing interference to each other. Also indicates the challenges of a TCP and UDP flow rivaling for and potentially congesting the same medium.

@T-X T-X changed the title RFC: batman-adv multicast support RFC: batman-adv: directed multicast support Apr 4, 2018
@T-X
Copy link
Contributor Author

T-X commented Apr 4, 2018

PS: In the video streaming test I also did a brief mobility/handover test. With one of the laptops at the top I moved down to the cellar and back up, briefly stopping at each node.

When moving around the lift before reaching the middle node the streams paused for a few seconds. As expected that metal beast left no chance to wpa_supplicant to perform any smooth handover. Other than that, video playback continued smoothly with only a few video frames lost. At each node I verified that the laptop and its wpa_supplicant had switched to the new AP.

++
++ ret = batadv_mcast_forw_want_all_send(bat_priv, skb, vid, &limit);
++ if (ret != NET_XMIT_SUCCESS)
++ return ret;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In conjunction with Line 423, this seems to have the effect of not delivering the packet to nodes with the want-all flag set in case no other node has the multicast address in the transtable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Urgh, you are absolutely right, line 423 should return NET_XMIT_SUCCESS instead of NET_XMIT_DROP... will fix this PR later today. Many thanks for spotting and identifying the issue, @blocktrron! And thanks to you and @mweinelt for testing this PR.

@T-X T-X force-pushed the batman-adv-mcast-to-ucast branch from aea5217 to 8aa9534 Compare April 11, 2018 15:28
@T-X
Copy link
Contributor Author

T-X commented Apr 11, 2018

Changelog v2:

  • Fixed return value in "batadv_mcast_forw_tt_send()" for an empty TT entry: the want-all lists still need to be tried. This fixes potential packetloss. - Discovered by @mweinelt and @blocktrron

@rotanid rotanid added the 0. type: enhancement The changeset is an enhancement label Apr 11, 2018
@mweinelt
Copy link
Contributor

I can confirm the issues we encountered are fixed in v2. Thanks!

mweinelt referenced this pull request May 5, 2018
0bf3b72c33d9 nat46: fixup PKG_MIRROR_HASH
23aa2e7b4afa nodogsplash2: Add NDS Restart Hook for Firewall (#369)
7ae81c8311ec cjdns: 20.1 -> 20.2
ff7b5da265e1 prince: version bump to v0.4
2f90fe406c58 miniupnpd: De-maintainering myself.
fdaa4cde3b2c bmx7: bump version
455a54207c84 batman-adv: upgrade package to latest release 2018.1
2e4937ea68f8 batctl: upgrade package to latest release 2018.1
a0eca40b0003 alfred: upgrade package to latest release 2018.1
015e5e99f2b6 bmx7: use configReaload on service reload
0ced8ec5a763 bmx7: keep bmx7 secret keys on sysupgrade
4bff0b3c65c5 cjdns: build fixes
7fc2fbdfc1b7 babeld: release 1.8.1
135bc605b4cf alfred: Support interface IDs with more than two digits
91e600e1cd9a bmx7: convert init script to use procd
86be0095b475 nodogsplash2: Add compatibility with mwan3 v2
17fccad969ea smcroute: Change download to HTTP
63cae8f571a6 bmx7: bump version
@T-X T-X force-pushed the batman-adv-mcast-to-ucast branch from 8aa9534 to 2b70d07 Compare July 10, 2018 17:47
@T-X
Copy link
Contributor Author

T-X commented Jul 10, 2018

Changelog v3:

  • Updated to Gluon master branch (which is basically Gluon v2018.1 right now)

As soon as some community returns some positive feedback for this patchset in their otherwise Gluon v2018.1 based production network network then I will remove the "RFC".

As I said before, no need to apply this patchset on all nodes yet, just some test node(s) is fine as long as (most of) the rest of your network runs v2018.1.

@T-X
Copy link
Contributor Author

T-X commented Jul 10, 2018

Some useful commands to check whether your network is and which multicast addresses are or are not suitable for tests:

  • Number of unsuportive nodes: $ batctl mf | grep " -"
  • Number of receiver nodes for a specific multicast address: $ batctl tg -m -H | sed "s/...\([^ ]*\) .*/\1/" | sort | uniq -c | sort -n -r | less

Make sure that for the multicast address you are playing with has a total, summed unsupportive node count plus receiver node count matching your multicast address < 16.

Also, with the VLC example above your multicast receivers should show up in this "number of receivers" table. Also on your multicast receiving host running Linux have a look at $ ip maddr show to find the IPv6 multicast address to MAC multicast address mapping.

@christf christf added the 2. status: blocked Marked as blocked because it's waiting on something label Nov 10, 2018
@christf
Copy link
Member

christf commented Nov 10, 2018

we need to understand if #1468 is induced by this patch. Happily removing blocked when we know this is not the case.

@mweinelt
Copy link
Contributor

It is unclear if the multicast patches are the cause of the crash in #1468, so this is not blocked but instead requires a decision on how to proceed.

We in Darmstadt have been testing this patch since its inception and had no issues with it in months.

@mweinelt mweinelt removed the 2. status: blocked Marked as blocked because it's waiting on something label Nov 10, 2018
@christf
Copy link
Member

christf commented Nov 10, 2018

The patch is meant to appeal to a wider audience in order to gain momentum for an upstream merge. I find the approach to be valid and I am interested in the feature. It has undergone a review by blocktron and FFDA is running it for months.

As discussed on IRC I see two risks:

  1. If merged everyone automatically has the patch. Maybe it would be beneficial to roll this out only when consciously selected
  2. We have had the discussion previously whether to accept patches because they must be maintained and we have usually decided for upstreaming patches if possible. This is the goal all along.

I support merging this patch given (1) is addressed. @NeoRaider @rotanid @mweinelt what are your thoughts?

@rotanid
Copy link
Member

rotanid commented Nov 11, 2018

i'm against merging a non-upstreamed patch like this.
if upstream doesn't want it even though they have been told about the FFDA test results, they will hopefully name reasons. but afaik no one submitted it yet.

@rotanid rotanid added the 2. status: blocked Marked as blocked because it's waiting on something label Nov 26, 2018
@T-X
Copy link
Contributor Author

T-X commented Jan 11, 2019

Just some background information why I haven't submitted the patches to upstream yet:

Linux network subsystem maintainers are increasingly demanding a "new" interface for configurations (sysfs -> netlink). And have recently expressed this towards batman-adv which does not have a netlink interface for configuration options yet. Therefore any patch introducing new configuration options, including this one, will likely be rejected for now.

@ecsv has provided some initial patches to introduce netlink support for configuration options in batman-adv. However, this will probably need a bit more time. Until then I can't submit this patch upstream.

I'm also absolutely in favor of an "upstream-first" policy. I guess since many communities have switched to a multi-domain setup now, there is no hurry?

Otherwise, if multicast overhead is an urgent issue for some larger communities I guess we could also interpret this pull-request as a fix. And apply it regardless of upstreaming status (and I'm definitely going to upstream). But I currently have no overview of the urgency.

@T-X T-X force-pushed the batman-adv-mcast-to-ucast branch from 2b70d07 to 1bfe69f Compare January 29, 2019 10:55
@T-X
Copy link
Contributor Author

T-X commented Jan 29, 2019

Changelog v4:

  • rebased to master (new Patch numbers)
  • fixed a double free bug in the batman-adv multicast-to-multi-unicast patch -> led to crashes, especially on netifd restart/shutdown (crash in nf_ct_del_from_dying_or_unconfirmed_list: "Kernel bug detected [...] #1468)
  • changed the default multicast_fanout from 1 to 16 in the batman-adv multicast-to-multi-unicast patch (no functional change, as it was set to 16 by the netifd scripts, too; just to sync with the proposal that was sent to the batman-adv mailing list)

@T-X T-X force-pushed the batman-adv-mcast-to-ucast branch from 1bfe69f to 30635fb Compare January 31, 2019 04:05
@T-X
Copy link
Contributor Author

T-X commented Jan 31, 2019

Changelog v5:

  • fixed skb memory leaks in error cases

T-X added 3 commits February 16, 2019 23:37
We already have multicast-to-unicast support between client and node.
And batman-adv has multicast-to-unicast conversion capabilities if there
is only one receiver. Now this patchset adds node-to-node
multicast-to-unicast support to multiple targets as well.

This should save airtime for ICMPv6 Router Solicitations, for instance.
Here we have multiple targets for the same IPv6 multicast destination
address. However, the number of destination nodes is still rather small
in this case, i.e. all gateways, making multiple unicast transmissions
less costly than flooding a packet through the whole mesh.

Also, this allows playing with multicast streaming to a limited set of
destination nodes (default setting: 16).

Also, nearly all nodes should be running a batman-adv version which
was compiled with CONFIG_BATMAN_ADV_MCAST=y (e.g. Gluon v2018.1 or
later) for this feature to take effect.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
With multicast_fanout limit set to 16, meaning a fallback to classic
broadcast flooding in case of more than 16 recipient nodes.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Allow the transmission of a multicast packet as long as it is not
flooded through the whole mesh.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
@T-X T-X force-pushed the batman-adv-mcast-to-ucast branch from 30635fb to b22c708 Compare February 16, 2019 22:37
@T-X
Copy link
Contributor Author

T-X commented Feb 16, 2019

Changelog v6:

  • rebased to master (no changes/conflicts)
  • fixed the batadv_mcast_forw_tt_send() again, which was fixed in v2 but reintroduced by v4

@ecsv
Copy link
Contributor

ecsv commented Apr 8, 2019

The fanout change will be part of the batman-adv 2019.2 release

@rotanid
Copy link
Member

rotanid commented Apr 8, 2019

@T-X one of your commit messages should read "gluon-mesh-batman-adv:" instead of "batman-adv:"

@ecsv is "the fanout change" the whole upstream part of this PR?

@ecsv
Copy link
Contributor

ecsv commented Apr 9, 2019

@rotanid No, there are also a noflood_mark which is not upstream. I am unsure about its usefulness in upstream batman-adv since it is rather tied to the gluon specific 355-mcast-drop. But we also have this ap_isolation which is rather similar.

open-mesh-syncer pushed a commit to open-mesh-mirror/batman-adv that referenced this pull request Apr 26, 2019
With DAT DHCP snooping, the gateway feature and multicast optimizations
in place in some scenarios broadcast flooding might not be strictly
necessary anymore to be able to establish IPv4/IPv6 communication.
Therefore this patch adds an option to disable broadcast flooding.

Larger mesh networks typically filter a variety of multicast packets via
ebtables/netfilter to clamp on overhead. With this option such firewall
rules can be relaxed so that such multicast packets are only dropped
if they cannot be handled by multicast-to-unicast, for instance.

"noflood" comes in two flavours: If set to 1 then flood prevention is
enabled for all multicast/broadcast packets except ICMPv6 and IGMP
(cautious mode). Or, if set to 2 then flood prevention is enabled for
all multicast/broadcast packets (aggressive mode). If set to 0 then
flood prevention is disabled.

"noflood" is disabled by default as there are still some things to take
care of to avoid breaking things (especially for the "aggressive mode").

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>

---
The initial approach was a "noflood"-mark which worked similar to the
isolation mark. Which allowed more flexibility so that the user could
mark specific packets to be excluded from broadcast flooding via
ebtables/netfilter. However, in practice skb-marking is not that easy to
configure and if misconfigured will break a network horribly. Which was
also a downside noted and discussed at BattleMesh v11.

This version now is a less flexible but therefore also simpler to use
variant.

[0]: https://git.open-mesh.org/batman-adv.git/shortlog/refs/heads/linus/noflood-mark
[1]: freifunk-gluon/gluon#1357
@mweinelt mweinelt added 2. status: merge conflict The merge has a conflict and needs rebasing and removed 2. status: blocked Marked as blocked because it's waiting on something labels Nov 4, 2019
@@ -1 +1,6 @@
if os.execute('batctl -v | grep -q "batman-adv: 2013.4"') ~= 0 then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

batman-adv-legacy was dropped after v2019.1 was branched.

@mweinelt
Copy link
Contributor

mweinelt commented May 9, 2021

Superseded by #2206 and #2209.

@mweinelt mweinelt closed this May 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0. type: enhancement The changeset is an enhancement 2. status: merge conflict The merge has a conflict and needs rebasing 3. topic: batman-adv
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants