crash in nf_ct_del_from_dying_or_unconfirmed_list: "Kernel bug detected [...] #1468
Comments
Please also check current v2018.1.x (with update to 4.4.139) and Gluon master.
I can currently reproduce this repeatedly when I try to upgrade my 1043NDv2. We are running de19cd5 with these patchsets: #1329, #1357, #1616.
@T-X is the author of the multicast patch; maybe he has an idea.
Hm, this crash is triggered by a failing assertion in netfilter conntrack here: https://elixir.bootlin.com/linux/v4.9.146/source/net/netfilter/nf_conntrack_core.c#L354:
While going through "git log net/netfilter/nf_conntrack_core.c", commit 368982cd ("netfilter: nfnetlink_queue: resolve clash for unconfirmed conntracks") caught my attention. It says:
I'm wondering whether that might have something to do with it. The batman-adv multicast-to-multi-unicast patch works quite similarly to how broadcast packets are duplicated and transmitted on an interface three times. One notable difference is that for broadcast flooding each transmission is delayed a bit. Maybe, from a conntrack perspective, these duplicate packets get hashed to the same value, and since we queue them quickly one after the other, they might run into this bug / race condition in conntrack? The mentioned patch was added with Linux 4.18 and was not backported to stable kernels.
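To make the hashing hypothesis above concrete, here is a toy Python sketch. It is not kernel code: `FlowTuple`, `bucket` and `mcast_to_ucast` are hypothetical names, and `zlib.crc32` is only a stand-in for the seeded hash the kernel actually uses. It merely shows that copies of one multicast packet, sent as separate unicast frames, still carry the same inner tuple and therefore land in the same hash bucket.

```python
import zlib
from collections import namedtuple

# Hypothetical, simplified flow tuple; the real conntrack tuple has more fields.
FlowTuple = namedtuple("FlowTuple", "src dst proto sport dport")


def bucket(flow, buckets=1024):
    """Map a flow tuple to a bucket index (crc32 is a stand-in hash)."""
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % buckets


def mcast_to_ucast(inner, neighbor_macs):
    """Model multicast-to-multi-unicast: one copy per neighbor.

    Only the outer destination MAC differs; the inner packet is identical.
    """
    return [(mac, inner) for mac in neighbor_macs]


# Assumed example packet: an ICMPv6 (protocol 58) packet to all-nodes multicast.
inner = FlowTuple("fe80::1", "ff02::1", 58, 0, 0)
copies = mcast_to_ucast(inner, ["02:00:00:00:00:01",
                                "02:00:00:00:00:02",
                                "02:00:00:00:00:03"])

# Every copy hashes to the same bucket because the inner tuple is unchanged.
buckets_hit = {bucket(flow) for _mac, flow in copies}
print(len(copies), "copies,", len(buckets_hit), "distinct bucket(s)")
```

Whether such back-to-back duplicates can actually trigger the assertion is exactly the open question in this comment; the sketch only illustrates why they would collide.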
On the other hand, I'm wondering why conntrack would want to look at batman-adv frames at all. Maybe we are failing to reset the protocol type somewhere in batman-adv, or something like that. At least an "skb->protocol = htons(ETH_P_BATMAN);" is there in batadv_send_skb_packet(). Or maybe this bug does not directly have anything to do with batman-adv.
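As a rough userspace illustration of why the protocol type matters, the Python sketch below builds an Ethernet header with the batman-adv EtherType and checks it the way an IP-only classifier might. The EtherType constants match linux/if_ether.h; `eth_header`, `ethertype_of` and `looks_like_ip` are hypothetical helpers, and the real netfilter hook logic is more involved than this.

```python
import struct

# EtherType values as defined in linux/if_ether.h.
ETH_P_IP = 0x0800
ETH_P_IPV6 = 0x86DD
ETH_P_BATMAN = 0x4305  # B.A.T.M.A.N. Advanced (not an officially registered ID)


def eth_header(dst, src, ethertype):
    """Pack a 14-byte Ethernet header; "!" selects network byte order."""
    return struct.pack("!6s6sH", dst, src, ethertype)


def ethertype_of(frame):
    """Read the EtherType field (bytes 12 and 13) of an Ethernet frame."""
    return struct.unpack("!H", frame[12:14])[0]


def looks_like_ip(frame):
    """Hypothetical classifier: only IPv4/IPv6 frames are of interest."""
    return ethertype_of(frame) in (ETH_P_IP, ETH_P_IPV6)


# Assumed example addresses and a dummy payload.
frame = eth_header(b"\x02" * 6, b"\x04" * 6, ETH_P_BATMAN) + b"payload"

# A frame tagged as batman-adv is not mistaken for IP by this check.
print(hex(ethertype_of(frame)), looks_like_ip(frame))
```

If the protocol field were left at an IP value on an encapsulated frame, a check like `looks_like_ip` would wrongly match it, which is the kind of mistake this comment speculates about.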
The sysupgrade worked just fine after I disabled the batman-adv multicast mode (
Killing netifd on a node connected to a mesh of reasonable size also triggered this issue reliably. The problem has been found: it was a bug in the batman-adv multicast-to-multi-unicast patch and is fixed in #1357 with v4.
Is there still a need to keep this issue open, or can it be tracked as part of #1357?
It's fixed and noted in the changelog v5 of #1357. So yes, this can be closed here.
After some minutes, a Ubiquiti Bullet M2 running Gluon master (basically v2018.1) with the multicast-related patches in #1357 and a site config from Freifunk Hannover crashes for me in the conntrack code:
Firmware binaries to reproduce can be found here: https://metameute.de/~tux/Freifunk/firmware/mcast-to-ucast3/