
V0.11.1, connectd crash #5284

Closed
devastgh opened this issue May 22, 2022 · 2 comments · Fixed by #5300
@devastgh

### Issue and Steps to Reproduce

Might be similar to #5282

Flow of events:
Compacting backup:

2022-05-17T23:07:50.314Z INFO    plugin-backup.py: Starting compaction: stats={'before': {'backupsize': 8571037716, 'version_count': 939851}}
2022-05-17T23:26:26.894Z INFO    plugin-backup.py: Adding intial snapshot with 650534912 bytes for version 2345945
2022-05-17T23:26:27.602Z INFO    plugin-backup.py: Compacted 939849 changes, saving 7920501846 bytes, swapping backups

A force close that probably shouldn't have happened:

2022-05-17T23:26:31.547Z UNUSUAL xxxxxxxxxxxx-chan#1985: Peer permanent failure in CHANNELD_NORMAL: Offered HTLC 3361 SENT_ADD_ACK_REVOCATION cltv 736843 hit deadline
2022-05-17T23:26:31.549Z INFO    xxxxxxxxxxxxx-chan#1985: State changed from CHANNELD_NORMAL to AWAITING_UNILATERAL

Crash:

2022-05-17T23:26:31.567Z **BROKEN** connectd: FATAL SIGNAL 6 (version 0.11.1)
2022-05-17T23:26:31.567Z **BROKEN** connectd: backtrace: common/daemon.c:38 (send_backtrace) 0x557c9a97e96c

Backtrace from syslog:

May 18 01:26:29 xxxxxxxx lightningd[2077]: lightning_connectd: ccan/ccan/tal/tal.c:393: del_tree: Assertion `!taken(from_tal_hdr(t))' failed.
May 18 01:26:29 xxxxxxxx lightningd[2077]: lightning_connectd: FATAL SIGNAL 6 (version 0.11.1)
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a97e924 send_backtrace
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011common/daemon.c:33
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a97e9be crashdump
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011common/daemon.c:46
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x7f42b8dd908f ???
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011???:0
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x7f42b8dd900b ???
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011???:0
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x7f42b8db8858 ???
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011???:0
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x7f42b8db8728 ???
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011???:0
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x7f42b8dc9fd5 ???
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011???:0
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9bd668 del_tree
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011ccan/ccan/tal/tal.c:393
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9bd684 del_tree
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011ccan/ccan/tal/tal.c:412
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9bdbc1 tal_free
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011ccan/ccan/tal/tal.c:486
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9742cc peer_reconnected
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011connectd/connectd.c:259
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9745cb peer_connected
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011connectd/connectd.c:351
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a97493c retry_peer_connected
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011connectd/connectd.c:228
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9b242e next_plan
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011ccan/ccan/io/io.c:59
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9b288a io_do_always
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011ccan/ccan/io/io.c:435
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9b3d9b handle_always
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011ccan/ccan/io/poll.c:304
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9b40ff io_loop
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011ccan/ccan/io/poll.c:385
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a974c7e main
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011connectd/connectd.c:2158
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x7f42b8dba082 ???
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011???:0
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a96d78d ???
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011???:0
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0xffffffffffffffff ???
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011???:0
May 18 01:26:29 xxxxxxxx lightningd[2077]: lightning_connectd: FATAL SIGNAL (version 0.11.1)
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a97e924 send_backtrace
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011common/daemon.c:33
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a98759a status_failed
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011common/status.c:221
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9876ac status_backtrace_exit
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011common/subdaemon.c:18
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a97e9c4 crashdump
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011common/daemon.c:49
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x7f42b8dd908f ???
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011???:0
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x7f42b8dd900b ???
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011???:0
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x7f42b8db8858 ???
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011???:0
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x7f42b8db8728 ???
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011???:0
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x7f42b8dc9fd5 ???
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011???:0
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9bd668 del_tree
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011ccan/ccan/tal/tal.c:393
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9bd684 del_tree
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011ccan/ccan/tal/tal.c:412
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9bdbc1 tal_free
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011ccan/ccan/tal/tal.c:486
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9742cc peer_reconnected
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011connectd/connectd.c:259
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9745cb peer_connected
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011connectd/connectd.c:351
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a97493c retry_peer_connected
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011connectd/connectd.c:228
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9b242e next_plan
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011ccan/ccan/io/io.c:59
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9b288a io_do_always
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011ccan/ccan/io/io.c:435
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9b3d9b handle_always
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011ccan/ccan/io/poll.c:304
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9b40ff io_loop
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011ccan/ccan/io/poll.c:385
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a974c7e main
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011connectd/connectd.c:2158
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x7f42b8dba082 ???
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011???:0
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a96d78d ???
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011???:0
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0xffffffffffffffff ???
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011???:0

### getinfo output

{
  "id": "REDACTED",
  "alias": "REDACTED",
  "color": "037e27",
  "num_peers": XX,
  "num_pending_channels": 0,
  "num_active_channels": XX,
  "num_inactive_channels": X,
  "address": [
    {
      "type": "ipv6",
      "address": "XXXX",
      "port": 9735
    },
    {
      "type": "torv3",
      "address": "XXXXX",
      "port": 9735
    }
  ],
  "binding": [
    {
      "type": "ipv6",
      "address": "::",
      "port": 9735
    },
    {
      "type": "ipv4",
      "address": "0.0.0.0",
      "port": 9735
    }
  ],
  "version": "0.11.1",
  "blockheight": 737466,
  "network": "bitcoin",
  "msatoshi_fees_collected": X,
  "fees_collected_msat": "XXXX",
  "lightning-dir": "/home/lightning/.lightning/bitcoin",
  "our_features": {
    "init": "282a69a2",
    "node": "800000282a69a2",
    "channel": "",
    "invoice": "02000020024100"
  }
}

@devastgh changed the title from "V0.11.1, coonectd crash" to "V0.11.1, connectd crash" on May 22, 2022.
whitslack added a commit to whitslack/lightning that referenced this issue on May 30, 2022:
`peer_reconnected` was freeing a `struct peer_reconnected` instance
while a pointer to that instance was registered to be passed as an
argument to the `retry_peer_connected` callback function. This caused a
use-after-free crash when `retry_peer_connected` attempted to reparent
the instance to the temporary context.

Instead, never have `peer_reconnected` free a `struct peer_reconnected`
instance, and only ever allow such an instance to be freed after the
`retry_peer_connected` callback has finished with it. To ensure that the
instance is freed even if the connection is closed before the callback
can be invoked, parent the instance to the connection rather than to the
daemon.

Absent the need to free `struct peer_reconnected` instances outside of
the `retry_peer_connected` callback, there is no use for the
`reconnected` hashtable, so remove it as well.

See: ElementsProject#5282 (comment)
Fixes: ElementsProject#5282
Fixes: ElementsProject#5284
Changelog-Fixed: connectd no longer crashes when peers reconnect.
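
The fix is easiest to see as a change of tal parent. Below is a minimal sketch of the ownership change described in the commit message, assuming ccan/tal is on the include path. `struct daemon`, `struct io_conn`, the field layout, and the helper names `alloc_buggy`/`alloc_fixed` are simplified stand-ins for illustration, not the actual 0.11.x source.

```c
/* Sketch only: simplified stand-ins for connectd's real types. */
#include <ccan/tal/tal.h>

struct daemon;   /* long-lived per-process context (opaque here) */
struct io_conn;  /* per-connection context, freed when the conn closes */

struct peer_reconnected {
	struct daemon *daemon;
	/* ...peer id, peer fd, flags omitted... */
};

/* 0.11.1 (buggy): the instance was parented to the long-lived daemon,
 * so peer_reconnected() had to free it explicitly; it did so while the
 * same pointer was still registered as the argument of the pending
 * retry_peer_connected() io callback, which later dereferenced freed
 * memory and tripped the del_tree assertion seen in the backtrace. */
static struct peer_reconnected *alloc_buggy(struct daemon *daemon)
{
	return tal(daemon, struct peer_reconnected);
}

/* 0.11.2 (fixed): parent the instance to the connection instead and
 * never free it early. If the connection dies before the callback can
 * run, tal frees the instance along with the connection; otherwise
 * only retry_peer_connected() disposes of it once it is done. */
static struct peer_reconnected *alloc_fixed(struct io_conn *conn)
{
	return tal(conn, struct peer_reconnected);
}
```

With ownership tied to the connection, the separate `reconnected` hashtable, which existed only to find and free stale instances, becomes unnecessary, which is why the commit removes it as well.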
whitslack added a commit to whitslack/lightning that referenced this issue on Jun 1, 2022, with the same commit message as above.
rustyrussell pushed a commit to rustyrussell/lightning that referenced this issue on Jun 20, 2022, with the same commit message as above.
@whitslack (Collaborator) commented:

This is fixed for me in v0.11.2.

@cdecker (Member) commented on Jun 27, 2022:

Duplicates #5282

@cdecker closed this as completed on Jun 27, 2022.
rustyrussell pushed a commit that referenced this issue on Jun 28, 2022, with the same commit message as above.