
V0.11.1, connectd crash #5284

Closed
devastgh opened this issue May 22, 2022 · 2 comments · Fixed by #5300
@devastgh

### Issue and Steps to Reproduce

Might be similar to #5282

Flow of events:
Compacting backup:

2022-05-17T23:07:50.314Z INFO    plugin-backup.py: Starting compaction: stats={'before': {'backupsize': 8571037716, 'version_count': 939851}}
2022-05-17T23:26:26.894Z INFO    plugin-backup.py: Adding intial snapshot with 650534912 bytes for version 2345945
2022-05-17T23:26:27.602Z INFO    plugin-backup.py: Compacted 939849 changes, saving 7920501846 bytes, swapping backups

A force close that probably shouldn't have happened:

2022-05-17T23:26:31.547Z UNUSUAL xxxxxxxxxxxx-chan#1985: Peer permanent failure in CHANNELD_NORMAL: Offered HTLC 3361 SENT_ADD_ACK_REVOCATION cltv 736843 hit deadline
2022-05-17T23:26:31.549Z INFO    xxxxxxxxxxxxx-chan#1985: State changed from CHANNELD_NORMAL to AWAITING_UNILATERAL

Crash:

2022-05-17T23:26:31.567Z **BROKEN** connectd: FATAL SIGNAL 6 (version 0.11.1)
2022-05-17T23:26:31.567Z **BROKEN** connectd: backtrace: common/daemon.c:38 (send_backtrace) 0x557c9a97e96c

Backtrace from syslog:

May 18 01:26:29 xxxxxxxx lightningd[2077]: lightning_connectd: ccan/ccan/tal/tal.c:393: del_tree: Assertion `!taken(from_tal_hdr(t))' failed.
May 18 01:26:29 xxxxxxxx lightningd[2077]: lightning_connectd: FATAL SIGNAL 6 (version 0.11.1)
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a97e924 send_backtrace
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011common/daemon.c:33
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a97e9be crashdump
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011common/daemon.c:46
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x7f42b8dd908f ???
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011???:0
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x7f42b8dd900b ???
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011???:0
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x7f42b8db8858 ???
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011???:0
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x7f42b8db8728 ???
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011???:0
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x7f42b8dc9fd5 ???
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011???:0
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9bd668 del_tree
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011ccan/ccan/tal/tal.c:393
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9bd684 del_tree
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011ccan/ccan/tal/tal.c:412
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9bdbc1 tal_free
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011ccan/ccan/tal/tal.c:486
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9742cc peer_reconnected
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011connectd/connectd.c:259
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9745cb peer_connected
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011connectd/connectd.c:351
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a97493c retry_peer_connected
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011connectd/connectd.c:228
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9b242e next_plan
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011ccan/ccan/io/io.c:59
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9b288a io_do_always
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011ccan/ccan/io/io.c:435
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9b3d9b handle_always
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011ccan/ccan/io/poll.c:304
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9b40ff io_loop
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011ccan/ccan/io/poll.c:385
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a974c7e main
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011connectd/connectd.c:2158
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x7f42b8dba082 ???
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011???:0
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a96d78d ???
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011???:0
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0xffffffffffffffff ???
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011???:0
May 18 01:26:29 xxxxxxxx lightningd[2077]: lightning_connectd: FATAL SIGNAL (version 0.11.1)
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a97e924 send_backtrace
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011common/daemon.c:33
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a98759a status_failed
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011common/status.c:221
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9876ac status_backtrace_exit
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011common/subdaemon.c:18
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a97e9c4 crashdump
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011common/daemon.c:49
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x7f42b8dd908f ???
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011???:0
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x7f42b8dd900b ???
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011???:0
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x7f42b8db8858 ???
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011???:0
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x7f42b8db8728 ???
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011???:0
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x7f42b8dc9fd5 ???
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011???:0
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9bd668 del_tree
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011ccan/ccan/tal/tal.c:393
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9bd684 del_tree
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011ccan/ccan/tal/tal.c:412
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9bdbc1 tal_free
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011ccan/ccan/tal/tal.c:486
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9742cc peer_reconnected
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011connectd/connectd.c:259
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9745cb peer_connected
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011connectd/connectd.c:351
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a97493c retry_peer_connected
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011connectd/connectd.c:228
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9b242e next_plan
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011ccan/ccan/io/io.c:59
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9b288a io_do_always
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011ccan/ccan/io/io.c:435
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9b3d9b handle_always
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011ccan/ccan/io/poll.c:304
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a9b40ff io_loop
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011ccan/ccan/io/poll.c:385
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a974c7e main
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011connectd/connectd.c:2158
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x7f42b8dba082 ???
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011???:0
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0x557c9a96d78d ???
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011???:0
May 18 01:26:29 xxxxxxxx lightningd[2077]: 0xffffffffffffffff ???
May 18 01:26:29 xxxxxxxx lightningd[2077]: #011???:0

### getinfo output

{
  "id": "REDACTED",
  "alias": "REDACTED",
  "color": "037e27",
  "num_peers": XX,
  "num_pending_channels": 0,
  "num_active_channels": XX,
  "num_inactive_channels": X,
  "address": [
    {
      "type": "ipv6",
      "address": "XXXX",
      "port": 9735
    },
    {
      "type": "torv3",
      "address": "XXXXX",
      "port": 9735
    }
  ],
  "binding": [
    {
      "type": "ipv6",
      "address": "::",
      "port": 9735
    },
    {
      "type": "ipv4",
      "address": "0.0.0.0",
      "port": 9735
    }
  ],
  "version": "0.11.1",
  "blockheight": 737466,
  "network": "bitcoin",
  "msatoshi_fees_collected": X,
  "fees_collected_msat": "XXXX",
  "lightning-dir": "/home/lightning/.lightning/bitcoin",
  "our_features": {
    "init": "282a69a2",
    "node": "800000282a69a2",
    "channel": "",
    "invoice": "02000020024100"
  }
}

@devastgh changed the title from "V0.11.1, coonectd crash" to "V0.11.1, connectd crash" on May 22, 2022.
whitslack added a commit to whitslack/lightning that referenced this issue on May 30, 2022:
`peer_reconnected` was freeing a `struct peer_reconnected` instance
while a pointer to that instance was registered to be passed as an
argument to the `retry_peer_connected` callback function. This caused a
use-after-free crash when `retry_peer_connected` attempted to reparent
the instance to the temporary context.

Instead, never have `peer_reconnected` free a `struct peer_reconnected`
instance, and only ever allow such an instance to be freed after the
`retry_peer_connected` callback has finished with it. To ensure that the
instance is freed even if the connection is closed before the callback
can be invoked, parent the instance to the connection rather than to the
daemon.

Absent the need to free `struct peer_reconnected` instances outside of
the `retry_peer_connected` callback, there is no use for the
`reconnected` hashtable, so remove it as well.

See: ElementsProject#5282 (comment)
Fixes: ElementsProject#5282
Fixes: ElementsProject#5284
Changelog-Fixed: connectd no longer crashes when peers reconnect.
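
The fix is easiest to see as a change of tal parent. Below is a minimal sketch of the ownership change described in the commit message, assuming ccan/tal is on the include path. `struct daemon`, `struct io_conn`, the field layout, and the helper names `alloc_buggy`/`alloc_fixed` are simplified stand-ins for illustration, not the actual 0.11.x source.

```c
/* Sketch only: simplified stand-ins for connectd's real types. */
#include <ccan/tal/tal.h>

struct daemon;   /* long-lived per-process context (opaque here) */
struct io_conn;  /* per-connection context, freed when the conn closes */

struct peer_reconnected {
	struct daemon *daemon;
	/* ...peer id, peer fd, flags omitted... */
};

/* 0.11.1 (buggy): the instance was parented to the long-lived daemon,
 * so peer_reconnected() had to free it explicitly; it did so while the
 * same pointer was still registered as the argument of the pending
 * retry_peer_connected() io callback, which later dereferenced freed
 * memory and tripped the del_tree assertion seen in the backtrace. */
static struct peer_reconnected *alloc_buggy(struct daemon *daemon)
{
	return tal(daemon, struct peer_reconnected);
}

/* 0.11.2 (fixed): parent the instance to the connection instead and
 * never free it early. If the connection dies before the callback can
 * run, tal frees the instance along with the connection; otherwise
 * only retry_peer_connected() disposes of it once it is done. */
static struct peer_reconnected *alloc_fixed(struct io_conn *conn)
{
	return tal(conn, struct peer_reconnected);
}
```

With ownership tied to the connection, the separate `reconnected` hashtable, which existed only to find and free stale instances, becomes unnecessary, which is why the commit removes it as well.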
whitslack added a commit to whitslack/lightning that referenced this issue on Jun 1, 2022, with the same commit message as above.
rustyrussell pushed a commit to rustyrussell/lightning that referenced this issue on Jun 20, 2022, with the same commit message as above.
@whitslack (Collaborator) commented:

This is fixed for me in v0.11.2.

@cdecker (Member) commented on Jun 27, 2022:

Duplicates #5282

@cdecker closed this as completed on Jun 27, 2022.
rustyrussell pushed a commit that referenced this issue on Jun 28, 2022, with the same commit message as above.