v23.05 gossipd related crash #6270

Closed
grubles opened this issue May 22, 2023 · 8 comments
Labels: in diagnostic (issue under diagnostic)
Milestone: v23.08

@grubles (Contributor) commented May 22, 2023

OS: Rocky Linux 9 ppc64le
CLN v23.05

```
2023-05-21T12:36:55.480Z **BROKEN** gossipd: backtrace: common/daemon.c:38 (send_backtrace) 0x1001cd67
2023-05-21T12:36:55.481Z **BROKEN** gossipd: backtrace: common/status.c:221 (status_failed) 0x10025533
2023-05-21T12:36:55.481Z **BROKEN** gossipd: backtrace: gossipd/gossip_store.c:607 (delete_by_index) 0x1000a637
2023-05-21T12:36:55.481Z **BROKEN** gossipd: backtrace: gossipd/gossip_store.c:631 (gossip_store_delete) 0x1000b697
2023-05-21T12:36:55.481Z **BROKEN** gossipd: backtrace: gossipd/routing.c:2240 (routing_expire_channels) 0x100161bf
2023-05-21T12:36:55.481Z **BROKEN** gossipd: backtrace: gossipd/gossipd.c:877 (new_blockheight) 0x100074ef
2023-05-21T12:36:55.481Z **BROKEN** gossipd: backtrace: gossipd/gossipd.c:1086 (recv_req) 0x10008e1f
2023-05-21T12:36:55.481Z **BROKEN** gossipd: backtrace: common/daemon_conn.c:35 (handle_read) 0x1001d06f
2023-05-21T12:36:55.481Z **BROKEN** gossipd: backtrace: ccan/ccan/io/io.c:59 (next_plan) 0x100e8863
2023-05-21T12:36:55.481Z **BROKEN** gossipd: backtrace: ccan/ccan/io/io.c:407 (do_plan) 0x100e9147
2023-05-21T12:36:55.481Z **BROKEN** gossipd: backtrace: ccan/ccan/io/io.c:417 (io_ready) 0x100e924f
2023-05-21T12:36:55.481Z **BROKEN** gossipd: backtrace: ccan/ccan/io/poll.c:453 (io_loop) 0x100ebd7b
2023-05-21T12:36:55.481Z **BROKEN** gossipd: backtrace: gossipd/gossipd.c:1203 (main) 0x100099c7
2023-05-21T12:36:55.482Z **BROKEN** gossipd: backtrace: (null):0 ((null)) 0x7fff8bf13c23
2023-05-21T12:36:55.482Z **BROKEN** gossipd: backtrace: (null):0 ((null)) 0x7fff8bf13e07
2023-05-21T12:36:55.482Z **BROKEN** gossipd: backtrace: (null):0 ((null)) 0xffffffffffffffff
2023-05-21T12:36:55.482Z **BROKEN** gossipd: STATUS_FAIL_INTERNAL_ERROR: Failed reading flags & len to delete @58913714: No such file or directory
[user@localhost ~]$ lightning_gossipd: Failed reading flags & len to delete @58913714: No such file or directory (version v23.05)                     
0x1001cd07 send_backtrace     
        common/daemon.c:33
0x10025533 status_failed
        common/status.c:221
0x1000a637 delete_by_index
        gossipd/gossip_store.c:607
0x1000b697 gossip_store_delete
        gossipd/gossip_store.c:631                                                                                                                            
0x100161bf routing_expire_channels
        gossipd/routing.c:2240
0x100074ef new_blockheight
        gossipd/gossipd.c:877
0x10008e1f recv_req
        gossipd/gossipd.c:1086
0x1001d06f handle_read
        common/daemon_conn.c:35
0x100e8863 next_plan
        ccan/ccan/io/io.c:59
0x100e9147 do_plan
        ccan/ccan/io/io.c:407
0x100e924f io_ready
        ccan/ccan/io/io.c:417
0x100ebd7b io_loop
        ccan/ccan/io/poll.c:453
0x100099c7 main
        gossipd/gossipd.c:1203
0x7fff8bf13c23 ???
        ???:0
0x7fff8bf13e07 ???
        ???:0
0xffffffffffffffff ???
        ???:0
Lost connection to the RPC socket.
```
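
For context on the failing path: the backtrace shows routing_expire_channels asking gossip_store_delete to remove an expired channel record, and delete_by_index then reads the record's "flags & len" header at the stored offset (via pread) before marking it deleted; a failed or short read there is what produces the fatal "Failed reading flags & len to delete" message. A minimal sketch of that pattern, with the header layout and names as illustrative assumptions rather than the exact CLN structures:

```c
/* Illustrative sketch only: the "read flags & len, then mark deleted"
 * shape of delete_by_index.  The header layout below is an assumption
 * for illustration, not the exact gossip_store on-disk format. */
#include <endian.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

struct record_hdr {          /* hypothetical header: flags + payload length */
	uint16_t flags_be;
	uint16_t len_be;
};

#define HDR_FLAG_DELETED 0x8000

/* Mark the record at `off` deleted; return false on a failed header read
 * instead of aborting the daemon the way status_failed() does. */
static bool mark_deleted(int fd, off_t off)
{
	struct record_hdr hdr;

	if (pread(fd, &hdr, sizeof(hdr), off) != (ssize_t)sizeof(hdr)) {
		/* This is the branch hit in the report: the error surfaces
		 * as "Failed reading flags & len to delete @<offset>". */
		fprintf(stderr, "Failed reading flags & len to delete @%jd: %s\n",
			(intmax_t)off, strerror(errno));
		return false;
	}

	hdr.flags_be = htobe16(be16toh(hdr.flags_be) | HDR_FLAG_DELETED);
	if (pwrite(fd, &hdr.flags_be, sizeof(hdr.flags_be), off)
	    != (ssize_t)sizeof(hdr.flags_be)) {
		fprintf(stderr, "Failed writing delete flag @%jd: %s\n",
			(intmax_t)off, strerror(errno));
		return false;
	}
	return true;
}
```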
@vincenzopalazzo added this to the v23.08 milestone May 22, 2023
@vincenzopalazzo added the in diagnostic label May 22, 2023
@rustyrussell (Contributor)

This is a weird one. How long was your node up before this happened?

@grubles (Contributor, Author) commented May 23, 2023

Host uptime is 6 days, so probably a bit less than that.

endothermicdev added a commit to endothermicdev/lightning that referenced this issue May 25, 2023
Reported in ElementsProject#6270, there was an attempt to delete gossip overrunning the end of the gossip_store. This logs the gossip type that was attempted to be deleted and avoids an immediate crash (tombstones would be fine to skip over at least.)

Changelog-None
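
In other words, the workaround turns an out-of-bounds delete into a logged, skippable event rather than a status_failed() abort. A rough sketch of that shape (simplified, assumed names; not the actual patch):

```c
/* Sketch of the mitigation described above: if the delete target lies
 * beyond the end of gossip_store, log the gossip type we were asked to
 * delete and carry on instead of crashing.  Function and parameter names
 * are assumptions for illustration. */
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>

static bool delete_offset_is_sane(int fd, uint64_t off, size_t hdr_size,
				  int gossip_type)
{
	struct stat st;

	if (fstat(fd, &st) != 0)
		return false;

	if (off + hdr_size > (uint64_t)st.st_size) {
		/* Log and skip rather than abort the daemon. */
		fprintf(stderr,
			"gossip_store: delete of type %i @%" PRIu64
			" overruns end of store (size %jd), skipping\n",
			gossip_type, off, (intmax_t)st.st_size);
		return false;
	}
	return true;
}
```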
@vincenzopalazzo (Contributor)

@grubles, is it possible to get the gossip_store file somehow? I would like to take a look at it.

endothermicdev added a commit to endothermicdev/lightning that referenced this issue Jun 1, 2023
@grubles (Contributor, Author) commented Jun 1, 2023

> @grubles, is it possible to get the gossip_store file somehow? I would like to take a look at it.

The max file size for GitHub is 25 MB and the gossip_store is 41 MB. Know of a place I can upload it to?

@vincenzopalazzo (Contributor)

Maybe on Google Drive or something similar?

@grubles (Contributor, Author) commented Jun 1, 2023

rustyrussell pushed a commit that referenced this issue Jun 2, 2023
rustyrussell pushed a commit to rustyrussell/lightning that referenced this issue Jun 5, 2023
@rustyrussell (Contributor)

Thanks! Ok, this looks literally impossible:

... The record is indeed at 58913714. The pread call should not fail. Are you on some weird filesystem?

0006469e6170100002800000000000003e8000000000000000100000000938580c0
58913564: t=1684661782 channel_update: 0102e30665c76bdeb3e46d20a93378d357e3be83627d3c70a4f4884247000fa8f6393e182b802d4e26b4bec82b804fc06cb51b02f106d348ad2044c0edbffeacd5266fe28c0ab6f1b372c1a6a246ae63f74f931e8365e15a089c68d61900000000000b968100053700016469e6160100009000000000000003e800000000000000320000000076046700
58913714: dying channel: 772293x317x1 (deadline 790737)
58913740: t=1684661551 channel_update: 010204199bc6b4c5591dc6b37a776b3a340cee717a4370b3f524857862d46f6b6698601af727d39ac2a7bbfb459e545da16b97b37edb888d76417ffdad0f7d206d406fe28c0ab6f1b372c1a6a246ae63f74f931e8365e15a089c68d61900000000000c0e74000ac100026469e52f010100280000000000000001000000000000000e00000000938580c0
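
(For anyone who wants to reproduce just this read outside gossipd: a tiny standalone probe can pread the header bytes at the same offset and report exactly what the kernel returns. This is an illustrative diagnostic sketch, not CLN code.)

```c
/* Standalone probe: pread a few bytes at a given gossip_store offset and
 * print the result, to see whether the read at e.g. 58913714 can fail
 * outside gossipd on this filesystem. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	if (argc != 3) {
		fprintf(stderr, "Usage: %s <gossip_store> <offset>\n", argv[0]);
		return 1;
	}

	int fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	off_t off = (off_t)strtoull(argv[2], NULL, 10);
	unsigned char hdr[12];
	ssize_t r = pread(fd, hdr, sizeof(hdr), off);

	if (r < 0) {
		printf("pread failed: %s\n", strerror(errno));
	} else {
		printf("pread returned %zd bytes:", r);
		for (ssize_t i = 0; i < r; i++)
			printf(" %02x", hdr[i]);
		printf("\n");
	}
	close(fd);
	return 0;
}
```

Run it against a copy of the store (the default location ~/.lightning/bitcoin/gossip_store is an assumption about the node's configuration), e.g. `./probe gossip_store 58913714`.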

@rustyrussell (Contributor)

We no longer crash on this, though we haven't root-caused it. So I'm closing, given we worked around it in master.
