v23.02 crashes at start on freebsd #6060

Closed
denis2342 opened this issue Mar 4, 2023 · 4 comments · Fixed by #6070
Labels: crash, in diagnostic

@denis2342
Contributor

denis2342 commented Mar 4, 2023

I tried the new v23.02 yesterday. It crashed at start, but I thought that had to do with my tunnels to the Unix socket RPC. I tried again and it ran flawlessly until I restarted it today (no tunnels used!).

Now it crashes at startup every time. I disabled all plugins, and it is still the same.

OS: FreeBSD 13.2
Hardware: dual Intel Xeon, 64 GB

This is the output:

 sudo -u c-lightning lightningd --conf /usr/local/etc/lightningd-bitcoin.conf
Password:
lightning_connectd: FATAL SIGNAL 6 (version v23.02)
0x34ea5a send_backtrace
	common/daemon.c:33
0x34ec89 crashdump
	common/daemon.c:46
0x82535fb5f handle_signal
	/usr/src/lib/libthr/thread/thr_sig.c:303
0x82535f11e thr_sighandler
	/usr/src/lib/libthr/thread/thr_sig.c:246
0x7ffffffff8a2 ???
	???:0
0x824ac2c5a ???
	/usr/obj/usr/src/amd64.amd64/lib/libc/thr_kill.S:4
0x824a3b6d3 __raise
	/usr/src/lib/libc/gen/raise.c:52
0x824aeca58 abort
	/usr/src/lib/libc/stdlib/abort.c:67
0x3e1700 call_error
	ccan/ccan/tal/tal.c:93
0x3e1700 check_bounds
	ccan/ccan/tal/tal.c:165
0x3e1700 to_tal_hdr
	ccan/ccan/tal/tal.c:174
0x3e1211 to_tal_hdr_or_null
	ccan/ccan/tal/tal.c:186
0x3e1211 tal_alloc_
	ccan/ccan/tal/tal.c:426
0x3db8f4 io_new_conn_
	ccan/ccan/io/io.c:91
0x3dd2e1 accept_conn
	ccan/ccan/io/poll.c:277
0x3dd2e1 io_loop
	ccan/ccan/io/poll.c:444
0x3419fa main
	connectd/connectd.c:2081
lightning_connectd: FATAL SIGNAL (version v23.02)
0x34ea5a send_backtrace
	common/daemon.c:33
0x35811b status_failed
	common/status.c:221
0x358384 status_backtrace_exit
	common/subdaemon.c:18
0x34ec8f crashdump
	common/daemon.c:49
0x82535fb5f handle_signal
	/usr/src/lib/libthr/thread/thr_sig.c:303
0x82535f11e thr_sighandler
	/usr/src/lib/libthr/thread/thr_sig.c:246
0x7ffffffff8a2 ???
	???:0
0x824ac2c5a ???
	/usr/obj/usr/src/amd64.amd64/lib/libc/thr_kill.S:4
0x824a3b6d3 __raise
	/usr/src/lib/libc/gen/raise.c:52
0x824aeca58 abort
	/usr/src/lib/libc/stdlib/abort.c:67
0x3e1700 call_error
	ccan/ccan/tal/tal.c:93
0x3e1700 check_bounds
	ccan/ccan/tal/tal.c:165
0x3e1700 to_tal_hdr
	ccan/ccan/tal/tal.c:174
0x3e1211 to_tal_hdr_or_null
	ccan/ccan/tal/tal.c:186
0x3e1211 tal_alloc_
	ccan/ccan/tal/tal.c:426
0x3db8f4 io_new_conn_
	ccan/ccan/io/io.c:91
0x3dd2e1 accept_conn
	ccan/ccan/io/poll.c:277
0x3dd2e1 io_loop
	ccan/ccan/io/poll.c:444
0x3419fa main
	connectd/connectd.c:2081
lightningd: gossipd failed (exit status 2), exiting.
Lost connection to the RPC socket.
@denis2342
Contributor Author

I can start it in --offline mode and that works. I can even run getinfo, listfunds and listpeers.

@denis2342
Contributor Author

denis2342 commented Mar 5, 2023

The oldest commit I have tried so far is 38d90b2, which has the same problem. I can't bisect further back because of the DB version.

I also tried commit beec517 with e315f30 cherry-picked for the DB upgrade, and got the same error. I can't go back any further.

@vincenzopalazzo added the "in diagnostic" and "crash" labels Mar 5, 2023
@denis2342
Contributor Author

After dozens of tries it started again. I stopped it after a while to turn off debug logging, and then it needed another 20-30 tries to start. It looks like some race condition during startup.

rustyrussell added a commit to rustyrussell/lightning that referenced this issue Mar 6, 2023
ccan/io stores the context pointer for io_new_conn, but we were using
`daemon->listeners` which we reallocate, so it can use a stale pointer.

```
0x3e1700 call_error
	ccan/ccan/tal/tal.c:93
0x3e1700 check_bounds
	ccan/ccan/tal/tal.c:165
0x3e1700 to_tal_hdr
	ccan/ccan/tal/tal.c:174
0x3e1211 to_tal_hdr_or_null
	ccan/ccan/tal/tal.c:186
0x3e1211 tal_alloc_
	ccan/ccan/tal/tal.c:426
0x3db8f4 io_new_conn_
	ccan/ccan/io/io.c:91
0x3dd2e1 accept_conn
	ccan/ccan/io/poll.c:277
0x3dd2e1 io_loop
	ccan/ccan/io/poll.c:444
0x3419fa main
	connectd/connectd.c:2081
```

Fixes: ElementsProject#6060
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@rustyrussell added this to the v23.02.1 milestone Mar 6, 2023
@denis2342
Contributor Author

Yeah, #6070 does the trick.

endothermicdev pushed a commit that referenced this issue Mar 6, 2023
vincenzopalazzo pushed a commit to vincenzopalazzo/lightning that referenced this issue Mar 23, 2023
ddustin pushed a commit to ddustin/lightning that referenced this issue Apr 11, 2023
gkrizek pushed a commit to voltagecloud/lightning that referenced this issue Apr 26, 2023
ddustin pushed a commit to ddustin/lightning that referenced this issue May 12, 2023