-
Notifications
You must be signed in to change notification settings - Fork 912
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update ccan, fix latent bug in gossmap code that reveals #5375
Update ccan, fix latent bug in gossmap code that reveals #5375
Conversation
@@ -176,25 +176,25 @@ u32 gossmap_chan_idx(const struct gossmap *map, const struct gossmap_chan *chan) | |||
return chan - map->chan_arr; | |||
} | |||
|
|||
/* htable can't handle NULL values, so we add 1 */ | |||
/* htable can't handle NULL or 1 values, so we add 2 */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I maybe have missed your explanation in this bug fixing for ccan hash map, I think it is related to the memory pointer? that the hash map use like key?
Just curious on why the table can handle these values :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It uses these values to mean "empty" and "deleted". Now, usually it's OK, because it also puts hash bits in unused bits (bits which are the same for every value in the map), and there are usually lots of these: values are usually all heap pointers. But it's possible that the hash bits are all 0, in which case these would look like empty or deleted buckets.
It's always been possible, but the updated htable added an assertion to check for (void *)1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, putting 0x2 in the hash table before ccan/htable was updated caused another bug (which is what the htable code just fixed!). So these commits have to be together!
38d70b8
to
3efc924
Compare
3efc924
to
019a716
Compare
…ssertions. Updating ccan to stricter htable revealed we were trying to put (void *)1 in the htable, which is forbidden: ``` topology: ccan/ccan/htable/htable.c:382: htable_add_: Assertion `entry_is_valid((uintptr_t)p)' failed. topology: FATAL SIGNAL 6 (version 1358d7f) 0x55f30c689c34 send_backtrace common/daemon.c:33 0x55f30c689ce0 crashdump common/daemon.c:46 0x7f5d150fe51f ??? ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0 0x7f5d15152828 __pthread_kill_implementation ./nptl/pthread_kill.c:44 0x7f5d15152828 __pthread_kill_internal ./nptl/pthread_kill.c:80 0x7f5d15152828 __GI___pthread_kill ./nptl/pthread_kill.c:91 0x7f5d150fe475 __GI_raise ../sysdeps/posix/raise.c:26 0x7f5d150e47b6 __GI_abort ./stdlib/abort.c:79 0x7f5d150e46da __assert_fail_base ./assert/assert.c:92 0x7f5d150f5e25 __GI___assert_fail ./assert/assert.c:101 0x55f30c6adbe4 htable_add_ ccan/ccan/htable/htable.c:382 0x55f30c65f303 chanidx_htable_add common/gossmap.c:35 0x55f30c6605ed new_channel common/gossmap.c:337 0x55f30c6609cf add_channel common/gossmap.c:425 0x55f30c661101 map_catchup common/gossmap.c:607 0x55f30c66221e gossmap_refresh common/gossmap.c:927 0x55f30c66e3e9 get_gossmap plugins/topology.c:27 0x55f30c66f939 listpeers_done plugins/topology.c:369 0x55f30c671f46 handle_rpc_reply plugins/libplugin.c:558 0x55f30c672a19 rpc_read_response_one plugins/libplugin.c:726 0x55f30c672b4f rpc_conn_read_response plugins/libplugin.c:746 0x55f30c6ae35e next_plan ccan/ccan/io/io.c:59 0x55f30c6aef93 do_plan ccan/ccan/io/io.c:407 0x55f30c6aefd5 io_ready ccan/ccan/io/io.c:417 0x55f30c6b1371 io_loop ccan/ccan/io/poll.c:453 0x55f30c67587c plugin_main plugins/libplugin.c:1559 0x55f30c6708eb main plugins/topology.c:701 0x7f5d150e5fcf __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 0x7f5d150e607c __libc_start_main_impl ../csu/libc-start.c:409 0x55f30c65d894 ??? ???:0 0xffffffffffffffff ??? ???:0 ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Caused a crash in CI, reproduced under valgrind by calling any_channel_by_scid from io_poll_lightningd: ``` ==2422524== Conditional jump or move depends on uninitialised value(s) ==2422524== at 0x12C98D: any_channel_by_scid (channel.c:606) ==2422524== by 0x14FF75: io_poll_lightningd (lightningd.c:682) ==2422524== by 0x225FDE: io_loop (poll.c:420) ==2422524== by 0x14A914: io_loop_with_timers (io_loop_with_timers.c:22) ==2422524== by 0x150C4E: main (lightningd.c:1193) ==2422524== Uninitialised value was created by a heap allocation ==2422524== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==2422524== by 0x234F61: allocate (tal.c:250) ==2422524== by 0x235522: tal_alloc_ (tal.c:428) ==2422524== by 0x12B500: new_unsaved_channel (channel.c:203) ==2422524== by 0x13B77A: json_openchannel_init (dual_open_control.c:2610) ==2422524== by 0x14C78D: command_exec (jsonrpc.c:630) ==2422524== by 0x14CD9F: rpc_command_hook_final (jsonrpc.c:765) ==2422524== by 0x181DDA: plugin_hook_call_ (plugin_hook.c:278) ==2422524== by 0x14D198: plugin_hook_call_rpc_command (jsonrpc.c:853) ==2422524== by 0x14D6A0: parse_request (jsonrpc.c:957) ==2422524== by 0x14DAFE: read_json (jsonrpc.c:1054) ==2422524== by 0x2231C8: next_plan (io.c:59) ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
019a716
to
388742d
Compare
ACK 388742d confirmed that test fails are all flakes, |
The gossip_store version was bumbed to 10 in PR ElementsProject#5239 and ElementsProject#5375 has apparently not been rebased on top of that when it was merged, causing the `run-gossmap_canned` test to fail when parsing the v9 gossip store. Changelog-None
The gossip_store version was bumbed to 10 in PR ElementsProject#5239 and ElementsProject#5375 has apparently not been rebased on top of that when it was merged, causing the `run-gossmap_canned` test to fail when parsing the v9 gossip store. Changelog-None
The gossip_store version was bumbed to 10 in PR ElementsProject#5239 and ElementsProject#5375 has apparently not been rebased on top of that when it was merged, causing the `run-gossmap_canned` test to fail when parsing the v9 gossip store. Changelog-None
The gossip_store version was bumbed to 10 in PR ElementsProject#5239 and ElementsProject#5375 has apparently not been rebased on top of that when it was merged, causing the `run-gossmap_canned` test to fail when parsing the v9 gossip store. Changelog-None
The gossip_store version was bumbed to 10 in PR ElementsProject#5239 and ElementsProject#5375 has apparently not been rebased on top of that when it was merged, causing the `run-gossmap_canned` test to fail when parsing the v9 gossip store. Changelog-None
The gossip_store version was bumbed to 10 in PR ElementsProject#5239 and ElementsProject#5375 has apparently not been rebased on top of that when it was merged, causing the `run-gossmap_canned` test to fail when parsing the v9 gossip store. Changelog-None
The gossip_store version was bumbed to 10 in PR ElementsProject#5239 and ElementsProject#5375 has apparently not been rebased on top of that when it was merged, causing the `run-gossmap_canned` test to fail when parsing the v9 gossip store. Changelog-None
Updating ccan to stricter htable (as in current ccan!) revealed we were trying to put
(void *)1 in the htable, which is forbidden:
Changelog-None