Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mosquitto_sub hits deadlock when using the -W (timeout) options #3175

Open
audricschiltknecht opened this issue Nov 29, 2024 · 1 comment
Open
Labels
Status: Available No one has claimed responsibility for resolving this issue.

Comments

@audricschiltknecht
Copy link

audricschiltknecht commented Nov 29, 2024

The mosquitto_sub client can end in a deadlock in certain scenarios when the timeout option (`-W) is set.
The client utilizes alarm() to set a signal handler that triggers once the timeout expires. However, this can result in a deadlock condition in the following case:

  • Client is displaying a log: a lock on &mosq->log_callback_mutex is taken in log__printf
  • Signal occurs, handler is called.
  • Client will call the mosquitto_disconnect_v5 function.
  • Function will at some point issue a log when calling send__disconnect
  • log__printf will try to acquire the lock on the same mutex from step 1: deadlock.

This is somewhat difficult to reproduce, but I've found the issue will investigating what we think is #2887 and a variation of #2887 (comment) seems to be the best bet to end up in the same case.

So:

  • Send lots of retained topics to the broker: for i in {0..20000}; do echo $i; mosquitto_pub -t "retained/$i" -m $i -r -q 0 ; done
  • Then try to remove all retained messages and enable debug logging: mosquitto_sub --retained-only --remove-retained -W 1 -F %t -t "retained/+" -d

Environment and version

  • Tested on mosquitto version: 2.0.18/2.0.20
  • OS: Ubuntu 24.04

Backtrace from GDB:

(gdb) bt
#0  futex_wait (private=0, expected=2, futex_word=0x58816d092760) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait (futex=futex@entry=0x58816d092760, private=0) at ./nptl/lowlevellock.c:49
#2  0x00007a62076a00f1 in lll_mutex_lock_optimized (mutex=0x58816d092760) at ./nptl/pthread_mutex_lock.c:48
#3  ___pthread_mutex_lock (mutex=0x58816d092760) at ./nptl/pthread_mutex_lock.c:93
#4  0x00007a6207857aac in log__printf (mosq=0x58816d0925f0, priority=16, fmt=0x7a620786987f "Client %s sending DISCONNECT") at /home/audric/Projets/mosquitto/lib/logging_mosq.c:40
#5  0x00007a6207862789 in send__disconnect (mosq=0x58816d0925f0, reason_code=4 '\004', properties=0x0) at /home/audric/Projets/mosquitto/lib/send_disconnect.c:54
#6  0x00007a62078552a4 in mosquitto_disconnect_v5 (mosq=0x58816d0925f0, reason_code=4, properties=0x0) at /home/audric/Projets/mosquitto/lib/connect.c:263
#7  0x000058816b69cf17 in my_signal_handler (signum=14) at /home/audric/Projets/mosquitto/client/sub_client.c:55
#8  <signal handler called>
#9  0x00007a620771c574 in __GI___libc_write (fd=1, buf=0x58816d094130, nbytes=86) at ../sysdeps/unix/sysv/linux/write.c:26
#10 0x00007a6207693965 in _IO_new_file_write (f=0x7a62078045c0 <_IO_2_1_stdout_>, data=0x58816d094130, n=86) at ./libio/fileops.c:1181
#11 0x00007a6207692561 in new_do_write (to_do=86, data=0x58816d094130 "Client null received PUBLISH (d0, q0, r1, m0, 'test-retained/18334', ... (500 bytes))\n", fp=0x7a62078045c0 <_IO_2_1_stdout_>) at ./libio/libioP.h:1030
#12 _IO_new_do_write (fp=fp@entry=0x7a62078045c0 <_IO_2_1_stdout_>, data=0x58816d094130 "Client null received PUBLISH (d0, q0, r1, m0, 'test-retained/18334', ... (500 bytes))\n", to_do=86) at ./libio/fileops.c:426
#13 0x00007a6207692ee3 in _IO_new_file_overflow (f=0x7a62078045c0 <_IO_2_1_stdout_>, ch=10) at ./libio/fileops.c:784
#14 0x00007a6207687dca in __GI__IO_puts (str=0x58816d094590 "Client null received PUBLISH (d0, q0, r1, m0, 'test-retained/18334', ... (500 bytes))") at ./libio/ioputs.c:41
#15 0x000058816b69d459 in my_log_callback (mosq=0x58816d0925f0, obj=0x58816b6ad060 <cfg>, level=16, str=0x58816d094590 "Client null received PUBLISH (d0, q0, r1, m0, 'test-retained/18334', ... (500 bytes))") at /home/audric/Projets/mosquitto/client/sub_client.c:176
#16 0x00007a6207857bbd in log__printf (mosq=0x58816d0925f0, priority=16, fmt=0x7a6207868220 "Client %s received PUBLISH (d%d, q%d, r%d, m%d, '%s', ... (%ld bytes))") at /home/audric/Projets/mosquitto/lib/logging_mosq.c:54
#17 0x00007a6207856462 in handle__publish (mosq=0x58816d0925f0) at /home/audric/Projets/mosquitto/lib/handle_publish.c:116
#18 0x00007a620786200d in handle__packet (mosq=0x58816d0925f0) at /home/audric/Projets/mosquitto/lib/read_handle.c:51
#19 0x00007a620785f88c in packet__read (mosq=0x58816d0925f0) at /home/audric/Projets/mosquitto/lib/packet_mosq.c:566
#20 0x00007a6207858935 in mosquitto_loop_read (mosq=0x58816d0925f0, max_packets=4096) at /home/audric/Projets/mosquitto/lib/loop.c:380
#21 0x00007a6207858140 in mosquitto_loop (mosq=0x58816d0925f0, timeout=-1, max_packets=1) at /home/audric/Projets/mosquitto/lib/loop.c:149
#22 0x00007a620785855d in mosquitto_loop_forever (mosq=0x58816d0925f0, timeout=-1, max_packets=1) at /home/audric/Projets/mosquitto/lib/loop.c:257
#23 0x000058816b69de5d in main (argc=10, argv=0x7ffc9a6c2078) at /home/audric/Projets/mosquitto/client/sub_client.c:403
#24 0x00007a620762a1ca in __libc_start_call_main (main=main@entry=0x58816b69db17 <main>, argc=argc@entry=10, argv=argv@entry=0x7ffc9a6c2078) at ../sysdeps/nptl/libc_start_call_main.h:58
#25 0x00007a620762a28b in __libc_start_main_impl (main=0x58816b69db17 <main>, argc=10, argv=0x7ffc9a6c2078, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffc9a6c2068) at ../csu/libc-start.c:360
#26 0x000058816b69ce05 in _start ()
@github-actions github-actions bot added the Status: Available No one has claimed responsibility for resolving this issue. label Nov 29, 2024
@audricschiltknecht
Copy link
Author

Another case this time occurring in the packet__queue function:

On a Linux target, using mosquitto 2.0.18:

#0  0x0000ffffad63d02c in ?? () from /lib/libc.so.6
#1  0x0000ffffad6436e0 in pthread_mutex_lock () from /lib/libc.so.6
#2  0x0000ffffad780650 in packet__write (mosq=0xaaaac7c1f550) at /tmp/mosquitto/lib/packet_mosq.c:235
#3  packet__write (mosq=0xaaaac7c1f550) at /tmp/mosquitto/lib/packet_mosq.c:226
#4  0x0000ffffad7809e8 in packet__queue (mosq=mosq@entry=0xaaaac7c1f550, packet=0xaaaac7c23430) at /tmp/mosquitto/lib/packet_mosq.c:203
#5  0x0000ffffad783214 in send__disconnect (mosq=mosq@entry=0xaaaac7c1f550, reason_code=reason_code@entry=4 '\004', properties=properties@entry=0x0)
    at /tmp/mosquitto/lib/send_disconnect.c:82
#6  0x0000ffffad77972c in mosquitto_disconnect_v5 (properties=<optimized out>, reason_code=<optimized out>, mosq=<optimized out>) at /tmp/mosquitto/lib/connect.c:263
#7  mosquitto_disconnect_v5 (mosq=0xaaaac7c1f550, reason_code=reason_code@entry=4, properties=<optimized out>) at /tmp/mosquitto/lib/connect.c:236
#8  0x0000aaaabbba37c0 in my_signal_handler (signum=14) at /tmp/mosquitto/client/sub_client.c:57
#9  <signal handler called>
#10 0x0000ffffad6a9e94 in send () from /lib/libc.so.6
#11 0x0000ffffad77eb98 in net__write (mosq=<optimized out>, buf=<optimized out>, count=<optimized out>) at /tmp/mosquitto/lib/net_mosq.c:1020
#12 0x0000ffffad7806d8 in packet__write (mosq=0xaaaac7c1f550) at /tmp/mosquitto/lib/packet_mosq.c:263
#13 packet__write (mosq=0xaaaac7c1f550) at /tmp/mosquitto/lib/packet_mosq.c:226
#14 0x0000ffffad7809e8 in packet__queue (mosq=mosq@entry=0xaaaac7c1f550, packet=packet@entry=0xaaaac7c23700) at /tmp/mosquitto/lib/packet_mosq.c:203
#15 0x0000ffffad783998 in send__real_publish (mosq=mosq@entry=0xaaaac7c1f550, mid=1720, topic=0xaaaac7c59010 "test-retained/10520", payloadlen=0, payload=0x0, qos=1 '\001', 
    retain=<optimized out>, dup=<optimized out>, cmsg_props=0x0, store_props=0x0, expiry_interval=0) at /tmp/mosquitto/lib/send_publish.c:220
#16 0x0000ffffad783c28 in send__publish (mosq=mosq@entry=0xaaaac7c1f550, mid=<optimized out>, topic=<optimized out>, payloadlen=<optimized out>, payload=<optimized out>, 
    qos=<optimized out>, retain=<optimized out>, dup=<optimized out>, cmsg_props=<optimized out>, store_props=<optimized out>, store_props@entry=0x0, expiry_interval=<optimized out>, 
    expiry_interval@entry=0) at /tmp/mosquitto/lib/send_publish.c:133
#17 0x0000ffffad77c708 in message__release_to_inflight (dir=mosq_md_out, mosq=0xaaaac7c1f550) at /tmp/mosquitto/lib/messages_mosq.c:207
#18 message__release_to_inflight (mosq=mosq@entry=0xaaaac7c1f550, dir=dir@entry=mosq_md_out) at /tmp/mosquitto/lib/messages_mosq.c:192
#19 0x0000ffffad77a1b0 in handle__pubackcomp (mosq=0xaaaac7c1f550, type=<optimized out>) at /tmp/mosquitto/lib/handle_pubackcomp.c:157
#20 0x0000ffffad780c10 in packet__read (mosq=0xaaaac7c1f550) at /tmp/mosquitto/lib/packet_mosq.c:566
#21 0x0000ffffad77b7c0 in mosquitto_loop_read (mosq=mosq@entry=0xaaaac7c1f550, max_packets=9540, max_packets@entry=1) at /tmp/mosquitto/lib/loop.c:380
#22 0x0000ffffad77bb7c in mosquitto_loop (mosq=mosq@entry=0xaaaac7c1f550, timeout=timeout@entry=-1, max_packets=max_packets@entry=1) at /tmp/mosquitto/lib/loop.c:149
#23 0x0000ffffad77bd24 in mosquitto_loop_forever (mosq=0xaaaac7c1f550, timeout=timeout@entry=-1, max_packets=max_packets@entry=1) at /tmp/mosquitto/lib/loop.c:257
#24 0x0000aaaabbba2e3c in main (argc=<optimized out>, argv=<optimized out>) at /tmp/mosquitto/client/sub_client.c:405

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Available No one has claimed responsibility for resolving this issue.
Projects
None yet
Development

No branches or pull requests

1 participant