forked from staceyson/qemu-bsd-user
Due to the way we use exceptions to emulate TLS and RAS for armv4/v5,… #13
Merged
Conversation
… we have to make cpu_signal_handler() exclusive. The trick is that cpu_signal_handler() never returns; it just siglongjmp()s back to the main loop, so we work around that.
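A minimal sketch of the shape of that workaround, assuming a pthread mutex and a sigjmp_buf named cpu_exec_jmpbuf (hypothetical names; the real bsd-user code differs): because the handler exits via siglongjmp(), the lock it takes can only be dropped once control lands back in the main loop.

    #include <pthread.h>
    #include <setjmp.h>
    #include <signal.h>

    /* Hypothetical names, for illustration only. */
    static pthread_mutex_t signal_handler_lock = PTHREAD_MUTEX_INITIALIZER;
    static sigjmp_buf cpu_exec_jmpbuf;

    static void cpu_signal_handler(int sig, siginfo_t *info, void *ctx)
    {
        /* Serialize the handler.  (Conceptual: taking a mutex in a
         * signal handler is not async-signal-safe in general.) */
        pthread_mutex_lock(&signal_handler_lock);
        /* ... emulate the TLS/RAS access for armv4/v5 ... */
        siglongjmp(cpu_exec_jmpbuf, 1);   /* never returns */
    }

    void cpu_exec_loop(void)
    {
        if (sigsetjmp(cpu_exec_jmpbuf, 1) != 0) {
            /* We arrived here from cpu_signal_handler(), which still
             * holds the lock; release it before resuming execution. */
            pthread_mutex_unlock(&signal_handler_lock);
        }
        /* ... main execution loop ... */
    }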
seanbruno added a commit that referenced this pull request on Oct 30, 2015
Due to the way we use exceptions to emulate TLS and RAS for armv4/v5,…
seanbruno pushed a commit that referenced this pull request on Dec 6, 2015
A new gdb command is added: "qemu handlers". It dumps an AioContext list (by default qemu_aio_context), possibly including a backtrace for cases it knows about (with the verbose option). Intended to help find why something is hanging waiting for IO. Use 'qemu handlers --verbose iohandler_ctx' to find out why your incoming migration is stuck.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-id: 1445951385-11924-1-git-send-email-dgilbert@redhat.com

V2: Merge into one command with an optional handlers arg, and only do the backtrace in verbose mode.

(gdb) qemu handlers
----
{pfd = {fd = 6, events = 25, revents = 0}, io_read = 0x55869656ffd0 <event_notifier_dummy_cb>, io_write = 0x0, deleted = 0, opaque = 0x558698c4ce08, node = {le_next = 0x0, le_prev = 0x558698c4cdc0}}

(gdb) qemu handlers iohandler_ctx
----
{pfd = {fd = 9, events = 25, revents = 0}, io_read = 0x558696581380 <fd_coroutine_enter>, io_write = 0x0, deleted = 0, opaque = 0x558698dc99d0, node = {le_next = 0x558698c4cca0, le_prev = 0x558698c4c1d0}}
----
{pfd = {fd = 4, events = 25, revents = 0}, io_read = 0x55869657b330 <sigfd_handler>, io_write = 0x0, deleted = 0, opaque = 0x4, node = {le_next = 0x558698c4c260, le_prev = 0x558699f72508}}
----
{pfd = {fd = 5, events = 25, revents = 0}, io_read = 0x55869656ffd0 <event_notifier_dummy_cb>, io_write = 0x0, deleted = 0, opaque = 0x558698c4c218, node = {le_next = 0x0, le_prev = 0x558698c4ccc8}}
----

(gdb) qemu handlers --verbose iohandler_ctx
----
{pfd = {fd = 9, events = 25, revents = 0}, io_read = 0x558696581380 <fd_coroutine_enter>, io_write = 0x0, deleted = 0, opaque = 0x558698dc99d0, node = {le_next = 0x558698c4cca0, le_prev = 0x558698c4c1d0}}
#0 0x0000558696581820 in qemu_coroutine_switch (from_=from_@entry=0x558698cb3cf0, to_=to_@entry=0x7f421c37eac8, action=action@entry=COROUTINE_YIELD) at /home/dgilbert/git/qemu/coroutine-ucontext.c:177
#1 0x0000558696580c00 in qemu_coroutine_yield () at /home/dgilbert/git/qemu/qemu-coroutine.c:145
#2 0x00005586965814f5 in yield_until_fd_readable (fd=9) at /home/dgilbert/git/qemu/qemu-coroutine-io.c:90
#3 0x0000558696523937 in socket_get_buffer (opaque=0x55869a3dc620, buf=0x558698c505a0 "", pos=<optimized out>, size=32768) at /home/dgilbert/git/qemu/migration/qemu-file-unix.c:101
#4 0x0000558696521fac in qemu_fill_buffer (f=0x558698c50570) at /home/dgilbert/git/qemu/migration/qemu-file.c:227
#5 0x0000558696522989 in qemu_peek_byte (f=0x558698c50570, offset=0) at /home/dgilbert/git/qemu/migration/qemu-file.c:507
#6 0x0000558696522bf4 in qemu_get_be32 (f=0x558698c50570) at /home/dgilbert/git/qemu/migration/qemu-file.c:520
#7 0x0000558696522bf4 in qemu_get_be32 (f=f@entry=0x558698c50570) at /home/dgilbert/git/qemu/migration/qemu-file.c:604
#8 0x0000558696347e5c in qemu_loadvm_state (f=f@entry=0x558698c50570) at /home/dgilbert/git/qemu/migration/savevm.c:1821
#9 0x000055869651de8c in process_incoming_migration_co (opaque=0x558698c50570) at /home/dgilbert/git/qemu/migration/migration.c:336
#10 0x000055869658188a in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at /home/dgilbert/git/qemu/coroutine-ucontext.c:80
#11 0x00007f420f05df10 in __start_context () at /lib64/libc.so.6
#12 0x00007ffc40815f50 in ()
#13 0x0000000000000000 in ()
----

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
seanbruno pushed a commit that referenced this pull request on Apr 6, 2016
"name" is freed after visiting options, instead use the first NetClientState name. Adds a few assert() for clarifying and checking some impossible states. READ of size 1 at 0x602000000990 thread T0 #0 0x7f6b251c570c (/lib64/libasan.so.2+0x4770c) #1 0x5566dc380600 in qemu_find_net_clients_except net/net.c:824 #2 0x5566dc39bac7 in net_vhost_user_event net/vhost-user.c:193 #3 0x5566dbee862a in qemu_chr_be_event /home/elmarco/src/qemu/qemu-char.c:201 #4 0x5566dbef2890 in tcp_chr_disconnect /home/elmarco/src/qemu/qemu-char.c:2790 #5 0x5566dbef2d0b in tcp_chr_sync_read /home/elmarco/src/qemu/qemu-char.c:2835 #6 0x5566dbee8a99 in qemu_chr_fe_read_all /home/elmarco/src/qemu/qemu-char.c:295 #7 0x5566dc39b964 in net_vhost_user_watch net/vhost-user.c:180 #8 0x5566dc5a06c7 in qio_channel_fd_source_dispatch io/channel-watch.c:70 #9 0x7f6b1aa2ab87 in g_main_dispatch /home/elmarco/src/gnome/glib/glib/gmain.c:3154 #10 0x7f6b1aa2b9cb in g_main_context_dispatch /home/elmarco/src/gnome/glib/glib/gmain.c:3769 #11 0x5566dc475ed4 in glib_pollfds_poll /home/elmarco/src/qemu/main-loop.c:212 #12 0x5566dc476029 in os_host_main_loop_wait /home/elmarco/src/qemu/main-loop.c:257 #13 0x5566dc476165 in main_loop_wait /home/elmarco/src/qemu/main-loop.c:505 #14 0x5566dbf08d31 in main_loop /home/elmarco/src/qemu/vl.c:1932 #15 0x5566dbf16783 in main /home/elmarco/src/qemu/vl.c:4646 #16 0x7f6b180bb57f in __libc_start_main (/lib64/libc.so.6+0x2057f) #17 0x5566dbbf5348 in _start (/home/elmarco/src/qemu/x86_64-softmmu/qemu-system-x86_64+0x3f9348) 0x602000000990 is located 0 bytes inside of 5-byte region [0x602000000990,0x602000000995) freed by thread T0 here: #0 0x7f6b2521666a in __interceptor_free (/lib64/libasan.so.2+0x9866a) #1 0x7f6b1aa332a4 in g_free /home/elmarco/src/gnome/glib/glib/gmem.c:189 #2 0x5566dc5f416f in qapi_dealloc_type_str qapi/qapi-dealloc-visitor.c:134 #3 0x5566dc5f3268 in visit_type_str qapi/qapi-visit-core.c:196 #4 0x5566dc5ced58 in visit_type_Netdev_fields /home/elmarco/src/qemu/qapi-visit.c:5936 #5 0x5566dc5cef71 in visit_type_Netdev /home/elmarco/src/qemu/qapi-visit.c:5960 #6 0x5566dc381a8d in net_visit net/net.c:1049 #7 0x5566dc381c37 in net_client_init net/net.c:1076 #8 0x5566dc3839e2 in net_init_netdev net/net.c:1473 #9 0x5566dc63cc0a in qemu_opts_foreach util/qemu-option.c:1112 #10 0x5566dc383b36 in net_init_clients net/net.c:1499 #11 0x5566dbf15d86 in main /home/elmarco/src/qemu/vl.c:4397 #12 0x7f6b180bb57f in __libc_start_main (/lib64/libc.so.6+0x2057f) Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
seanbruno pushed a commit that referenced this pull request on Aug 8, 2016
When VNC is started with '-vnc none' there will be no listener socket present. When we try to populate the VncServerInfo we'll crash accessing a NULL 'lsock' field.

#0 qio_channel_socket_get_local_address (ioc=0x0, errp=errp@entry=0x7ffd5b8aa0f0) at io/channel-socket.c:33
#1 0x00007f4b9a297d6f in vnc_init_basic_info_from_server_addr (errp=0x7ffd5b8aa0f0, info=0x7f4b9d425460, ioc=<optimized out>) at ui/vnc.c:146
#2 vnc_server_info_get (vd=0x7f4b9e858000) at ui/vnc.c:223
#3 0x00007f4b9a29d318 in vnc_qmp_event (vs=0x7f4b9ef82000, vs=0x7f4b9ef82000, event=QAPI_EVENT_VNC_CONNECTED) at ui/vnc.c:279
#4 vnc_connect (vd=vd@entry=0x7f4b9e858000, sioc=sioc@entry=0x7f4b9e8b3a20, skipauth=skipauth@entry=true, websocket=websocket@entry=false) at ui/vnc.c:2994
#5 0x00007f4b9a29e8c8 in vnc_display_add_client (id=<optimized out>, csock=<optimized out>, skipauth=<optimized out>) at ui/vnc.c:3825
#6 0x00007f4b9a18d8a1 in qmp_marshal_add_client (args=<optimized out>, ret=<optimized out>, errp=0x7ffd5b8aa230) at qmp-marshal.c:123
#7 0x00007f4b9a0b53f5 in handle_qmp_command (parser=<optimized out>, tokens=<optimized out>) at /usr/src/debug/qemu-2.6.0/monitor.c:3922
#8 0x00007f4b9a348580 in json_message_process_token (lexer=0x7f4b9c78dfe8, input=0x7f4b9c7350e0, type=JSON_RCURLY, x=111, y=59) at qobject/json-streamer.c:94
#9 0x00007f4b9a35cfeb in json_lexer_feed_char (lexer=lexer@entry=0x7f4b9c78dfe8, ch=125 '}', flush=flush@entry=false) at qobject/json-lexer.c:310
#10 0x00007f4b9a35d0ae in json_lexer_feed (lexer=0x7f4b9c78dfe8, buffer=<optimized out>, size=<optimized out>) at qobject/json-lexer.c:360
#11 0x00007f4b9a348679 in json_message_parser_feed (parser=<optimized out>, buffer=<optimized out>, size=<optimized out>) at qobject/json-streamer.c:114
#12 0x00007f4b9a0b3a1b in monitor_qmp_read (opaque=<optimized out>, buf=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-2.6.0/monitor.c:3938
#13 0x00007f4b9a186751 in tcp_chr_read (chan=<optimized out>, cond=<optimized out>, opaque=0x7f4b9c7add40) at qemu-char.c:2895
#14 0x00007f4b92b5c79a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#15 0x00007f4b9a2bb0c0 in glib_pollfds_poll () at main-loop.c:213
#16 os_host_main_loop_wait (timeout=<optimized out>) at main-loop.c:258
#17 main_loop_wait (nonblocking=<optimized out>) at main-loop.c:506
#18 0x00007f4b9a0835cf in main_loop () at vl.c:1934
#19 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4667

Do an upfront check for a NULL lsock and report an error to the caller, which matches the behaviour from before

  commit 04d2529
  Author: Daniel P. Berrange <berrange@redhat.com>
  Date:   Fri Feb 27 16:20:57 2015 +0000

    ui: convert VNC server to use QIOChannelSocket

where getsockname() would be given an FD value of -1 and would thus report an error to the caller.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1470134726-15697-2-git-send-email-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
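A minimal sketch of that upfront check, using the function shape seen in frame #1 above (the body is illustrative, not the exact patch):

    static int vnc_init_basic_info_from_server_addr(QIOChannelSocket *ioc,
                                                    VncBasicInfo *info,
                                                    Error **errp)
    {
        /* Guard first: with '-vnc none' there is no listener socket,
         * so fail cleanly instead of dereferencing a NULL ioc. */
        if (!ioc) {
            error_setg(errp, "No listener socket available");
            return -1;
        }
        /* ... populate info from
         * qio_channel_socket_get_local_address(ioc, errp) ... */
        return 0;
    }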
seanbruno pushed a commit that referenced this pull request on Aug 8, 2016
The vnc_server_info_get will allocate the VncServerInfo struct and then call vnc_init_basic_info_from_server_addr to populate the basic fields. If this returns an error though, the qapi_free_VncServerInfo call will then crash because the VncServerInfo struct instance was not properly NULL-initialized and thus contains random stack garbage.

#0 0x00007f1987c8e6f5 in raise () at /lib64/libc.so.6
#1 0x00007f1987c902fa in abort () at /lib64/libc.so.6
#2 0x00007f1987ccf600 in __libc_message () at /lib64/libc.so.6
#3 0x00007f1987cd7d4a in _int_free () at /lib64/libc.so.6
#4 0x00007f1987cdb2ac in free () at /lib64/libc.so.6
#5 0x00007f198b654f6e in g_free () at /lib64/libglib-2.0.so.0
#6 0x0000559193cdcf54 in visit_type_str (v=v@entry=0x5591972f14b0, name=name@entry=0x559193de1e29 "host", obj=obj@entry=0x5591961dbfa0, errp=errp@entry=0x7fffd7899d80) at qapi/qapi-visit-core.c:255
#7 0x0000559193cca8f3 in visit_type_VncBasicInfo_members (v=v@entry=0x5591972f14b0, obj=obj@entry=0x5591961dbfa0, errp=errp@entry=0x7fffd7899dc0) at qapi-visit.c:12307
#8 0x0000559193ccb523 in visit_type_VncServerInfo_members (v=v@entry=0x5591972f14b0, obj=0x5591961dbfa0, errp=errp@entry=0x7fffd7899e00) at qapi-visit.c:12632
#9 0x0000559193ccb60b in visit_type_VncServerInfo (v=v@entry=0x5591972f14b0, name=name@entry=0x0, obj=obj@entry=0x7fffd7899e48, errp=errp@entry=0x0) at qapi-visit.c:12658
#10 0x0000559193cb53d8 in qapi_free_VncServerInfo (obj=<optimized out>) at qapi-types.c:3970
#11 0x0000559193c1e6ba in vnc_server_info_get (vd=0x7f1951498010) at ui/vnc.c:233
#12 0x0000559193c24275 in vnc_connect (vs=0x559197b2f200, vs=0x559197b2f200, event=QAPI_EVENT_VNC_CONNECTED) at ui/vnc.c:284
#13 0x0000559193c24275 in vnc_connect (vd=vd@entry=0x7f1951498010, sioc=sioc@entry=0x559196bf9c00, skipauth=skipauth@entry=true, websocket=websocket@entry=false) at ui/vnc.c:3039
#14 0x0000559193c25806 in vnc_display_add_client (id=<optimized out>, csock=<optimized out>, skipauth=<optimized out>) at ui/vnc.c:3877
#15 0x0000559193a90c28 in qmp_marshal_add_client (args=<optimized out>, ret=<optimized out>, errp=0x7fffd7899f90) at qmp-marshal.c:105
#16 0x000055919399c2b7 in handle_qmp_command (parser=<optimized out>, tokens=<optimized out>) at /home/berrange/src/virt/qemu/monitor.c:3971
#17 0x0000559193ce3307 in json_message_process_token (lexer=0x559194ab0838, input=0x559194a6d940, type=JSON_RCURLY, x=111, y=12) at qobject/json-streamer.c:105
#18 0x0000559193cfa90d in json_lexer_feed_char (lexer=lexer@entry=0x559194ab0838, ch=125 '}', flush=flush@entry=false) at qobject/json-lexer.c:319
#19 0x0000559193cfaa1e in json_lexer_feed (lexer=0x559194ab0838, buffer=<optimized out>, size=<optimized out>) at qobject/json-lexer.c:369
#20 0x0000559193ce33c9 in json_message_parser_feed (parser=<optimized out>, buffer=<optimized out>, size=<optimized out>) at qobject/json-streamer.c:124
#21 0x000055919399a85b in monitor_qmp_read (opaque=<optimized out>, buf=<optimized out>, size=<optimized out>) at /home/berrange/src/virt/qemu/monitor.c:3987
#22 0x0000559193a87d00 in tcp_chr_read (chan=<optimized out>, cond=<optimized out>, opaque=0x559194a7d900) at qemu-char.c:2895
#23 0x00007f198b64f703 in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#24 0x0000559193c484b3 in main_loop_wait () at main-loop.c:213
#25 0x0000559193c484b3 in main_loop_wait (timeout=<optimized out>) at main-loop.c:258
#26 0x0000559193c484b3 in main_loop_wait (nonblocking=<optimized out>) at main-loop.c:506
#27 0x0000559193964c55 in main () at vl.c:1908
#28 0x0000559193964c55 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4603

This was introduced in

  commit 98481bf
  Author: Eric Blake <eblake@redhat.com>
  Date:   Mon Oct 26 16:34:45 2015 -0600

    vnc: Hoist allocation of VncBasicInfo to callers

which added error reporting for vnc_init_basic_info_from_server_addr but didn't change the g_malloc calls to g_malloc0.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1470134726-15697-3-git-send-email-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
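A sketch of the fix's effect: zero-fill the allocation so that a partially populated instance can be freed safely on the error path (the helper names follow the backtrace above; treat the exact calls as illustrative):

    /* Allocate zero-filled so every pointer field starts out NULL;
     * qapi_free_VncServerInfo() can then cope with a half-populated
     * struct instead of freeing stack garbage. */
    VncServerInfo *info = g_malloc0(sizeof(*info));

    if (vnc_init_basic_info_from_server_addr(vd->lsock,
                                             qapi_VncServerInfo_base(info),
                                             &err) < 0) {
        qapi_free_VncServerInfo(info);  /* safe: unset fields are NULL */
        info = NULL;
    }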
seanbruno pushed a commit that referenced this pull request on Aug 8, 2016
ahci-test /x86_64/ahci/io/dma/lba28/retry triggers the following leak:

Direct leak of 16 byte(s) in 1 object(s) allocated from:
#0 0x7fc4b2a25e20 in malloc (/lib64/libasan.so.3+0xc6e20)
#1 0x7fc4993bce58 in g_malloc (/lib64/libglib-2.0.so.0+0x4ee58)
#2 0x556a187d4b34 in ahci_populate_sglist hw/ide/ahci.c:896
#3 0x556a187d8237 in ahci_dma_prepare_buf hw/ide/ahci.c:1367
#4 0x556a187b5a1a in ide_dma_cb hw/ide/core.c:844
#5 0x556a187d7eec in ahci_start_dma hw/ide/ahci.c:1333
#6 0x556a187b650b in ide_start_dma hw/ide/core.c:921
#7 0x556a187b61e6 in ide_sector_start_dma hw/ide/core.c:911
#8 0x556a187b9e26 in cmd_write_dma hw/ide/core.c:1486
#9 0x556a187bd519 in ide_exec_cmd hw/ide/core.c:2027
#10 0x556a187d71c5 in handle_reg_h2d_fis hw/ide/ahci.c:1204
#11 0x556a187d7681 in handle_cmd hw/ide/ahci.c:1254
#12 0x556a187d168a in check_cmd hw/ide/ahci.c:510
#13 0x556a187d0afc in ahci_port_write hw/ide/ahci.c:314
#14 0x556a187d105d in ahci_mem_write hw/ide/ahci.c:435
#15 0x556a1831d959 in memory_region_write_accessor /home/elmarco/src/qemu/memory.c:525
#16 0x556a1831dc35 in access_with_adjusted_size /home/elmarco/src/qemu/memory.c:591
#17 0x556a18323ce3 in memory_region_dispatch_write /home/elmarco/src/qemu/memory.c:1262
#18 0x556a1828cf67 in address_space_write_continue /home/elmarco/src/qemu/exec.c:2578
#19 0x556a1828d20b in address_space_write /home/elmarco/src/qemu/exec.c:2635
#20 0x556a1828d92b in address_space_rw /home/elmarco/src/qemu/exec.c:2737
#21 0x556a1828daf7 in cpu_physical_memory_rw /home/elmarco/src/qemu/exec.c:2746
#22 0x556a183068d3 in cpu_physical_memory_write /home/elmarco/src/qemu/include/exec/cpu-common.h:72
#23 0x556a18308194 in qtest_process_command /home/elmarco/src/qemu/qtest.c:382
#24 0x556a18309999 in qtest_process_inbuf /home/elmarco/src/qemu/qtest.c:573
#25 0x556a18309a4a in qtest_read /home/elmarco/src/qemu/qtest.c:585
#26 0x556a18598b85 in qemu_chr_be_write_impl /home/elmarco/src/qemu/qemu-char.c:387
#27 0x556a18598c52 in qemu_chr_be_write /home/elmarco/src/qemu/qemu-char.c:399
#28 0x556a185a2afa in tcp_chr_read /home/elmarco/src/qemu/qemu-char.c:2902
#29 0x556a18cbaf52 in qio_channel_fd_source_dispatch io/channel-watch.c:84

Follow John Snow's recommendation:

  Everywhere else ncq_err is used, it is accompanied by a list cleanup except for ncq_cb, which is the case you are fixing here.

  Move the sglist destruction inside of ncq_err and then delete it from the other two locations to keep it tidy.

  Call dma_buf_commit in ide_dma_cb after the early return. Though, this is also a little wonky because this routine does more than clear the list, but it is at the moment the centralized "we're done with the sglist" function and none of the other side effects that occur in dma_buf_commit will interfere with the reset that occurs from ide_restart_bh, I think.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
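A sketch of ncq_err() as the single cleanup site, following the recommendation above (field names follow hw/ide/ahci.c of that era, but treat the body as illustrative):

    static void ncq_err(NCQTransferState *ncq_tfs)
    {
        IDEState *ide_state = &ncq_tfs->drive->port.ifs[0];

        ide_state->error = ABRT_ERR;
        ide_state->status = READY_STAT | ERR_STAT;
        ncq_tfs->drive->port_regs.scr_err |= (1 << ncq_tfs->tag);
        /* Centralised teardown: destroy the sglist here so no caller
         * can forget it (the leak fixed above). */
        qemu_sglist_destroy(&ncq_tfs->sglist);
        ncq_tfs->used = 0;
    }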
seanbruno pushed a commit that referenced this pull request on Aug 8, 2016
Since aa5cb7f, the chardevs are being cleaned up when leaving qemu. However, the monitor still has references to them, which may lead to crashes when running atexit() and trying to send monitor events:

#0 0x00007fffdb18f6f5 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007fffdb1912fa in __GI_abort () at abort.c:89
#2 0x0000555555c263e7 in error_exit (err=22, msg=0x555555d47980 <__func__.13537> "qemu_mutex_lock") at util/qemu-thread-posix.c:39
#3 0x0000555555c26488 in qemu_mutex_lock (mutex=0x5555567a2420) at util/qemu-thread-posix.c:66
#4 0x00005555558c52db in qemu_chr_fe_write (s=0x5555567a2420, buf=0x55555740dc40 "{\"timestamp\": {\"seconds\": 1470041716, \"microseconds\": 989699}, \"event\": \"SPICE_DISCONNECTED\", \"data\": {\"server\": {\"port\": \"5900\", \"family\": \"ipv4\", \"host\": \"127.0.0.1\"}, \"client\": {\"port\": \"40272\", \"f"..., len=240) at qemu-char.c:280
#5 0x0000555555787cad in monitor_flush_locked (mon=0x5555567bd9e0) at /home/elmarco/src/qemu/monitor.c:311
#6 0x0000555555787e46 in monitor_puts (mon=0x5555567bd9e0, str=0x5555567a44ef "") at /home/elmarco/src/qemu/monitor.c:353
#7 0x00005555557880fe in monitor_json_emitter (mon=0x5555567bd9e0, data=0x5555567c73a0) at /home/elmarco/src/qemu/monitor.c:401
#8 0x00005555557882d2 in monitor_qapi_event_emit (event=QAPI_EVENT_SPICE_DISCONNECTED, qdict=0x5555567c73a0) at /home/elmarco/src/qemu/monitor.c:472
#9 0x000055555578838f in monitor_qapi_event_queue (event=QAPI_EVENT_SPICE_DISCONNECTED, qdict=0x5555567c73a0, errp=0x7fffffffca88) at /home/elmarco/src/qemu/monitor.c:497
#10 0x0000555555c15541 in qapi_event_send_spice_disconnected (server=0x5555571139d0, client=0x5555570d0db0, errp=0x5555566c0428 <error_abort>) at qapi-event.c:1038
#11 0x0000555555b11bc6 in channel_event (event=3, info=0x5555570d6c00) at ui/spice-core.c:248
#12 0x00007fffdcc9983a in adapter_channel_event (event=3, info=0x5555570d6c00) at reds.c:120
#13 0x00007fffdcc99a25 in reds_handle_channel_event (reds=0x5555567a9d60, event=3, info=0x5555570d6c00) at reds.c:324
#14 0x00007fffdcc7d4c4 in main_dispatcher_self_handle_channel_event (self=0x5555567b28b0, event=3, info=0x5555570d6c00) at main-dispatcher.c:175
#15 0x00007fffdcc7d5b1 in main_dispatcher_channel_event (self=0x5555567b28b0, event=3, info=0x5555570d6c00) at main-dispatcher.c:194
#16 0x00007fffdcca7674 in reds_stream_push_channel_event (s=0x5555570d9910, event=3) at reds-stream.c:354
#17 0x00007fffdcca749b in reds_stream_free (s=0x5555570d9910) at reds-stream.c:323
#18 0x00007fffdccb5dad in snd_disconnect_channel (channel=0x5555576a89a0) at sound.c:229
#19 0x00007fffdccb9e57 in snd_detach_common (worker=0x555557739720) at sound.c:1589
#20 0x00007fffdccb9f0e in snd_detach_playback (sin=0x5555569fe3f8) at sound.c:1602
#21 0x00007fffdcca3373 in spice_server_remove_interface (sin=0x5555569fe3f8) at reds.c:3387
#22 0x00005555558ff6e2 in line_out_fini (hw=0x5555569fe370) at audio/spiceaudio.c:152
#23 0x00005555558f909e in audio_atexit () at audio/audio.c:1754
#24 0x00007fffdb1941e8 in __run_exit_handlers (status=0, listp=0x7fffdb5175d8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:82
#25 0x00007fffdb194235 in __GI_exit (status=<optimized out>) at exit.c:104
#26 0x00007fffdb17b738 in __libc_start_main (main=0x5555558d7874 <main>, argc=67, argv=0x7fffffffcf48, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffcf38) at ../csu/libc-start.c:323

Add a monitor_cleanup() function to remove all the monitors before cleaning up the chardevs. Note that we are "losing" some events that used to be sent during atexit().

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20160801112343.29082-2-marcandre.lureau@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
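A minimal sketch of such a monitor_cleanup(), assuming a global mon_list and a per-monitor destructor as in monitor.c (illustrative, not the exact patch):

    void monitor_cleanup(void)
    {
        Monitor *mon, *next;

        /* Drop every monitor (and its chardev reference) before the
         * chardevs themselves are destroyed at exit. */
        QLIST_FOREACH_SAFE(mon, &mon_list, entry, next) {
            QLIST_REMOVE(mon, entry);
            monitor_data_destroy(mon);
            g_free(mon);
        }
    }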
seanbruno pushed a commit that referenced this pull request on Oct 24, 2016
Running cpuid instructions with a simple run like

  i386-linux-user/qemu-i386 tests/tcg/sha1-i386

results in the following assert:

#0 0x00007ffff64246f5 in raise () from /lib64/libc.so.6
#1 0x00007ffff64262fa in abort () from /lib64/libc.so.6
#2 0x00007ffff7937ec5 in g_assertion_message () from /lib64/libglib-2.0.so.0
#3 0x00007ffff7937f5a in g_assertion_message_expr () from /lib64/libglib-2.0.so.0
#4 0x000055555561b54c in apicid_bitwidth_for_count (count=0) at /home/elmarco/src/qemu/include/hw/i386/topology.h:58
#5 0x000055555561b58a in apicid_smt_width (nr_cores=0, nr_threads=0) at /home/elmarco/src/qemu/include/hw/i386/topology.h:67
#6 0x000055555561b5c3 in apicid_core_offset (nr_cores=0, nr_threads=0) at /home/elmarco/src/qemu/include/hw/i386/topology.h:82
#7 0x000055555561b5e3 in apicid_pkg_offset (nr_cores=0, nr_threads=0) at /home/elmarco/src/qemu/include/hw/i386/topology.h:89
#8 0x000055555561dd86 in cpu_x86_cpuid (env=0x555557999550, index=4, count=3, eax=0x7fffffffcae8, ebx=0x7fffffffcaec, ecx=0x7fffffffcaf0, edx=0x7fffffffcaf4) at /home/elmarco/src/qemu/target-i386/cpu.c:2405
#9 0x0000555555638e8e in helper_cpuid (env=0x555557999550) at /home/elmarco/src/qemu/target-i386/misc_helper.c:106
#10 0x000055555599dc5e in static_code_gen_buffer ()
#11 0x00005555555952f8 in cpu_tb_exec (cpu=0x5555579912d0, itb=0x7ffff4371ab0) at /home/elmarco/src/qemu/cpu-exec.c:166
#12 0x0000555555595c8e in cpu_loop_exec_tb (cpu=0x5555579912d0, tb=0x7ffff4371ab0, last_tb=0x7fffffffd088, tb_exit=0x7fffffffd084, sc=0x7fffffffd0a0) at /home/elmarco/src/qemu/cpu-exec.c:517
#13 0x0000555555595e50 in cpu_exec (cpu=0x5555579912d0) at /home/elmarco/src/qemu/cpu-exec.c:612
#14 0x00005555555c065b in cpu_loop (env=0x555557999550) at /home/elmarco/src/qemu/linux-user/main.c:297
#15 0x00005555555c25b2 in main (argc=2, argv=0x7fffffffd848, envp=0x7fffffffd860) at /home/elmarco/src/qemu/linux-user/main.c:4803

The fields are set in qemu_init_vcpu() with softmmu, but it's a stub with linux-user.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
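A hypothetical sketch of the kind of fix this implies: give the topology fields single-CPU defaults on the linux-user path, where qemu_init_vcpu() is a stub (the field names follow CPUState; the exact values and placement are an assumption, not the actual patch):

    /* linux-user: qemu_init_vcpu() is a stub, so nr_cores/nr_threads
     * stay 0 and apicid_bitwidth_for_count(0) asserts.  Default them
     * to a single-core, single-thread topology. */
    cpu->nr_cores = 1;
    cpu->nr_threads = 1;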
seanbruno pushed a commit that referenced this pull request on Oct 24, 2016
Commit 0ed93d8 ("linux-aio: process completions from ioq_submit()") added an optimization that processes completions each time ioq_submit() returns with requests in flight. This commit introduces a "Co-routine re-entered recursively" error which can be triggered with -drive format=qcow2,aio=native.

Fam Zheng <famz@redhat.com>, Kevin Wolf <kwolf@redhat.com>, and I debugged the following backtrace:

(gdb) bt
#0 0x00007ffff0a046f5 in raise () at /lib64/libc.so.6
#1 0x00007ffff0a062fa in abort () at /lib64/libc.so.6
#2 0x0000555555ac0013 in qemu_coroutine_enter (co=0x5555583464d0) at util/qemu-coroutine.c:113
#3 0x0000555555a4b663 in qemu_laio_process_completions (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:218
#4 0x0000555555a4b874 in ioq_submit (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:331
#5 0x0000555555a4ba12 in laio_do_submit (fd=fd@entry=13, laiocb=laiocb@entry=0x555559d38ae0, offset=offset@entry=2932727808, type=type@entry=1) at block/linux-aio.c:383
#6 0x0000555555a4bbd3 in laio_co_submit (bs=<optimized out>, s=0x555557e2f7f0, fd=13, offset=2932727808, qiov=0x555559d38e20, type=1) at block/linux-aio.c:402
#7 0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x55555663bcb0, offset=offset@entry=2932727808, bytes=bytes@entry=8192, qiov=qiov@entry=0x555559d38e20, flags=0) at block/io.c:804
#8 0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x55555663bcb0, req=req@entry=0x555559d38d20, offset=offset@entry=2932727808, bytes=bytes@entry=8192, align=align@entry=512, qiov=qiov@entry=0x555559d38e20, flags=0) at block/io.c:1041
#9 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=2932727808, bytes=8192, qiov=qiov@entry=0x555559d38e20, flags=flags@entry=0) at block/io.c:1133
#10 0x0000555555a29629 in qcow2_co_preadv (bs=0x555556635890, offset=6178725888, bytes=8192, qiov=0x555557527840, flags=<optimized out>) at block/qcow2.c:1509
#11 0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x555556635890, offset=offset@entry=6178725888, bytes=bytes@entry=8192, qiov=qiov@entry=0x555557527840, flags=0) at block/io.c:804
#12 0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x555556635890, req=req@entry=0x555559d39000, offset=offset@entry=6178725888, bytes=bytes@entry=8192, align=align@entry=1, qiov=qiov@entry=0x555557527840, flags=0) at block/io.c:1041
#13 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=offset@entry=6178725888, bytes=bytes@entry=8192, qiov=qiov@entry=0x555557527840, flags=flags@entry=0) at block/io.c:1133
#14 0x0000555555a4515a in blk_co_preadv (blk=0x5555566356d0, offset=6178725888, bytes=8192, qiov=0x555557527840, flags=0) at block/block-backend.c:783
#15 0x0000555555a45266 in blk_aio_read_entry (opaque=0x5555577025e0) at block/block-backend.c:991
#16 0x0000555555ac0cfa in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:78

It turned out that re-entrant ioq_submit() and completion processing between three requests caused this error. The following check is not sufficient to prevent recursively entering coroutines:

    if (laiocb->co != qemu_coroutine_self()) {
        qemu_coroutine_enter(laiocb->co);
    }

As the following coroutine backtrace shows, not just the current coroutine (self) can be entered. There might also be other coroutines that are currently entered and transferred control due to the qcow2 lock (CoMutex):

(gdb) qemu coroutine 0x5555583464d0
#0 0x0000555555ac0c90 in qemu_coroutine_switch (from_=from_@entry=0x5555583464d0, to_=to_@entry=0x5555572f9890, action=action@entry=COROUTINE_ENTER) at util/coroutine-ucontext.c:175
#1 0x0000555555abfe54 in qemu_coroutine_enter (co=0x5555572f9890) at util/qemu-coroutine.c:117
#2 0x0000555555ac031c in qemu_co_queue_run_restart (co=co@entry=0x5555583462c0) at util/qemu-coroutine-lock.c:60
#3 0x0000555555abfe5e in qemu_coroutine_enter (co=0x5555583462c0) at util/qemu-coroutine.c:119
#4 0x0000555555a4b663 in qemu_laio_process_completions (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:218
#5 0x0000555555a4b874 in ioq_submit (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:331
#6 0x0000555555a4ba12 in laio_do_submit (fd=fd@entry=13, laiocb=laiocb@entry=0x55555a338b40, offset=offset@entry=2911477760, type=type@entry=1) at block/linux-aio.c:383
#7 0x0000555555a4bbd3 in laio_co_submit (bs=<optimized out>, s=0x555557e2f7f0, fd=13, offset=2911477760, qiov=0x55555a338e80, type=1) at block/linux-aio.c:402
#8 0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x55555663bcb0, offset=offset@entry=2911477760, bytes=bytes@entry=8192, qiov=qiov@entry=0x55555a338e80, flags=0) at block/io.c:804
#9 0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x55555663bcb0, req=req@entry=0x55555a338d80, offset=offset@entry=2911477760, bytes=bytes@entry=8192, align=align@entry=512, qiov=qiov@entry=0x55555a338e80, flags=0) at block/io.c:1041
#10 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=2911477760, bytes=8192, qiov=qiov@entry=0x55555a338e80, flags=flags@entry=0) at block/io.c:1133
#11 0x0000555555a29629 in qcow2_co_preadv (bs=0x555556635890, offset=6157475840, bytes=8192, qiov=0x5555575df720, flags=<optimized out>) at block/qcow2.c:1509
#12 0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x555556635890, offset=offset@entry=6157475840, bytes=bytes@entry=8192, qiov=qiov@entry=0x5555575df720, flags=0) at block/io.c:804
#13 0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x555556635890, req=req@entry=0x55555a339060, offset=offset@entry=6157475840, bytes=bytes@entry=8192, align=align@entry=1, qiov=qiov@entry=0x5555575df720, flags=0) at block/io.c:1041
#14 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=offset@entry=6157475840, bytes=bytes@entry=8192, qiov=qiov@entry=0x5555575df720, flags=flags@entry=0) at block/io.c:1133
#15 0x0000555555a4515a in blk_co_preadv (blk=0x5555566356d0, offset=6157475840, bytes=8192, qiov=0x5555575df720, flags=0) at block/block-backend.c:783
#16 0x0000555555a45266 in blk_aio_read_entry (opaque=0x555557231aa0) at block/block-backend.c:991
#17 0x0000555555ac0cfa in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:78

Use the new qemu_coroutine_entered() function instead of comparing against qemu_coroutine_self(). This is correct because:

1. If a coroutine is not entered then it must have yielded to wait for I/O completion. It is therefore safe to enter.

2. If a coroutine is entered then it must be in ioq_submit()/qemu_laio_process_completions() because otherwise it would be yielded while waiting for I/O completion. Therefore it will check laio->ret and return from ioq_submit() instead of yielding, i.e. it's guaranteed not to hang.

Reported-by: Fam Zheng <famz@redhat.com>
Tested-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1474989516-18255-4-git-send-email-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
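The resulting check in the completion path then looks like this (a sketch using the names from the snippet above):

    /* Only enter the coroutine if nothing has it entered right now; an
     * entered coroutine will notice laiocb->ret on its own and return
     * from ioq_submit() instead of yielding. */
    if (!qemu_coroutine_entered(laiocb->co)) {
        qemu_coroutine_enter(laiocb->co);
    }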
seanbruno pushed a commit that referenced this pull request on Oct 24, 2016
The vhost-user & colo code is poking at the QemuOpts instance in the CharDriverState struct, not realizing that it is valid for this to be NULL. e.g. the following crash shows a codepath where it will be NULL:

Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000055baf6ab4adc in qemu_opt_foreach (opts=0x0, func=0x55baf696b650 <net_vhost_chardev_opts>, opaque=0x7ffc51368c00, errp=0x7ffc51368e48) at util/qemu-option.c:617
617         QTAILQ_FOREACH(opt, &opts->head, next) {
[Current thread is 1 (Thread 0x7f1d4970bb40 (LWP 6603))]
(gdb) bt
#0 0x000055baf6ab4adc in qemu_opt_foreach (opts=0x0, func=0x55baf696b650 <net_vhost_chardev_opts>, opaque=0x7ffc51368c00, errp=0x7ffc51368e48) at util/qemu-option.c:617
#1 0x000055baf696b7da in net_vhost_parse_chardev (opts=0x55baf8ff9260, errp=0x7ffc51368e48) at net/vhost-user.c:314
#2 0x000055baf696b985 in net_init_vhost_user (netdev=0x55baf8ff9250, name=0x55baf879d270 "hostnet2", peer=0x0, errp=0x7ffc51368e48) at net/vhost-user.c:360
#3 0x000055baf6960216 in net_client_init1 (object=0x55baf8ff9250, is_netdev=true, errp=0x7ffc51368e48) at net/net.c:1051
#4 0x000055baf6960518 in net_client_init (opts=0x55baf776e7e0, is_netdev=true, errp=0x7ffc51368f00) at net/net.c:1108
#5 0x000055baf696083f in netdev_add (opts=0x55baf776e7e0, errp=0x7ffc51368f00) at net/net.c:1186
#6 0x000055baf69608c7 in qmp_netdev_add (qdict=0x55baf7afaf60, ret=0x7ffc51368f50, errp=0x7ffc51368f48) at net/net.c:1205
#7 0x000055baf6622135 in handle_qmp_command (parser=0x55baf77fb590, tokens=0x7f1d24011960) at /path/to/qemu.git/monitor.c:3978
#8 0x000055baf6a9d099 in json_message_process_token (lexer=0x55baf77fb598, input=0x55baf75acd20, type=JSON_RCURLY, x=113, y=19) at qobject/json-streamer.c:105
#9 0x000055baf6abf7aa in json_lexer_feed_char (lexer=0x55baf77fb598, ch=125 '}', flush=false) at qobject/json-lexer.c:319
#10 0x000055baf6abf8f2 in json_lexer_feed (lexer=0x55baf77fb598, buffer=0x7ffc51369170 "}R\204\367\272U", size=1) at qobject/json-lexer.c:369
#11 0x000055baf6a9d13c in json_message_parser_feed (parser=0x55baf77fb590, buffer=0x7ffc51369170 "}R\204\367\272U", size=1) at qobject/json-streamer.c:124
#12 0x000055baf66221f7 in monitor_qmp_read (opaque=0x55baf77fb530, buf=0x7ffc51369170 "}R\204\367\272U", size=1) at /path/to/qemu.git/monitor.c:3994
#13 0x000055baf6757014 in qemu_chr_be_write_impl (s=0x55baf7610a40, buf=0x7ffc51369170 "}R\204\367\272U", len=1) at qemu-char.c:387
#14 0x000055baf6757076 in qemu_chr_be_write (s=0x55baf7610a40, buf=0x7ffc51369170 "}R\204\367\272U", len=1) at qemu-char.c:399
#15 0x000055baf675b3b0 in tcp_chr_read (chan=0x55baf90244b0, cond=G_IO_IN, opaque=0x55baf7610a40) at qemu-char.c:2927
#16 0x000055baf6a5d655 in qio_channel_fd_source_dispatch (source=0x55baf7610df0, callback=0x55baf675b25a <tcp_chr_read>, user_data=0x55baf7610a40) at io/channel-watch.c:84
#17 0x00007f1d3e80cbbd in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#18 0x000055baf69d3720 in glib_pollfds_poll () at main-loop.c:213
#19 0x000055baf69d37fd in os_host_main_loop_wait (timeout=126000000) at main-loop.c:258
#20 0x000055baf69d38ad in main_loop_wait (nonblocking=0) at main-loop.c:506
#21 0x000055baf676587b in main_loop () at vl.c:1908
#22 0x000055baf676d3bf in main (argc=101, argv=0x7ffc5136a6c8, envp=0x7ffc5136a9f8) at vl.c:4604
(gdb) p opts
$1 = (QemuOpts *) 0x0

The crash occurred when attaching vhost-user net via QMP:

{ "execute": "chardev-add", "arguments": { "id": "charnet2", "backend": { "type": "socket", "data": { "addr": { "type": "unix", "data": { "path": "/var/run/openvswitch/vhost-user1" } }, "wait": false, "server": false } } }, "id": "libvirt-19" }
{ "return": { }, "id": "libvirt-19" }
{ "execute": "netdev_add", "arguments": { "type": "vhost-user", "chardev": "charnet2", "id": "hostnet2" }, "id": "libvirt-20" }

Code using chardevs should not be poking at the internals of the CharDriverState struct. What vhost-user wants is a chardev that is operating as a reconnectable network service, along with the ability to do FD passing over the connection. The colo code simply wants a network service. Add a feature concept to the char drivers so that chardev users can query the actual features they wish to have supported. The QemuOpts member is removed to prevent future mistakes in this area.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
seanbruno pushed a commit that referenced this pull request on Apr 24, 2017
If, once the kernel has booted, we try to remove memory that was hotplugged while the kernel was not yet started, QEMU crashes on an assert:

qemu-system-ppc64: hw/virtio/vhost.c:651: vhost_commit: Assertion `r >= 0' failed.
...
#4 in vhost_commit
#5 in memory_region_transaction_commit
#6 in pc_dimm_memory_unplug
#7 in spapr_memory_unplug
#8 spapr_machine_device_unplug
#9 in hotplug_handler_unplug
#10 in spapr_lmb_release
#11 in detach
#12 in set_allocation_state
#13 in rtas_set_indicator
...

If we take a closer look at the guest kernel log, we can see, at the point we try to unplug the memory:

pseries-hotplug-mem: Attempting to hot-add 4 LMB(s)

What happens:

1. The kernel ignored the memory hotplug event because it was not started when the event was generated.
2. When we hot-unplug the memory, QEMU starts to remove the memory, generates a hot-unplug event, and signals the kernel of the incoming new event.
3. As the kernel is now started, on the QEMU signal it reads the event list, decodes the hotplug event and tries to finish the hotplugging.
4. QEMU receives the hotplug notification while it is trying to hot-unplug the memory. This moves the memory DRC to an invalid state.

This patch prevents this by not allowing the allocation state to be set to USABLE while the DRC is awaiting release.

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1432382

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
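A hypothetical sketch of that guard inside the allocation-state handling (the awaiting_release field and the RTAS return codes are assumptions for illustration, not the exact patch):

    static uint32_t drc_set_usable(sPAPRDRConnector *drc)
    {
        /* Reject a transition back to USABLE while an unplug is
         * pending, so the DRC cannot be driven into an invalid state. */
        if (drc->awaiting_release) {
            return RTAS_OUT_NO_SUCH_INDICATOR;
        }
        drc->allocation_state = SPAPR_DR_ALLOCATION_STATE_USABLE;
        return RTAS_OUT_SUCCESS;
    }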
seanbruno pushed a commit that referenced this pull request on Apr 24, 2017
During block job completion, nothing is preventing block_job_defer_to_main_loop_bh from being called in a nested aio_poll(), which is a trouble, such as in this code path:

    qmp_block_commit
      commit_active_start
        bdrv_reopen
          bdrv_reopen_multiple
            bdrv_reopen_prepare
              bdrv_flush
                aio_poll
                  aio_bh_poll
                    aio_bh_call
                      block_job_defer_to_main_loop_bh
                        stream_complete
                          bdrv_reopen

block_job_defer_to_main_loop_bh is the last step of the stream job, which should have been "paused" by the bdrv_drained_begin/end in bdrv_reopen_multiple, but it is not done because it's in the form of a main loop BH.

Similar to why block jobs should be paused between drained_begin and drained_end, BHs they schedule must be excluded as well. To achieve this, this patch forces draining the BH in BDRV_POLL_WHILE.

As a side effect this fixes a hang in block_job_detach_aio_context during system_reset when a block job is ready:

#0 0x0000555555aa79f3 in bdrv_drain_recurse
#1 0x0000555555aa825d in bdrv_drained_begin
#2 0x0000555555aa8449 in bdrv_drain
#3 0x0000555555a9c356 in blk_drain
#4 0x0000555555aa3cfd in mirror_drain
#5 0x0000555555a66e11 in block_job_detach_aio_context
#6 0x0000555555a62f4d in bdrv_detach_aio_context
#7 0x0000555555a63116 in bdrv_set_aio_context
#8 0x0000555555a9d326 in blk_set_aio_context
#9 0x00005555557e38da in virtio_blk_data_plane_stop
#10 0x00005555559f9d5f in virtio_bus_stop_ioeventfd
#11 0x00005555559fa49b in virtio_bus_stop_ioeventfd
#12 0x00005555559f6a18 in virtio_pci_stop_ioeventfd
#13 0x00005555559f6a18 in virtio_pci_reset
#14 0x00005555559139a9 in qdev_reset_one
#15 0x0000555555916738 in qbus_walk_children
#16 0x0000555555913318 in qdev_walk_children
#17 0x0000555555916738 in qbus_walk_children
#18 0x00005555559168ca in qemu_devices_reset
#19 0x000055555581fcbb in pc_machine_reset
#20 0x00005555558a4d96 in qemu_system_reset
#21 0x000055555577157a in main_loop_should_exit
#22 0x000055555577157a in main_loop
#23 0x000055555577157a in main

The rationale is that the loop in block_job_detach_aio_context cannot make any progress in pausing/completing the job, because bs->in_flight is 0, so bdrv_drain doesn't process the block_job_defer_to_main_loop BH. With this patch, it does.

Reported-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170418143044.12187-3-famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Tested-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
seanbruno pushed a commit that referenced this pull request on Jul 5, 2017
…ered

Submission of requests on linux aio is a bit tricky and can lead to request completions on the submission path:

44713c9 ("linux-aio: Handle io_submit() failure gracefully")
0ed93d8 ("linux-aio: process completions from ioq_submit()")

That means that any coroutine which has yielded in order to wait for completion can be resumed from the submission path and eventually be terminated (freed). The following use-after-free crash was observed when IO throttling was enabled:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f5813dff700 (LWP 56417)]
virtqueue_unmap_sg (elem=0x7f5804009a30, len=1, vq=<optimized out>) at virtio.c:252
(gdb) bt
#0 virtqueue_unmap_sg (elem=0x7f5804009a30, len=1, vq=<optimized out>) at virtio.c:252
   ^^^^^^^^^^^^^^ remember the address
#1 virtqueue_fill (vq=0x5598b20d21b0, elem=0x7f5804009a30, len=1, idx=0) at virtio.c:282
#2 virtqueue_push (vq=0x5598b20d21b0, elem=elem@entry=0x7f5804009a30, len=<optimized out>) at virtio.c:308
#3 virtio_blk_req_complete (req=req@entry=0x7f5804009a30, status=status@entry=0 '\000') at virtio-blk.c:61
#4 virtio_blk_rw_complete (opaque=<optimized out>, ret=0) at virtio-blk.c:126
#5 blk_aio_complete (acb=0x7f58040068d0) at block-backend.c:923
#6 coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at coroutine-ucontext.c:78

(gdb) p * elem
$8 = {index = 77, out_num = 2, in_num = 1, in_addr = 0x7f5804009ad8, out_addr = 0x7f5804009ae0, in_sg = 0x0, out_sg = 0x7f5804009a50}
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
'in_sg' and 'out_sg' are invalid, e.g. it is impossible that 'in_sg' is zero; instead its value must be equal to:

(gdb) p/x 0x7f5804009ad8 + sizeof(elem->in_addr[0]) + 2 * sizeof(elem->out_addr[0])
$26 = 0x7f5804009af0

Seems 'elem' was corrupted. Meanwhile another thread raised an abort:

Thread 12 (Thread 0x7f57f2ffd700 (LWP 56426)):
#0 raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 qemu_coroutine_enter (co=0x7f5804009af0) at qemu-coroutine.c:113
#3 qemu_co_queue_run_restart (co=0x7f5804009a30) at qemu-coroutine-lock.c:60
#4 qemu_coroutine_enter (co=0x7f5804009a30) at qemu-coroutine.c:119
   ^^^^^^^^^^^^^^^^^^ WTF?? this is equal to elem from crashed thread
#5 qemu_co_queue_run_restart (co=0x7f57e7f16ae0) at qemu-coroutine-lock.c:60
#6 qemu_coroutine_enter (co=0x7f57e7f16ae0) at qemu-coroutine.c:119
#7 qemu_co_queue_run_restart (co=0x7f5807e112a0) at qemu-coroutine-lock.c:60
#8 qemu_coroutine_enter (co=0x7f5807e112a0) at qemu-coroutine.c:119
#9 qemu_co_queue_run_restart (co=0x7f5807f17820) at qemu-coroutine-lock.c:60
#10 qemu_coroutine_enter (co=0x7f5807f17820) at qemu-coroutine.c:119
#11 qemu_co_queue_run_restart (co=0x7f57e7f18e10) at qemu-coroutine-lock.c:60
#12 qemu_coroutine_enter (co=0x7f57e7f18e10) at qemu-coroutine.c:119
#13 qemu_co_enter_next (queue=queue@entry=0x5598b1e742d0) at qemu-coroutine-lock.c:106
#14 timer_cb (blk=0x5598b1e74280, is_write=<optimized out>) at throttle-groups.c:419

The crash can be explained by access of the 'co' object from the loop inside qemu_co_queue_run_restart():

    while ((next = QSIMPLEQ_FIRST(&co->co_queue_wakeup))) {
        QSIMPLEQ_REMOVE_HEAD(&co->co_queue_wakeup, co_queue_next);
        ^^^^^^^^^^^^^^^^^^^^
        on each iteration 'co' is accessed, but 'co' can already have been freed
        qemu_coroutine_enter(next);
    }

When the 'next' coroutine is resumed (entered) it can in its turn resume 'co', and eventually free it. That's why we see that 'co' (which was freed) has the same address as 'elem' from the first backtrace.

The fix is obvious: use a temporary queue and do not touch the coroutine after the first qemu_coroutine_enter() is invoked. The issue is quite rare and happens every ~12 hours on very high IO and CPU load (building a linux kernel with -j512 inside the guest) when IO throttling is enabled. With the fix applied the guest has been running for ~35 hours and is still alive so far.

Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20170601160847.23720-1-roman.penyaev@profitbricks.com
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Fam Zheng <famz@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-devel@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
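A sketch of that fix in qemu_co_queue_run_restart(), using a stack-local queue (this mirrors the described approach; QSIMPLEQ_CONCAT is the list-splicing macro from qemu/queue.h):

    void qemu_co_queue_run_restart(Coroutine *co)
    {
        Coroutine *next;
        QSIMPLEQ_HEAD(, Coroutine) tmp_queue_wakeup =
            QSIMPLEQ_HEAD_INITIALIZER(tmp_queue_wakeup);

        /* Move the wakeup list aside first: after the first
         * qemu_coroutine_enter() below, 'co' may already be freed,
         * so it must never be dereferenced again. */
        QSIMPLEQ_CONCAT(&tmp_queue_wakeup, &co->co_queue_wakeup);

        while ((next = QSIMPLEQ_FIRST(&tmp_queue_wakeup))) {
            QSIMPLEQ_REMOVE_HEAD(&tmp_queue_wakeup, co_queue_next);
            qemu_coroutine_enter(next);
        }
    }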
seanbruno pushed a commit that referenced this pull request on Feb 4, 2018
If we create a thread with QEMU_THREAD_DETACHED mode, QEMU may get a segfault with low probability. The backtrace is:

#0 0x00007f46c60291d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f46c602a8c8 in __GI_abort () at abort.c:90
#2 0x00000000008543c9 in PAT_abort ()
#3 0x000000000085140d in patchIllInsHandler ()
#4 <signal handler called>
#5 pthread_detach (th=139933037614848) at pthread_detach.c:50
#6 0x0000000000829759 in qemu_thread_create (thread=thread@entry=0x7ffdaa8205e0, name=name@entry=0x94d94a "io-task-worker", start_routine=start_routine@entry=0x7eb9a0 <qio_task_thread_worker>, arg=arg@entry=0x3f5cf70, mode=mode@entry=1) at util/qemu_thread_posix.c:512
#7 0x00000000007ebc96 in qio_task_run_in_thread (task=0x31db2c0, worker=worker@entry=0x7e7e40 <qio_channel_socket_connect_worker>, opaque=0xcd23380, destroy=0x7f1180 <qapi_free_SocketAddress>) at io/task.c:141
#8 0x00000000007e7f33 in qio_channel_socket_connect_async (ioc=ioc@entry=0x626c0b0, addr=<optimized out>, callback=callback@entry=0x55e080 <qemu_chr_socket_connected>, opaque=opaque@entry=0x42862c0, destroy=destroy@entry=0x0) at io/channel_socket.c:194
#9 0x000000000055bdd1 in socket_reconnect_timeout (opaque=0x42862c0) at qemu_char.c:4744
#10 0x00007f46c72483b3 in g_timeout_dispatch () from /usr/lib64/libglib-2.0.so.0
#11 0x00007f46c724799a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#12 0x000000000076c646 in glib_pollfds_poll () at main_loop.c:228
#13 0x000000000076c6eb in os_host_main_loop_wait (timeout=348000000) at main_loop.c:273
#14 0x000000000076c815 in main_loop_wait (nonblocking=nonblocking@entry=0) at main_loop.c:521
#15 0x000000000056a511 in main_loop () at vl.c:2076
#16 0x0000000000420705 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4940

The cause of this problem is a glibc bug; for more information, see https://sourceware.org/bugzilla/show_bug.cgi?id=19951. The solution for this bug is to use pthread_attr_setdetachstate. There is a similar issue with pthread_setname_np, which is moved from the creating thread to the created thread.

Signed-off-by: linzhecheng <linzhecheng@huawei.com>
Message-Id: <20171128044656.10592-1-linzhecheng@huawei.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
[Simplify the code by removing qemu_thread_set_name, and free the arguments before invoking the start routine. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
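A sketch of the fix inside qemu_thread_create(), requesting the detached state at creation time (variable names follow qemu-thread-posix.c; error handling elided):

    pthread_attr_t attr;

    pthread_attr_init(&attr);
    if (mode == QEMU_THREAD_DETACHED) {
        /* Create the thread already detached instead of calling
         * pthread_detach() afterwards, which races with thread exit
         * under the glibc bug referenced above. */
        pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    }
    err = pthread_create(&thread->thread, &attr, start_routine, arg);
    pthread_attr_destroy(&attr);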
seanbruno pushed a commit that referenced this pull request on Feb 4, 2018
Fixes leaks such as:

Direct leak of 2 byte(s) in 1 object(s) allocated from:
#0 0x7eff58beb850 in malloc (/lib64/libasan.so.4+0xde850)
#1 0x7eff57942f0c in g_malloc ../glib/gmem.c:94
#2 0x7eff579431cf in g_malloc_n ../glib/gmem.c:331
#3 0x7eff5795f6eb in g_strdup ../glib/gstrfuncs.c:363
#4 0x55db720f1d46 in readline_hist_add /home/elmarco/src/qq/util/readline.c:258
#5 0x55db720f2d34 in readline_handle_byte /home/elmarco/src/qq/util/readline.c:387
#6 0x55db71539d00 in monitor_read /home/elmarco/src/qq/monitor.c:3896
#7 0x55db71f9be35 in qemu_chr_be_write_impl /home/elmarco/src/qq/chardev/char.c:167
#8 0x55db71f9bed3 in qemu_chr_be_write /home/elmarco/src/qq/chardev/char.c:179
#9 0x55db71fa013c in fd_chr_read /home/elmarco/src/qq/chardev/char-fd.c:66
#10 0x55db71fe18a8 in qio_channel_fd_source_dispatch /home/elmarco/src/qq/io/channel-watch.c:84
#11 0x7eff5793a90b in g_main_dispatch ../glib/gmain.c:3182
#12 0x7eff5793b7ac in g_main_context_dispatch ../glib/gmain.c:3847
#13 0x55db720af3bd in glib_pollfds_poll /home/elmarco/src/qq/util/main-loop.c:214
#14 0x55db720af505 in os_host_main_loop_wait /home/elmarco/src/qq/util/main-loop.c:261
#15 0x55db720af6d6 in main_loop_wait /home/elmarco/src/qq/util/main-loop.c:515
#16 0x55db7184e0de in main_loop /home/elmarco/src/qq/vl.c:1995
#17 0x55db7185e956 in main /home/elmarco/src/qq/vl.c:4914
#18 0x7eff4ea17039 in __libc_start_main (/lib64/libc.so.6+0x21039)

(while at it, use g_new0(ReadLineState), it's a bit easier to read)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20180104160523.22995-11-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
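A sketch of the corresponding cleanup, assuming a readline_free() of the shape this fix implies (READLINE_MAX_CMDS and the history array follow util/readline.c; treat the body as illustrative):

    void readline_free(ReadLineState *rs)
    {
        int i;

        if (!rs) {
            return;
        }
        /* Release the strings g_strdup()'d by readline_hist_add(),
         * which were previously leaked. */
        for (i = 0; i < READLINE_MAX_CMDS; i++) {
            g_free(rs->history[i]);
        }
        g_free(rs);
    }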
seanbruno pushed a commit that referenced this pull request on Feb 4, 2018
ASAN complains about:

==8856==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffd8a1fe168 at pc 0x561136cb4451 bp 0x7ffd8a1fe130 sp 0x7ffd8a1fd8e0
READ of size 16 at 0x7ffd8a1fe168 thread T0
#0 0x561136cb4450 in __asan_memcpy (/home/elmarco/src/qq/build/tests/test-crypto-ivgen+0x110450)
#1 0x561136d2a6a7 in qcrypto_ivgen_essiv_calculate /home/elmarco/src/qq/crypto/ivgen-essiv.c:83:5
#2 0x561136d29af8 in qcrypto_ivgen_calculate /home/elmarco/src/qq/crypto/ivgen.c:72:12
#3 0x561136d07c8e in test_ivgen /home/elmarco/src/qq/tests/test-crypto-ivgen.c:148:5
#4 0x7f77772c3b04 in test_case_run /home/elmarco/src/gnome/glib/builddir/../glib/gtestutils.c:2237
#5 0x7f77772c3ec4 in g_test_run_suite_internal /home/elmarco/src/gnome/glib/builddir/../glib/gtestutils.c:2321
#6 0x7f77772c3f6d in g_test_run_suite_internal /home/elmarco/src/gnome/glib/builddir/../glib/gtestutils.c:2333
#7 0x7f77772c3f6d in g_test_run_suite_internal /home/elmarco/src/gnome/glib/builddir/../glib/gtestutils.c:2333
#8 0x7f77772c3f6d in g_test_run_suite_internal /home/elmarco/src/gnome/glib/builddir/../glib/gtestutils.c:2333
#9 0x7f77772c4184 in g_test_run_suite /home/elmarco/src/gnome/glib/builddir/../glib/gtestutils.c:2408
#10 0x7f77772c2e0d in g_test_run /home/elmarco/src/gnome/glib/builddir/../glib/gtestutils.c:1674
#11 0x561136d0799b in main /home/elmarco/src/qq/tests/test-crypto-ivgen.c:173:12
#12 0x7f77756e6039 in __libc_start_main (/lib64/libc.so.6+0x21039)
#13 0x561136c13d89 in _start (/home/elmarco/src/qq/build/tests/test-crypto-ivgen+0x6fd89)

Address 0x7ffd8a1fe168 is located in stack of thread T0 at offset 40 in frame
#0 0x561136d2a40f in qcrypto_ivgen_essiv_calculate /home/elmarco/src/qq/crypto/ivgen-essiv.c:76

This frame has 1 object(s):
  [32, 40) 'sector.addr' <== Memory access at offset 40 overflows this variable

HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow (/home/elmarco/src/qq/build/tests/test-crypto-ivgen+0x110450) in __asan_memcpy

Shadow bytes around the buggy address:
  0x100031437bd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100031437be0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100031437bf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100031437c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100031437c10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x100031437c20: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00[f3]f3 f3
  0x100031437c30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100031437c40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100031437c50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100031437c60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100031437c70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable: 00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone: fa
  Freed heap region: fd
  Stack left redzone: f1
  Stack mid redzone: f2
  Stack right redzone: f3
  Stack after return: f5
  Stack use after scope: f8
  Global redzone: f9
  Global init order: f6
  Poisoned by user: f7
  Container overflow: fc
  Array cookie: ac
  Intra object redzone: bb
  ASan internal: fe
  Left alloca redzone: ca
  Right alloca redzone: cb

It looks like the rest of the code copes with ndata being larger than sizeof(sector), so limit the memcpy() range.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <20180104160523.22995-13-marcandre.lureau@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
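The clamp itself is a one-liner; a sketch using the names from crypto/ivgen-essiv.c (MIN is QEMU's osdep.h macro):

    /* Copy at most sizeof(sector) bytes: ndata (the cipher block size)
     * may be larger than the 8-byte sector counter being copied. */
    memcpy(data, &sector, MIN(ndata, sizeof(sector)));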
seanbruno pushed a commit that referenced this pull request on Feb 4, 2018
Direct leak of 160 byte(s) in 4 object(s) allocated from:
#0 0x55ed7678cda8 in calloc (/home/elmarco/src/qq/build/x86_64-softmmu/qemu-system-x86_64+0x797da8)
#1 0x7f3f5e725f75 in g_malloc0 /home/elmarco/src/gnome/glib/builddir/../glib/gmem.c:124
#2 0x55ed778aa3a7 in query_option_descs /home/elmarco/src/qq/util/qemu-config.c:60:16
#3 0x55ed778aa307 in get_drive_infolist /home/elmarco/src/qq/util/qemu-config.c:140:19
#4 0x55ed778a9f40 in qmp_query_command_line_options /home/elmarco/src/qq/util/qemu-config.c:254:36
#5 0x55ed76d4868c in qmp_marshal_query_command_line_options /home/elmarco/src/qq/build/qmp-marshal.c:3078:14
#6 0x55ed77855dd5 in do_qmp_dispatch /home/elmarco/src/qq/qapi/qmp-dispatch.c:104:5
#7 0x55ed778558cc in qmp_dispatch /home/elmarco/src/qq/qapi/qmp-dispatch.c:131:11
#8 0x55ed768b592f in handle_qmp_command /home/elmarco/src/qq/monitor.c:3840:11
#9 0x55ed7786ccfe in json_message_process_token /home/elmarco/src/qq/qobject/json-streamer.c:105:5
#10 0x55ed778fe37c in json_lexer_feed_char /home/elmarco/src/qq/qobject/json-lexer.c:323:13
#11 0x55ed778fdde6 in json_lexer_feed /home/elmarco/src/qq/qobject/json-lexer.c:373:15
#12 0x55ed7786cd83 in json_message_parser_feed /home/elmarco/src/qq/qobject/json-streamer.c:124:12
#13 0x55ed768b559e in monitor_qmp_read /home/elmarco/src/qq/monitor.c:3882:5
#14 0x55ed77714f29 in qemu_chr_be_write_impl /home/elmarco/src/qq/chardev/char.c:167:9
#15 0x55ed77714fde in qemu_chr_be_write /home/elmarco/src/qq/chardev/char.c:179:9
#16 0x55ed7772ffad in tcp_chr_read /home/elmarco/src/qq/chardev/char-socket.c:440:13
#17 0x55ed7777113b in qio_channel_fd_source_dispatch /home/elmarco/src/qq/io/channel-watch.c:84:12
#18 0x7f3f5e71d90b in g_main_dispatch /home/elmarco/src/gnome/glib/builddir/../glib/gmain.c:3182
#19 0x7f3f5e71e7ac in g_main_context_dispatch /home/elmarco/src/gnome/glib/builddir/../glib/gmain.c:3847
#20 0x55ed77886ffc in glib_pollfds_poll /home/elmarco/src/qq/util/main-loop.c:214:9
#21 0x55ed778865fd in os_host_main_loop_wait /home/elmarco/src/qq/util/main-loop.c:261:5
#22 0x55ed77886222 in main_loop_wait /home/elmarco/src/qq/util/main-loop.c:515:11
#23 0x55ed76d2a4df in main_loop /home/elmarco/src/qq/vl.c:1995:9
#24 0x55ed76d1cb4a in main /home/elmarco/src/qq/vl.c:4914:5
#25 0x7f3f555f6039 in __libc_start_main (/lib64/libc.so.6+0x21039)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180104160523.22995-14-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
seanbruno pushed a commit that referenced this pull request on Feb 4, 2018
Spotted thanks to ASAN:

==25226==ERROR: AddressSanitizer: global-buffer-overflow on address 0x556715a1f120 at pc 0x556714b6f6b1 bp 0x7ffcdfac1360 sp 0x7ffcdfac1350
READ of size 1 at 0x556715a1f120 thread T0
#0 0x556714b6f6b0 in init_disasm /home/elmarco/src/qemu/disas/s390.c:219
#1 0x556714b6fa6a in print_insn_s390 /home/elmarco/src/qemu/disas/s390.c:294
#2 0x55671484d031 in monitor_disas /home/elmarco/src/qemu/disas.c:635
#3 0x556714862ec0 in memory_dump /home/elmarco/src/qemu/monitor.c:1324
#4 0x55671486342a in hmp_memory_dump /home/elmarco/src/qemu/monitor.c:1418
#5 0x5567148670be in handle_hmp_command /home/elmarco/src/qemu/monitor.c:3109
#6 0x5567148674ed in qmp_human_monitor_command /home/elmarco/src/qemu/monitor.c:613
#7 0x556714b00918 in qmp_marshal_human_monitor_command /home/elmarco/src/qemu/build/qmp-marshal.c:1704
#8 0x556715138a3e in do_qmp_dispatch /home/elmarco/src/qemu/qapi/qmp-dispatch.c:104
#9 0x556715138f83 in qmp_dispatch /home/elmarco/src/qemu/qapi/qmp-dispatch.c:131
#10 0x55671485cf88 in handle_qmp_command /home/elmarco/src/qemu/monitor.c:3839
#11 0x55671514e80b in json_message_process_token /home/elmarco/src/qemu/qobject/json-streamer.c:105
#12 0x5567151bf2dc in json_lexer_feed_char /home/elmarco/src/qemu/qobject/json-lexer.c:323
#13 0x5567151bf827 in json_lexer_feed /home/elmarco/src/qemu/qobject/json-lexer.c:373
#14 0x55671514ee62 in json_message_parser_feed /home/elmarco/src/qemu/qobject/json-streamer.c:124
#15 0x556714854b1f in monitor_qmp_read /home/elmarco/src/qemu/monitor.c:3881
#16 0x556715045440 in qemu_chr_be_write_impl /home/elmarco/src/qemu/chardev/char.c:172
#17 0x556715047184 in qemu_chr_be_write /home/elmarco/src/qemu/chardev/char.c:184
#18 0x55671505a8e6 in tcp_chr_read /home/elmarco/src/qemu/chardev/char-socket.c:440
#19 0x5567150943c3 in qio_channel_fd_source_dispatch /home/elmarco/src/qemu/io/channel-watch.c:84
#20 0x7fb90292b90b in g_main_dispatch ../glib/gmain.c:3182
#21 0x7fb90292c7ac in g_main_context_dispatch ../glib/gmain.c:3847
#22 0x556715162eca in glib_pollfds_poll /home/elmarco/src/qemu/util/main-loop.c:214
#23 0x556715163001 in os_host_main_loop_wait /home/elmarco/src/qemu/util/main-loop.c:261
#24 0x5567151631fa in main_loop_wait /home/elmarco/src/qemu/util/main-loop.c:515
#25 0x556714ad6d3b in main_loop /home/elmarco/src/qemu/vl.c:1950
#26 0x556714ade329 in main /home/elmarco/src/qemu/vl.c:4865
#27 0x7fb8fe5c9009 in __libc_start_main (/lib64/libc.so.6+0x21009)
#28 0x5567147af4d9 in _start (/home/elmarco/src/qemu/build/s390x-softmmu/qemu-system-s390x+0xf674d9)

0x556715a1f120 is located 32 bytes to the left of global variable 'char_hci_type_info' defined in '/home/elmarco/src/qemu/hw/bt/hci-csr.c:493:23' (0x556715a1f140) of size 104
0x556715a1f120 is located 8 bytes to the right of global variable 's390_opcodes' defined in '/home/elmarco/src/qemu/disas/s390.c:860:33' (0x556715a15280) of size 40600

This fix is based on Andreas Arnez <arnez@linux.vnet.ibm.com> upstream commit:
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commitdiff;h=9ace48f3d7d80ce09c5df60cccb433470410b11b

  2014-08-19  Andreas Arnez  <arnez@linux.vnet.ibm.com>

    * s390-dis.c (init_disasm): Simplify initialization of opc_index[]. This also fixes an access after the last element of s390_opcodes[].

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180104160523.22995-19-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
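The upstream change boils down to initializing opc_index[] in a single reverse pass over the opcode table; a sketch of that approach (adapted from the referenced binutils commit, details illustrative):

    /* Walk the opcode table in reverse so each opc_index[] slot ends up
     * pointing at the FIRST opcode with that leading byte, and no read
     * ever runs past the last element of s390_opcodes[]. */
    memset(opc_index, 0, sizeof(opc_index));
    for (i = s390_num_opcodes; i--; ) {
        opc_index[s390_opcodes[i].opcode[0]] = i;
    }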
markcorbinuk
pushed a commit
to markcorbinuk/qemu-bsd-user
that referenced
this pull request
Jul 29, 2019
Spotted thanks to ASAN: QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 tests/migration-test -p /x86_64/migration/bad_dest ==30302==ERROR: LeakSanitizer: detected memory leaks Direct leak of 48 byte(s) in 1 object(s) allocated from: #0 0x7f60efba1a38 in __interceptor_calloc (/lib64/libasan.so.4+0xdea38) #1 0x7f60eef3cf75 in g_malloc0 ../glib/gmem.c:124 #2 0x55ca9094702c in error_copy /home/elmarco/src/qemu/util/error.c:203 #3 0x55ca9037a30f in migrate_set_error /home/elmarco/src/qemu/migration/migration.c:1139 #4 0x55ca9037a462 in migrate_fd_error /home/elmarco/src/qemu/migration/migration.c:1150 #5 0x55ca9038162b in migrate_fd_connect /home/elmarco/src/qemu/migration/migration.c:2411 #6 0x55ca90386e41 in migration_channel_connect /home/elmarco/src/qemu/migration/channel.c:81 #7 0x55ca9038335e in socket_outgoing_migration /home/elmarco/src/qemu/migration/socket.c:85 #8 0x55ca9083dd3a in qio_task_complete /home/elmarco/src/qemu/io/task.c:142 #9 0x55ca9083d6cc in gio_task_thread_result /home/elmarco/src/qemu/io/task.c:88 #10 0x7f60eef37317 in g_idle_dispatch ../glib/gmain.c:5552 #11 0x7f60eef3490b in g_main_dispatch ../glib/gmain.c:3182 #12 0x7f60eef357ac in g_main_context_dispatch ../glib/gmain.c:3847 #13 0x55ca90927231 in glib_pollfds_poll /home/elmarco/src/qemu/util/main-loop.c:214 #14 0x55ca90927420 in os_host_main_loop_wait /home/elmarco/src/qemu/util/main-loop.c:261 #15 0x55ca909275fa in main_loop_wait /home/elmarco/src/qemu/util/main-loop.c:515 #16 0x55ca8fc1c2a4 in main_loop /home/elmarco/src/qemu/vl.c:1942 #17 0x55ca8fc2eb3a in main /home/elmarco/src/qemu/vl.c:4724 #18 0x7f60e4082009 in __libc_start_main (/lib64/libc.so.6+0x21009) Indirect leak of 45 byte(s) in 1 object(s) allocated from: #0 0x7f60efba1850 in malloc (/lib64/libasan.so.4+0xde850) #1 0x7f60eef3cf0c in g_malloc ../glib/gmem.c:94 #2 0x7f60eef3d1cf in g_malloc_n ../glib/gmem.c:331 #3 0x7f60eef596eb in g_strdup ../glib/gstrfuncs.c:363 #4 0x55ca90947085 in error_copy /home/elmarco/src/qemu/util/error.c:204 #5 0x55ca9037a30f in migrate_set_error /home/elmarco/src/qemu/migration/migration.c:1139 #6 0x55ca9037a462 in migrate_fd_error /home/elmarco/src/qemu/migration/migration.c:1150 #7 0x55ca9038162b in migrate_fd_connect /home/elmarco/src/qemu/migration/migration.c:2411 #8 0x55ca90386e41 in migration_channel_connect /home/elmarco/src/qemu/migration/channel.c:81 #9 0x55ca9038335e in socket_outgoing_migration /home/elmarco/src/qemu/migration/socket.c:85 #10 0x55ca9083dd3a in qio_task_complete /home/elmarco/src/qemu/io/task.c:142 #11 0x55ca9083d6cc in gio_task_thread_result /home/elmarco/src/qemu/io/task.c:88 #12 0x7f60eef37317 in g_idle_dispatch ../glib/gmain.c:5552 #13 0x7f60eef3490b in g_main_dispatch ../glib/gmain.c:3182 #14 0x7f60eef357ac in g_main_context_dispatch ../glib/gmain.c:3847 #15 0x55ca90927231 in glib_pollfds_poll /home/elmarco/src/qemu/util/main-loop.c:214 #16 0x55ca90927420 in os_host_main_loop_wait /home/elmarco/src/qemu/util/main-loop.c:261 #17 0x55ca909275fa in main_loop_wait /home/elmarco/src/qemu/util/main-loop.c:515 #18 0x55ca8fc1c2a4 in main_loop /home/elmarco/src/qemu/vl.c:1942 #19 0x55ca8fc2eb3a in main /home/elmarco/src/qemu/vl.c:4724 #20 0x7f60e4082009 in __libc_start_main (/lib64/libc.so.6+0x21009) Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20180306170959.3921-1-marcandre.lureau@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
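The shape of this leak: migrate_set_error() stores a copy made by error_copy() into the migration state, and nothing ever freed that copy. A plain-C model of the fix (a string stands in for the Error object; not QEMU's actual code):

    #include <stdlib.h>
    #include <string.h>

    struct migration_state { char *error; };   /* stands in for MigrationState */

    static void set_error(struct migration_state *s, const char *msg)
    {
        if (!s->error) {
            s->error = strdup(msg);            /* the copy that leaked */
        }
    }

    static void cleanup(struct migration_state *s)
    {
        free(s->error);                        /* release the stored copy */
        s->error = NULL;
    }

    int main(void)
    {
        struct migration_state s = { 0 };
        set_error(&s, "connection refused");
        cleanup(&s);
        return 0;
    }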
markcorbinuk
pushed a commit
to markcorbinuk/qemu-bsd-user
that referenced
this pull request
Jul 29, 2019
This fixes leaks found by ASAN such as: GTESTER tests/test-blockjob ================================================================= ==31442==ERROR: LeakSanitizer: detected memory leaks Direct leak of 24 byte(s) in 1 object(s) allocated from: #0 0x7f88483cba38 in __interceptor_calloc (/lib64/libasan.so.4+0xdea38) #1 0x7f8845e1bd77 in g_malloc0 ../glib/gmem.c:129 #2 0x7f8845e1c04b in g_malloc0_n ../glib/gmem.c:360 #3 0x5584d2732498 in block_job_txn_new /home/elmarco/src/qemu/blockjob.c:172 #4 0x5584d2739b28 in block_job_create /home/elmarco/src/qemu/blockjob.c:973 #5 0x5584d270ae31 in mk_job /home/elmarco/src/qemu/tests/test-blockjob.c:34 #6 0x5584d270b1c1 in do_test_id /home/elmarco/src/qemu/tests/test-blockjob.c:57 #7 0x5584d270b65c in test_job_ids /home/elmarco/src/qemu/tests/test-blockjob.c:118 #8 0x7f8845e40b69 in test_case_run ../glib/gtestutils.c:2255 #9 0x7f8845e40f29 in g_test_run_suite_internal ../glib/gtestutils.c:2339 #10 0x7f8845e40fd2 in g_test_run_suite_internal ../glib/gtestutils.c:2351 #11 0x7f8845e411e9 in g_test_run_suite ../glib/gtestutils.c:2426 #12 0x7f8845e3fe72 in g_test_run ../glib/gtestutils.c:1692 #13 0x5584d270d6e2 in main /home/elmarco/src/qemu/tests/test-blockjob.c:377 #14 0x7f8843641f29 in __libc_start_main (/lib64/libc.so.6+0x20f29) Add an assert to make sure that the job doesn't have an associated txn before free(). [Jeff Cody: N.B., used updated patch provided by John Snow] Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>
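The added assert is the interesting part: freeing a job that is still attached to a transaction would hide exactly the kind of leak shown above. A self-contained sketch of that guard (plain C, not the actual blockjob code):

    #include <assert.h>
    #include <stdlib.h>

    struct txn;                     /* opaque transaction */
    struct job { struct txn *txn; };

    static void job_free(struct job *job)
    {
        assert(!job->txn);          /* fail loudly if still in a transaction */
        free(job);
    }

    int main(void)
    {
        struct job *j = calloc(1, sizeof(*j));
        j->txn = NULL;              /* must be detached before freeing */
        job_free(j);
        return 0;
    }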
markcorbinuk
pushed a commit
to markcorbinuk/qemu-bsd-user
that referenced
this pull request
Jul 29, 2019
Free the AIO context earlier than the GMainContext (if we have one) to work around a glib2 bug where the GSource context pointer is not cleared even if the context has already been destroyed (while it should be). The patch itself only changes the order in which the objects are destroyed, no functional change at all. Without this workaround, we can encounter a qmp-test hang with oob (and possibly in any other use case where an iothread is used with GMainContexts): #0 0x00007f35ffe45334 in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00007f35ffe405d8 in _L_lock_854 () from /lib64/libpthread.so.0 #2 0x00007f35ffe404a7 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x00007f35fc5b9c9d in g_source_unref_internal (source=0x24f0600, context=0x7f35f0000960, have_lock=0) at gmain.c:1685 #4 0x0000000000aa6672 in aio_context_unref (ctx=0x24f0600) at /root/qemu/util/async.c:497 #5 0x000000000065851c in iothread_instance_finalize (obj=0x24f0380) at /root/qemu/iothread.c:129 #6 0x0000000000962d79 in object_deinit (obj=0x24f0380, type=0x242e960) at /root/qemu/qom/object.c:462 #7 0x0000000000962e0d in object_finalize (data=0x24f0380) at /root/qemu/qom/object.c:476 #8 0x0000000000964146 in object_unref (obj=0x24f0380) at /root/qemu/qom/object.c:924 #9 0x0000000000965880 in object_finalize_child_property (obj=0x24ec640, name=0x24efca0 "mon_iothread", opaque=0x24f0380) at /root/qemu/qom/object.c:1436 #10 0x0000000000962c33 in object_property_del_child (obj=0x24ec640, child=0x24f0380, errp=0x0) at /root/qemu/qom/object.c:436 #11 0x0000000000962d26 in object_unparent (obj=0x24f0380) at /root/qemu/qom/object.c:455 #12 0x0000000000658f00 in iothread_destroy (iothread=0x24f0380) at /root/qemu/iothread.c:365 #13 0x00000000004c67a8 in monitor_cleanup () at /root/qemu/monitor.c:4663 #14 0x0000000000669e27 in main (argc=16, argv=0x7ffc8b1ae2f8, envp=0x7ffc8b1ae380) at /root/qemu/vl.c:4749 The glib2 bug is fixed in commit 26056558b ("gmain: allow g_source_get_context() on destroyed sources", 2012-07-30), so the first good version is glib2 2.33.10. But we still support building with glib as old as 2.28, so we need the workaround. Let's make sure we destroy the GSources first before their owner context until we drop support for glib older than 2.33.10. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180409083956.1780-1-peterx@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
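The ordering rule is easy to demonstrate with glib alone. A minimal, compilable sketch of the workaround order (assumes only the public glib API, not QEMU code):

    #include <glib.h>

    int main(void)
    {
        GMainContext *ctx = g_main_context_new();
        GSource *src = g_idle_source_new();
        g_source_attach(src, ctx);

        /* workaround order: tear the source down before its context, so an
         * old glib never leaves the source pointing at a dead context */
        g_source_destroy(src);
        g_source_unref(src);
        g_main_context_unref(ctx);
        return 0;
    }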
markcorbinuk
pushed a commit
to markcorbinuk/qemu-bsd-user
that referenced
this pull request
Jul 29, 2019
Eric Auger reported days ago that OOB broke ARM when running with libvirt: http://lists.gnu.org/archive/html/qemu-devel/2018-03/msg06231.html The problem was that the monitor dispatcher bottom half was now bound to qemu_aio_context, which could be polled unexpectedly in block code. We should keep the dispatchers running in iohandler_ctx, just like we did before the Out-Of-Band series (chardev uses qio, and qio binds everything with iohandler_ctx). Without this change, the QMP dispatcher might run even before reaching the main loop in the block IO path, for example in a stack like the following (the ARM case, where the "cont" command handler runs even during the machine init phase): #0 qmp_cont () #1 0x00000000006bd210 in qmp_marshal_cont () #2 0x0000000000ac05c4 in do_qmp_dispatch () #3 0x0000000000ac07a0 in qmp_dispatch () #4 0x0000000000472d60 in monitor_qmp_dispatch_one () #5 0x000000000047302c in monitor_qmp_bh_dispatcher () #6 0x0000000000acf374 in aio_bh_call () #7 0x0000000000acf428 in aio_bh_poll () #8 0x0000000000ad5110 in aio_poll () #9 0x0000000000a08ab8 in blk_prw () #10 0x0000000000a091c4 in blk_pread () #11 0x0000000000734f94 in pflash_cfi01_realize () #12 0x000000000075a3a4 in device_set_realized () #13 0x00000000009a26cc in property_set_bool () #14 0x00000000009a0a40 in object_property_set () #15 0x00000000009a3a08 in object_property_set_qobject () #16 0x00000000009a0c8c in object_property_set_bool () #17 0x0000000000758f94 in qdev_init_nofail () #18 0x000000000058e190 in create_one_flash () #19 0x000000000058e2f4 in create_flash () #20 0x00000000005902f0 in machvirt_init () #21 0x00000000007635cc in machine_run_board_init () #22 0x00000000006b135c in main () Actually the problem is more severe than that. After we switched to the qemu AIO handler, the monitor dispatcher code can even be called with nested aio_poll(); then it can be an explicit aio_poll() inside another main loop aio_poll(), which could be racy too, breaking code like TPM and 9p that use nested event loops. Switch to using iohandler_ctx for the monitor dispatchers. My sincere thanks to Eric Auger who offered great help during both debugging and verifying the problem. The ARM test was carried out by applying this patch upon QEMU 2.12.0-rc0 and the problem is gone after the patch. A quick test of mine shows that with this patch applied we can pass all raw iotests even with OOB on by default. CC: Eric Blake <eblake@redhat.com> CC: Markus Armbruster <armbru@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Fam Zheng <famz@redhat.com> Reported-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180410044942.17059-1-peterx@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
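The mechanical core of the change is just where the dispatcher bottom half is created. A hedged sketch against QEMU's internal API of that era (not standalone code):

    /* bind the dispatcher BH to iohandler_ctx, which nested aio_poll()
     * calls in block code never dispatch */
    QEMUBH *dispatcher_bh = aio_bh_new(iohandler_get_aio_context(),
                                       monitor_qmp_bh_dispatcher, NULL);
    qemu_bh_schedule(dispatcher_bh);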
bsdimp
pushed a commit
that referenced
this pull request
Sep 6, 2019
…nnect When cancelling migration during RDMA precopy, the source qemu main thread sometimes hangs. The backtrace is: (gdb) bt #0 0x00007f249eabd43d in write () from /lib64/libpthread.so.0 #1 0x00007f24a1ce98e4 in rdma_get_cm_event (channel=0x4675d10, event=0x7ffe2f643dd0) at src/cma.c:2189 #2 0x00000000007b6166 in qemu_rdma_cleanup (rdma=0x6784000) at migration/rdma.c:2296 #3 0x00000000007b7cae in qio_channel_rdma_close (ioc=0x3bfcc30, errp=0x0) at migration/rdma.c:2999 #4 0x00000000008db60e in qio_channel_close (ioc=0x3bfcc30, errp=0x0) at io/channel.c:273 #5 0x00000000007a8765 in channel_close (opaque=0x3bfcc30) at migration/qemu-file-channel.c:98 #6 0x00000000007a71f9 in qemu_fclose (f=0x527c000) at migration/qemu-file.c:334 #7 0x0000000000795b96 in migrate_fd_cleanup (opaque=0x3b46280) at migration/migration.c:1162 #8 0x000000000093a71b in aio_bh_call (bh=0x3db7a20) at util/async.c:90 #9 0x000000000093a7b2 in aio_bh_poll (ctx=0x3b121c0) at util/async.c:118 #10 0x000000000093f2ad in aio_dispatch (ctx=0x3b121c0) at util/aio-posix.c:436 #11 0x000000000093ab41 in aio_ctx_dispatch (source=0x3b121c0, callback=0x0, user_data=0x0) at util/async.c:261 #12 0x00007f249f73c7aa in g_main_context_dispatch () from /lib64/libglib-2.0.so.0 #13 0x000000000093dc5e in glib_pollfds_poll () at util/main-loop.c:215 #14 0x000000000093dd4e in os_host_main_loop_wait (timeout=28000000) at util/main-loop.c:263 #15 0x000000000093de05 in main_loop_wait (nonblocking=0) at util/main-loop.c:522 #16 0x00000000005bc6a5 in main_loop () at vl.c:1944 #17 0x00000000005c39b5 in main (argc=56, argv=0x7ffe2f6443f8, envp=0x3ad0030) at vl.c:4752 Sometimes it does not get the RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect. According to the IB spec, once the active side sends a DREQ message, it should wait for a DREP message, and only once that arrives should it trigger a DISCONNECT event. The DREP message can be dropped due to network issues. For that case the spec defines a DREP_timeout state in the CM state machine; if the DREP is dropped we should get a timeout, and a TIMEWAIT_EXIT event will be triggered. Unfortunately the current kernel CM implementation doesn't include the DREP_timeout state, and in the above scenario we will not get DISCONNECT or TIMEWAIT_EXIT events. So qemu_rdma_cleanup should not invoke rdma_get_cm_event, which may hang forever; the event channel is also destroyed in qemu_rdma_cleanup. Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
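In code terms the conclusion is roughly the following; this is an illustrative sketch of the direction, not the committed patch, and the field names (connected, cm_id) are assumptions about the RDMAContext layout:

    if (rdma->connected) {
        rdma_disconnect(rdma->cm_id);
        /* deliberately no rdma_get_cm_event() here: the DISCONNECTED
         * event may never arrive if the DREP was lost, and the kernel CM
         * has no DREP_timeout to fall back on */
        rdma->connected = false;
    }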
bsdimp
pushed a commit
that referenced
this pull request
Sep 6, 2019
With a Spice port chardev, it is possible to reenter monitor_qapi_event_queue() (when the client disconnects for example). This will dead-lock on monitor_lock. Instead, use some TLS variables to check for recursion and queue the events. Fixes: (gdb) bt #0 0x00007fa69e7217fd in __lll_lock_wait () at /lib64/libpthread.so.0 #1 0x00007fa69e71acf4 in pthread_mutex_lock () at /lib64/libpthread.so.0 #2 0x0000563303567619 in qemu_mutex_lock_impl (mutex=0x563303d3e220 <monitor_lock>, file=0x5633036589a8 "/home/elmarco/src/qq/monitor.c", line=645) at /home/elmarco/src/qq/util/qemu-thread-posix.c:66 #3 0x0000563302fa6c25 in monitor_qapi_event_queue (event=QAPI_EVENT_SPICE_DISCONNECTED, qdict=0x56330602bde0, errp=0x7ffc6ab5e728) at /home/elmarco/src/qq/monitor.c:645 #4 0x0000563303549aca in qapi_event_send_spice_disconnected (server=0x563305afd630, client=0x563305745360, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-ui.c:149 #5 0x00005633033e600f in channel_event (event=3, info=0x5633061b0050) at /home/elmarco/src/qq/ui/spice-core.c:235 #6 0x00007fa69f6c86bb in reds_handle_channel_event (reds=<optimized out>, event=3, info=0x5633061b0050) at reds.c:316 #7 0x00007fa69f6b193b in main_dispatcher_self_handle_channel_event (info=0x5633061b0050, event=3, self=0x563304e088c0) at main-dispatcher.c:197 #8 0x00007fa69f6b193b in main_dispatcher_channel_event (self=0x563304e088c0, event=event@entry=3, info=0x5633061b0050) at main-dispatcher.c:197 #9 0x00007fa69f6d0833 in red_stream_push_channel_event (s=s@entry=0x563305ad8f50, event=event@entry=3) at red-stream.c:414 #10 0x00007fa69f6d086b in red_stream_free (s=0x563305ad8f50) at red-stream.c:388 #11 0x00007fa69f6b7ddc in red_channel_client_finalize (object=0x563304df2360) at red-channel-client.c:347 #12 0x00007fa6a56b7fb9 in g_object_unref () at /lib64/libgobject-2.0.so.0 #13 0x00007fa69f6ba212 in red_channel_client_push (rcc=0x563304df2360) at red-channel-client.c:1341 #14 0x00007fa69f68b259 in red_char_device_send_msg_to_client (client=<optimized out>, msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305 #15 0x00007fa69f68b259 in red_char_device_send_msg_to_clients (msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305 #16 0x00007fa69f68b259 in red_char_device_read_from_device (dev=0x563304e08bc0) at char-device.c:353 #17 0x000056330317d01d in spice_chr_write (chr=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/spice.c:199 #18 0x00005633034deee7 in qemu_chr_write_buffer (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, offset=0x7ffc6ab5ea70, write_all=false) at /home/elmarco/src/qq/chardev/char.c:112 #19 0x00005633034df054 in qemu_chr_write (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, write_all=false) at /home/elmarco/src/qq/chardev/char.c:147 #20 0x00005633034e1e13 in qemu_chr_fe_write (be=0x563304dbb800, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/char-fe.c:42 #21 0x0000563302fa6334 in monitor_flush_locked (mon=0x563304dbb800) at /home/elmarco/src/qq/monitor.c:425 #22 
0x0000563302fa6520 in monitor_puts (mon=0x563304dbb800, str=0x563305de7e9e "") at /home/elmarco/src/qq/monitor.c:468 #23 0x0000563302fa680c in qmp_send_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:517 #24 0x0000563302fa6905 in qmp_queue_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:538 #25 0x0000563302fa6b5b in monitor_qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730) at /home/elmarco/src/qq/monitor.c:624 #26 0x0000563302fa6c4b in monitor_qapi_event_queue (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730, errp=0x7ffc6ab5ed00) at /home/elmarco/src/qq/monitor.c:649 #27 0x0000563303548cce in qapi_event_send_shutdown (guest=false, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-run-state.c:58 #28 0x000056330313bcd7 in main_loop_should_exit () at /home/elmarco/src/qq/vl.c:1822 #29 0x000056330313bde3 in main_loop () at /home/elmarco/src/qq/vl.c:1862 #30 0x0000563303143781 in main (argc=3, argv=0x7ffc6ab5f068, envp=0x7ffc6ab5f088) at /home/elmarco/src/qq/vl.c:4644 Note that error report is now moved to the first caller, which may receive an error for a recursed event. This is probably fine (95% of callers use &error_abort, the rest have NULL error and ignore it) Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20180731150144.14022-1-marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [*_no_recurse renamed to *_no_reenter, local variables reordered] Signed-off-by: Markus Armbruster <armbru@redhat.com>
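The TLS trick is worth spelling out, since it is the generic cure for this class of reentrancy. A self-contained plain-C model (not the monitor code itself):

    #include <stdbool.h>
    #include <stdio.h>

    static __thread bool in_emit;   /* per-thread reentrancy flag */
    static __thread int queued;     /* stands in for the queued events */

    static void emit_event(const char *name)
    {
        if (in_emit) {              /* reentered from a handler: just queue */
            queued++;
            return;
        }
        in_emit = true;
        printf("emit %s\n", name);  /* handlers here may call emit_event() */
        while (queued > 0) {        /* drain whatever reentry queued up */
            queued--;
            printf("emit queued event\n");
        }
        in_emit = false;
    }

    int main(void)
    {
        emit_event("SHUTDOWN");
        return 0;
    }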
bsdimp
pushed a commit
that referenced
this pull request
Sep 6, 2019
Spotted by ASAN, during make check... Direct leak of 40 byte(s) in 1 object(s) allocated from: #0 0x7f8e27262c48 in malloc (/lib64/libasan.so.5+0xeec48) #1 0x7f8e26a5f3c5 in g_malloc (/lib64/libglib-2.0.so.0+0x523c5) #2 0x555ab67078a8 in qstring_from_str /home/elmarco/src/qq/qobject/qstring.c:67 #3 0x555ab67071e4 in qstring_new /home/elmarco/src/qq/qobject/qstring.c:24 #4 0x555ab6713fbf in qstring_from_escaped_str /home/elmarco/src/qq/qobject/json-parser.c:144 #5 0x555ab671738c in parse_literal /home/elmarco/src/qq/qobject/json-parser.c:506 #6 0x555ab67179c3 in parse_value /home/elmarco/src/qq/qobject/json-parser.c:569 #7 0x555ab6715123 in parse_pair /home/elmarco/src/qq/qobject/json-parser.c:306 #8 0x555ab6715483 in parse_object /home/elmarco/src/qq/qobject/json-parser.c:357 #9 0x555ab671798b in parse_value /home/elmarco/src/qq/qobject/json-parser.c:561 #10 0x555ab6717a6b in json_parser_parse_err /home/elmarco/src/qq/qobject/json-parser.c:592 #11 0x555ab4fd4dcf in handle_qmp_command /home/elmarco/src/qq/monitor.c:4257 #12 0x555ab6712c4d in json_message_process_token /home/elmarco/src/qq/qobject/json-streamer.c:105 #13 0x555ab67e01e2 in json_lexer_feed_char /home/elmarco/src/qq/qobject/json-lexer.c:323 #14 0x555ab67e0af6 in json_lexer_feed /home/elmarco/src/qq/qobject/json-lexer.c:373 #15 0x555ab6713010 in json_message_parser_feed /home/elmarco/src/qq/qobject/json-streamer.c:124 #16 0x555ab4fd58ec in monitor_qmp_read /home/elmarco/src/qq/monitor.c:4337 #17 0x555ab6559df2 in qemu_chr_be_write_impl /home/elmarco/src/qq/chardev/char.c:175 #18 0x555ab6559e95 in qemu_chr_be_write /home/elmarco/src/qq/chardev/char.c:187 #19 0x555ab6560127 in fd_chr_read /home/elmarco/src/qq/chardev/char-fd.c:66 #20 0x555ab65d9c73 in qio_channel_fd_source_dispatch /home/elmarco/src/qq/io/channel-watch.c:84 #21 0x7f8e26a598ac in g_main_context_dispatch (/lib64/libglib-2.0.so.0+0x4c8ac) Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20180809114417.28718-4-marcandre.lureau@redhat.com> [Screwed up in commit b273145] Cc: qemu-stable@nongnu.org Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
bsdimp
pushed a commit
that referenced
this pull request
Sep 6, 2019
If qio_channel_rdma_readv returns QIO_CHANNEL_ERR_BLOCK, the destination qemu crashes. The backtrace is: (gdb) bt #0 0x0000000000000000 in ?? () #1 0x00000000008db50e in qio_channel_set_aio_fd_handler (ioc=0x38111e0, ctx=0x3726080, io_read=0x8db841 <qio_channel_restart_read>, io_write=0x0, opaque=0x38111e0) at io/channel.c: #2 0x00000000008db952 in qio_channel_set_aio_fd_handlers (ioc=0x38111e0) at io/channel.c:438 #3 0x00000000008dbab4 in qio_channel_yield (ioc=0x38111e0, condition=G_IO_IN) at io/channel.c:47 #4 0x00000000007a870b in channel_get_buffer (opaque=0x38111e0, buf=0x440c038 "", pos=0, size=327 at migration/qemu-file-channel.c:83 #5 0x00000000007a70f6 in qemu_fill_buffer (f=0x440c000) at migration/qemu-file.c:299 #6 0x00000000007a79d0 in qemu_peek_byte (f=0x440c000, offset=0) at migration/qemu-file.c:562 #7 0x00000000007a7a22 in qemu_get_byte (f=0x440c000) at migration/qemu-file.c:575 #8 0x00000000007a7c78 in qemu_get_be32 (f=0x440c000) at migration/qemu-file.c:655 #9 0x00000000007a0508 in qemu_loadvm_state (f=0x440c000) at migration/savevm.c:2126 #10 0x0000000000794141 in process_incoming_migration_co (opaque=0x0) at migration/migration.c:366 #11 0x000000000095c598 in coroutine_trampoline (i0=84033984, i1=0) at util/coroutine-ucontext.c:1 #12 0x00007f9c0db56d40 in ?? () from /lib64/libc.so.6 #13 0x00007f96fe858760 in ?? () #14 0x0000000000000000 in ?? () The RDMA QIOChannel does not implement io_set_aio_fd_handler, so qio_channel_set_aio_fd_handler will access a NULL pointer. Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
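One defensive shape for the NULL-pointer call (the committed fix instead implements the handler for the RDMA channel; this is only a sketch against the QIOChannelClass layout of that era):

    QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
    if (klass->io_set_aio_fd_handler) {
        klass->io_set_aio_fd_handler(ioc, ctx, io_read, io_write, opaque);
    }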
bsdimp
pushed a commit
that referenced
this pull request
Sep 6, 2019
When qio_channel_read returns QIO_CHANNEL_ERR_BLOCK, the source qemu crashes. The backtrace is: (gdb) bt #0 0x00007fb20aba91d7 in raise () from /lib64/libc.so.6 #1 0x00007fb20abaa8c8 in abort () from /lib64/libc.so.6 #2 0x00007fb20aba2146 in __assert_fail_base () from /lib64/libc.so.6 #3 0x00007fb20aba21f2 in __assert_fail () from /lib64/libc.so.6 #4 0x00000000008dba2d in qio_channel_yield (ioc=0x22f9e20, condition=G_IO_IN) at io/channel.c:460 #5 0x00000000007a870b in channel_get_buffer (opaque=0x22f9e20, buf=0x3d54038 "", pos=0, size=32768) at migration/qemu-file-channel.c:83 #6 0x00000000007a70f6 in qemu_fill_buffer (f=0x3d54000) at migration/qemu-file.c:299 #7 0x00000000007a79d0 in qemu_peek_byte (f=0x3d54000, offset=0) at migration/qemu-file.c:562 #8 0x00000000007a7a22 in qemu_get_byte (f=0x3d54000) at migration/qemu-file.c:575 #9 0x00000000007a7c46 in qemu_get_be16 (f=0x3d54000) at migration/qemu-file.c:647 #10 0x0000000000796db7 in source_return_path_thread (opaque=0x2242280) at migration/migration.c:1794 #11 0x00000000009428fa in qemu_thread_start (args=0x3e58420) at util/qemu-thread-posix.c:504 #12 0x00007fb20af3ddc5 in start_thread () from /lib64/libpthread.so.0 #13 0x00007fb20ac6b74d in clone () from /lib64/libc.so.6 This patch fixes the crash by invoking qio_channel_yield only when qemu_in_coroutine() is true. Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
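A sketch of the guard (the blocking branch is an assumption about how the non-coroutine path waits, not necessarily the committed hunk):

    if (qemu_in_coroutine()) {
        qio_channel_yield(ioc, G_IO_IN);   /* only legal inside a coroutine */
    } else {
        qio_channel_wait(ioc, G_IO_IN);    /* plain threads block instead */
    }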
bsdimp
pushed a commit
that referenced
this pull request
Sep 6, 2019
Because the RDMA QIOChannel does not implement a shutdown function, if to_dst_file is set to an error the return path thread will wait forever, and the migration thread will wait for the return path thread to exit. The backtrace of the return path thread is: (gdb) bt #0 0x00007f372a76bb0f in ppoll () from /lib64/libc.so.6 #1 0x000000000071dc24 in qemu_poll_ns (fds=0x7ef7091d0580, nfds=2, timeout=100000000) at qemu-timer.c:325 #2 0x00000000006b2fba in qemu_rdma_wait_comp_channel (rdma=0xd424000) at migration/rdma.c:1501 #3 0x00000000006b3191 in qemu_rdma_block_for_wrid (rdma=0xd424000, wrid_requested=4000, byte_len=0x7ef7091d0640) at migration/rdma.c:1580 #4 0x00000000006b3638 in qemu_rdma_exchange_get_response (rdma=0xd424000, head=0x7ef7091d0720, expecting=3, idx=0) at migration/rdma.c:1726 #5 0x00000000006b3ad6 in qemu_rdma_exchange_recv (rdma=0xd424000, head=0x7ef7091d0720, expecting=3) at migration/rdma.c:1903 #6 0x00000000006b5d03 in qemu_rdma_get_buffer (opaque=0x6a57dc0, buf=0x5c80030 "", pos=8, size=32768) at migration/rdma.c:2714 #7 0x00000000006a9635 in qemu_fill_buffer (f=0x5c80000) at migration/qemu-file.c:232 #8 0x00000000006a9ecd in qemu_peek_byte (f=0x5c80000, offset=0) at migration/qemu-file.c:502 #9 0x00000000006a9f1f in qemu_get_byte (f=0x5c80000) at migration/qemu-file.c:515 #10 0x00000000006aa162 in qemu_get_be16 (f=0x5c80000) at migration/qemu-file.c:591 #11 0x00000000006a46d3 in source_return_path_thread ( opaque=0xd826a0 <current_migration.37100>) at migration/migration.c:1331 #12 0x00007f372aa49e25 in start_thread () from /lib64/libpthread.so.0 #13 0x00007f372a77635d in clone () from /lib64/libc.so.6 The backtrace of the migration thread is: (gdb) bt #0 0x00007f372aa4af57 in pthread_join () from /lib64/libpthread.so.0 #1 0x00000000007d5711 in qemu_thread_join (thread=0xd826f8 <current_migration.37100+88>) at util/qemu-thread-posix.c:504 #2 0x00000000006a4bc5 in await_return_path_close_on_source ( ms=0xd826a0 <current_migration.37100>) at migration/migration.c:1460 #3 0x00000000006a53e4 in migration_completion (s=0xd826a0 <current_migration.37100>, current_active_state=4, old_vm_running=0x7ef7089cf976, start_time=0x7ef7089cf980) at migration/migration.c:1695 #4 0x00000000006a5c54 in migration_thread (opaque=0xd826a0 <current_migration.37100>) at migration/migration.c:1837 #5 0x00007f372aa49e25 in start_thread () from /lib64/libpthread.so.0 #6 0x00007f372a77635d in clone () from /lib64/libc.so.6 Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
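A hedged sketch of the direction: give the RDMA channel a shutdown hook that marks the context bad, so a reader blocked as above gives up. Type and field names follow migration/rdma.c of that era; the body is illustrative, not the committed implementation:

    static int qio_channel_rdma_shutdown(QIOChannel *ioc,
                                         QIOChannelShutdown how,
                                         Error **errp)
    {
        QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
        rioc->rdma->error_state = -EIO;   /* wakes/ends the poll loops */
        return 0;
    }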
bsdimp
pushed a commit
that referenced
this pull request
Sep 6, 2019
tests/cdrom-test -p /x86_64/cdrom/boot/megasas Produces the following ASAN leak. ==25700==ERROR: LeakSanitizer: detected memory leaks Direct leak of 16 byte(s) in 1 object(s) allocated from: #0 0x7f06f8faac48 in malloc (/lib64/libasan.so.5+0xeec48) #1 0x7f06f87a73c5 in g_malloc (/lib64/libglib-2.0.so.0+0x523c5) #2 0x55a729f17738 in pci_dma_sglist_init /home/elmarco/src/qq/include/hw/pci/pci.h:818 #3 0x55a729f2a706 in megasas_map_dcmd /home/elmarco/src/qq/hw/scsi/megasas.c:698 #4 0x55a729f39421 in megasas_handle_dcmd /home/elmarco/src/qq/hw/scsi/megasas.c:1574 #5 0x55a729f3f70d in megasas_handle_frame /home/elmarco/src/qq/hw/scsi/megasas.c:1955 #6 0x55a729f40939 in megasas_mmio_write /home/elmarco/src/qq/hw/scsi/megasas.c:2119 #7 0x55a729f41102 in megasas_port_write /home/elmarco/src/qq/hw/scsi/megasas.c:2170 #8 0x55a729220e60 in memory_region_write_accessor /home/elmarco/src/qq/memory.c:527 #9 0x55a7292212b3 in access_with_adjusted_size /home/elmarco/src/qq/memory.c:594 #10 0x55a72922cf70 in memory_region_dispatch_write /home/elmarco/src/qq/memory.c:1473 #11 0x55a7290f5907 in flatview_write_continue /home/elmarco/src/qq/exec.c:3255 #12 0x55a7290f5ceb in flatview_write /home/elmarco/src/qq/exec.c:3294 #13 0x55a7290f6457 in address_space_write /home/elmarco/src/qq/exec.c:3384 #14 0x55a7290f64a8 in address_space_rw /home/elmarco/src/qq/exec.c:3395 #15 0x55a72929ecb0 in kvm_handle_io /home/elmarco/src/qq/accel/kvm/kvm-all.c:1729 #16 0x55a7292a0db5 in kvm_cpu_exec /home/elmarco/src/qq/accel/kvm/kvm-all.c:1969 #17 0x55a7291c4212 in qemu_kvm_cpu_thread_fn /home/elmarco/src/qq/cpus.c:1215 #18 0x55a72a966a6c in qemu_thread_start /home/elmarco/src/qq/util/qemu-thread-posix.c:504 #19 0x7f06ed486593 in start_thread (/lib64/libpthread.so.0+0x7593) Move the qemu_sglist_destroy() from megasas_complete_command() to megasas_unmap_frame(), so map/unmap are balanced. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20180814141247.32336-1-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
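The balance being restored, as a sketch (signature as in hw/scsi/megasas.c of that era, body elided): every pci_dma_sglist_init() gets exactly one qemu_sglist_destroy(), tied to the unmap rather than to command completion:

    static void megasas_unmap_frame(MegasasState *s, MegasasCmd *cmd)
    {
        /* ... existing DMA unmap of the frame ... */
        qemu_sglist_destroy(&cmd->qsg);   /* moved here from completion */
    }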
bsdimp
pushed a commit
that referenced
this pull request
Sep 6, 2019
Spotted by ASAN doing some manual testing: Direct leak of 48 byte(s) in 1 object(s) allocated from: #0 0x7f5fcdc75e50 in calloc (/lib64/libasan.so.5+0xeee50) #1 0x7f5fcd47241d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5241d) #2 0x55f989be92ce in timer_new /home/elmarco/src/qq/include/qemu/timer.h:561 #3 0x55f989be92ff in timer_new_ms /home/elmarco/src/qq/include/qemu/timer.h:630 #4 0x55f989c0219d in hmp_migrate /home/elmarco/src/qq/hmp.c:2038 #5 0x55f98955927b in handle_hmp_command /home/elmarco/src/qq/monitor.c:3498 #6 0x55f98955fb8c in monitor_command_cb /home/elmarco/src/qq/monitor.c:4371 #7 0x55f98ad40f11 in readline_handle_byte /home/elmarco/src/qq/util/readline.c:393 #8 0x55f98955fa4f in monitor_read /home/elmarco/src/qq/monitor.c:4354 #9 0x55f98aae30d7 in qemu_chr_be_write_impl /home/elmarco/src/qq/chardev/char.c:175 #10 0x55f98aae317a in qemu_chr_be_write /home/elmarco/src/qq/chardev/char.c:187 #11 0x55f98aae940c in fd_chr_read /home/elmarco/src/qq/chardev/char-fd.c:66 #12 0x55f98ab63018 in qio_channel_fd_source_dispatch /home/elmarco/src/qq/io/channel-watch.c:84 #13 0x7f5fcd46c8ac in g_main_dispatch gmain.c:3177 Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20180901134652.25884-1-marcandre.lureau@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
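The cleanup such a leak calls for, sketched with hypothetical local names (the committed hunk may differ): a timer allocated with timer_new_ms() must eventually be deleted and freed once the migration status polling is done:

    if (migration_finished) {       /* hypothetical condition */
        timer_del(status_timer);
        timer_free(status_timer);
    }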
bsdimp
pushed a commit
that referenced
this pull request
Sep 6, 2019
In qemu_laio_process_completions_and_submit, the AioContext is acquired before the ioq_submit iteration and after qemu_laio_process_completions, but the latter is not thread safe either. This change avoids a number of random crashes when the Main Thread and an IO Thread collide processing completions for the same AioContext. This is an example of such crash: - The IO Thread is trying to acquire the AioContext at aio_co_enter, which evidences that it didn't lock it before: Thread 3 (Thread 0x7fdfd8bd8700 (LWP 36743)): #0 0x00007fdfe0dd542d in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 #1 0x00007fdfe0dd0de6 in _L_lock_870 () at /lib64/libpthread.so.0 #2 0x00007fdfe0dd0cdf in __GI___pthread_mutex_lock (mutex=mutex@entry=0x5631fde0e6c0) at ../nptl/pthread_mutex_lock.c:114 #3 0x00005631fc0603a7 in qemu_mutex_lock_impl (mutex=0x5631fde0e6c0, file=0x5631fc23520f "util/async.c", line=511) at util/qemu-thread-posix.c:66 #4 0x00005631fc05b558 in aio_co_enter (ctx=0x5631fde0e660, co=0x7fdfcc0c2b40) at util/async.c:493 #5 0x00005631fc05b5ac in aio_co_wake (co=<optimized out>) at util/async.c:478 #6 0x00005631fbfc51ad in qemu_laio_process_completion (laiocb=<optimized out>) at block/linux-aio.c:104 #7 0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670) at block/linux-aio.c:222 #8 0x00005631fbfc5499 in qemu_laio_process_completions_and_submit (s=0x7fdfc0297670) at block/linux-aio.c:237 #9 0x00005631fc05d978 in aio_dispatch_handlers (ctx=ctx@entry=0x5631fde0e660) at util/aio-posix.c:406 #10 0x00005631fc05e3ea in aio_poll (ctx=0x5631fde0e660, blocking=blocking@entry=true) at util/aio-posix.c:693 #11 0x00005631fbd7ad96 in iothread_run (opaque=0x5631fde0e1c0) at iothread.c:64 #12 0x00007fdfe0dcee25 in start_thread (arg=0x7fdfd8bd8700) at pthread_create.c:308 #13 0x00007fdfe0afc34d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113 - The Main Thread is also processing completions from the same AioContext, and crashes due to failed assertion at util/iov.c:78: Thread 1 (Thread 0x7fdfeb5eac80 (LWP 36740)): #0 0x00007fdfe0a391f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 #1 0x00007fdfe0a3a8e8 in __GI_abort () at abort.c:90 #2 0x00007fdfe0a32266 in __assert_fail_base (fmt=0x7fdfe0b84e68 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x5631fc238ccb "offset == 0", file=file@entry=0x5631fc23698e "util/iov.c", line=line@entry=78, function=function@entry=0x5631fc236adc <__PRETTY_FUNCTION__.15220> "iov_memset") at assert.c:92 #3 0x00007fdfe0a32312 in __GI___assert_fail (assertion=assertion@entry=0x5631fc238ccb "offset == 0", file=file@entry=0x5631fc23698e "util/iov.c", line=line@entry=78, function=function@entry=0x5631fc236adc <__PRETTY_FUNCTION__.15220> "iov_memset") at assert.c:101 #4 0x00005631fc065287 in iov_memset (iov=<optimized out>, iov_cnt=<optimized out>, offset=<optimized out>, offset@entry=65536, fillc=fillc@entry=0, bytes=15515191315812405248) at util/iov.c:78 #5 0x00005631fc065a63 in qemu_iovec_memset (qiov=<optimized out>, offset=offset@entry=65536, fillc=fillc@entry=0, bytes=<optimized out>) at util/iov.c:410 #6 0x00005631fbfc5178 in qemu_laio_process_completion (laiocb=0x7fdd920df630) at block/linux-aio.c:88 #7 0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670) at block/linux-aio.c:222 #8 0x00005631fbfc5499 in qemu_laio_process_completions_and_submit (s=0x7fdfc0297670) at block/linux-aio.c:237 #9 0x00005631fbfc54ed in qemu_laio_poll_cb 
(opaque=<optimized out>) at block/linux-aio.c:272 #10 0x00005631fc05d85e in run_poll_handlers_once (ctx=ctx@entry=0x5631fde0e660) at util/aio-posix.c:497 #11 0x00005631fc05e2ca in aio_poll (blocking=false, ctx=0x5631fde0e660) at util/aio-posix.c:574 #12 0x00005631fc05e2ca in aio_poll (ctx=0x5631fde0e660, blocking=blocking@entry=false) at util/aio-posix.c:604 #13 0x00005631fbfcb8a3 in bdrv_do_drained_begin (ignore_parent=<optimized out>, recursive=<optimized out>, bs=<optimized out>) at block/io.c:273 #14 0x00005631fbfcb8a3 in bdrv_do_drained_begin (bs=0x5631fe8b6200, recursive=<optimized out>, parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>) at block/io.c:390 #15 0x00005631fbfbcd2e in blk_drain (blk=0x5631fe83ac80) at block/block-backend.c:1590 #16 0x00005631fbfbe138 in blk_remove_bs (blk=blk@entry=0x5631fe83ac80) at block/block-backend.c:774 #17 0x00005631fbfbe3d6 in blk_unref (blk=0x5631fe83ac80) at block/block-backend.c:401 #18 0x00005631fbfbe3d6 in blk_unref (blk=0x5631fe83ac80) at block/block-backend.c:449 #19 0x00005631fbfc9a69 in commit_complete (job=0x5631fe8b94b0, opaque=0x7fdfcc1bb080) at block/commit.c:92 #20 0x00005631fbf7d662 in job_defer_to_main_loop_bh (opaque=0x7fdfcc1b4560) at job.c:973 #21 0x00005631fc05ad41 in aio_bh_poll (bh=0x7fdfcc01ad90) at util/async.c:90 #22 0x00005631fc05ad41 in aio_bh_poll (ctx=ctx@entry=0x5631fddffdb0) at util/async.c:118 #23 0x00005631fc05e210 in aio_dispatch (ctx=0x5631fddffdb0) at util/aio-posix.c:436 #24 0x00005631fc05ac1e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:261 #25 0x00007fdfeaae44c9 in g_main_context_dispatch (context=0x5631fde00140) at gmain.c:3201 #26 0x00007fdfeaae44c9 in g_main_context_dispatch (context=context@entry=0x5631fde00140) at gmain.c:3854 #27 0x00005631fc05d503 in main_loop_wait () at util/main-loop.c:215 #28 0x00005631fc05d503 in main_loop_wait (timeout=<optimized out>) at util/main-loop.c:238 #29 0x00005631fc05d503 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:497 #30 0x00005631fbd81412 in main_loop () at vl.c:1866 #31 0x00005631fbc18ff3 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4647 - A closer examination shows that s->io_q.in_flight appears to have gone backwards: (gdb) frame 7 #7 0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670) at block/linux-aio.c:222 222 qemu_laio_process_completion(laiocb); (gdb) p s $2 = (LinuxAioState *) 0x7fdfc0297670 (gdb) p *s $3 = {aio_context = 0x5631fde0e660, ctx = 0x7fdfeb43b000, e = {rfd = 33, wfd = 33}, io_q = {plugged = 0, in_queue = 0, in_flight = 4294967280, blocked = false, pending = {sqh_first = 0x0, sqh_last = 0x7fdfc0297698}}, completion_bh = 0x7fdfc0280ef0, event_idx = 21, event_max = 241} (gdb) p/x s->io_q.in_flight $4 = 0xfffffff0 Signed-off-by: Sergio Lopez <slp@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
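A sketch of the locking the analysis calls for, mirroring the function named in the traces (field names are taken from the gdb dump above; treat this as an illustration, not the exact committed hunk):

    void qemu_laio_process_completions_and_submit(LinuxAioState *s)
    {
        aio_context_acquire(s->aio_context);   /* serialize with other threads */
        qemu_laio_process_completions(s);
        if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
            ioq_submit(s);
        }
        aio_context_release(s->aio_context);
    }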
bsdimp
pushed a commit
that referenced
this pull request
Sep 6, 2019
Recently, the test case has started failing because some job related functions want to drop the AioContext lock even though it hasn't been taken: (gdb) bt #0 0x00007f51c067c9fb in raise () from /lib64/libc.so.6 #1 0x00007f51c067e77d in abort () from /lib64/libc.so.6 #2 0x0000558c9d5dde7b in error_exit (err=<optimized out>, msg=msg@entry=0x558c9d6fe120 <__func__.18373> "qemu_mutex_unlock_impl") at util/qemu-thread-posix.c:36 #3 0x0000558c9d6b5263 in qemu_mutex_unlock_impl (mutex=mutex@entry=0x558c9f3999a0, file=file@entry=0x558c9d6fd36f "util/async.c", line=line@entry=516) at util/qemu-thread-posix.c:96 #4 0x0000558c9d6b0565 in aio_context_release (ctx=ctx@entry=0x558c9f399940) at util/async.c:516 #5 0x0000558c9d5eb3da in job_completed_txn_abort (job=0x558c9f68e640) at job.c:738 #6 0x0000558c9d5eb227 in job_finish_sync (job=0x558c9f68e640, finish=finish@entry=0x558c9d5eb8d0 <job_cancel_err>, errp=errp@entry=0x0) at job.c:986 #7 0x0000558c9d5eb8ee in job_cancel_sync (job=<optimized out>) at job.c:941 #8 0x0000558c9d64d853 in replication_close (bs=<optimized out>) at block/replication.c:148 #9 0x0000558c9d5e5c9f in bdrv_close (bs=0x558c9f41b020) at block.c:3420 #10 bdrv_delete (bs=0x558c9f41b020) at block.c:3629 #11 bdrv_unref (bs=0x558c9f41b020) at block.c:4685 #12 0x0000558c9d62a3f3 in blk_remove_bs (blk=blk@entry=0x558c9f42a7c0) at block/block-backend.c:783 #13 0x0000558c9d62a667 in blk_delete (blk=0x558c9f42a7c0) at block/block-backend.c:402 #14 blk_unref (blk=0x558c9f42a7c0) at block/block-backend.c:457 #15 0x0000558c9d5dfcea in test_secondary_stop () at tests/test-replication.c:478 #16 0x00007f51c1f13178 in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0 #17 0x00007f51c1f1337b in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0 #18 0x00007f51c1f1337b in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0 #19 0x00007f51c1f13552 in g_test_run_suite () from /lib64/libglib-2.0.so.0 #20 0x00007f51c1f13571 in g_test_run () from /lib64/libglib-2.0.so.0 #21 0x0000558c9d5de31f in main (argc=<optimized out>, argv=<optimized out>) at tests/test-replication.c:581 It is yet unclear whether this should really be considered a bug in the test case or whether blk_unref() should work for callers that haven't taken the AioContext lock, but in order to fix the build tests quickly, just take the AioContext lock around blk_unref(). Signed-off-by: Kevin Wolf <kwolf@redhat.com>
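The quick fix described above, as a sketch (note the context pointer is saved first, since blk_unref() may destroy blk):

    AioContext *ctx = blk_get_aio_context(blk);
    aio_context_acquire(ctx);
    blk_unref(blk);
    aio_context_release(ctx);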
bsdimp
pushed a commit
that referenced
this pull request
Sep 6, 2019
When a monitor is connected to a Spice chardev, the monitor cleanup can dead-lock: #0 0x00007f43446637fd in __lll_lock_wait () at /lib64/libpthread.so.0 #1 0x00007f434465ccf4 in pthread_mutex_lock () at /lib64/libpthread.so.0 #2 0x0000556dd79f22ba in qemu_mutex_lock_impl (mutex=0x556dd81c9220 <monitor_lock>, file=0x556dd7ae3648 "/home/elmarco/src/qq/monitor.c", line=645) at /home/elmarco/src/qq/util/qemu-thread-posix.c:66 #3 0x0000556dd7431bd5 in monitor_qapi_event_queue (event=QAPI_EVENT_SPICE_DISCONNECTED, qdict=0x556dd9abc850, errp=0x7fffb7bbddd8) at /home/elmarco/src/qq/monitor.c:645 #4 0x0000556dd79d476b in qapi_event_send_spice_disconnected (server=0x556dd98ee760, client=0x556ddaaa8560, errp=0x556dd82180d0 <error_abort>) at qapi/qapi-events-ui.c:149 #5 0x0000556dd7870fc1 in channel_event (event=3, info=0x556ddad1b590) at /home/elmarco/src/qq/ui/spice-core.c:235 #6 0x00007f434560a6bb in reds_handle_channel_event (reds=<optimized out>, event=3, info=0x556ddad1b590) at reds.c:316 #7 0x00007f43455f393b in main_dispatcher_self_handle_channel_event (info=0x556ddad1b590, event=3, self=0x556dd9a7d8c0) at main-dispatcher.c:197 #8 0x00007f43455f393b in main_dispatcher_channel_event (self=0x556dd9a7d8c0, event=event@entry=3, info=0x556ddad1b590) at main-dispatcher.c:197 #9 0x00007f4345612833 in red_stream_push_channel_event (s=s@entry=0x556ddae2ef40, event=event@entry=3) at red-stream.c:414 #10 0x00007f434561286b in red_stream_free (s=0x556ddae2ef40) at red-stream.c:388 #11 0x00007f43455f9ddc in red_channel_client_finalize (object=0x556dd9bb21a0) at red-channel-client.c:347 #12 0x00007f434b5f9fb9 in g_object_unref () at /lib64/libgobject-2.0.so.0 #13 0x00007f43455fc212 in red_channel_client_push (rcc=0x556dd9bb21a0) at red-channel-client.c:1341 #14 0x0000556dd76081ba in spice_port_set_fe_open (chr=0x556dd9925e20, fe_open=0) at /home/elmarco/src/qq/chardev/spice.c:241 #15 0x0000556dd796d74a in qemu_chr_fe_set_open (be=0x556dd9a37c00, fe_open=0) at /home/elmarco/src/qq/chardev/char-fe.c:340 #16 0x0000556dd796d4d9 in qemu_chr_fe_set_handlers (b=0x556dd9a37c00, fd_can_read=0x0, fd_read=0x0, fd_event=0x0, be_change=0x0, opaque=0x0, context=0x0, set_open=true) at /home/elmarco/src/qq/chardev/char-fe.c:280 #17 0x0000556dd796d359 in qemu_chr_fe_deinit (b=0x556dd9a37c00, del=false) at /home/elmarco/src/qq/chardev/char-fe.c:233 #18 0x0000556dd7432240 in monitor_data_destroy (mon=0x556dd9a37c00) at /home/elmarco/src/qq/monitor.c:786 #19 0x0000556dd743b968 in monitor_cleanup () at /home/elmarco/src/qq/monitor.c:4683 #20 0x0000556dd75ce776 in main (argc=3, argv=0x7fffb7bbe458, envp=0x7fffb7bbe478) at /home/elmarco/src/qq/vl.c:4660 Because spice code tries to emit a "disconnected" signal on the monitors. Fix this dead-lock by releasing the monitor lock for flush/destroy. monitor_lock protects mon_list, monitor_qapi_event_state and monitor_destroyed. monitor_flush() and monitor_data_destroy() don't access any of those variables. monitor_cleanup()'s loop is safe because it uses QTAILQ_FOREACH_SAFE(), and no further monitor can be added after calling monitor_cleanup() thanks to monitor_destroyed check in monitor_list_append(). Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20181205203737.9011-8-marcandre.lureau@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
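The resulting loop shape, sketched from the description above (list and field names are assumptions about monitor.c of that era):

    Monitor *mon, *next;

    qemu_mutex_lock(&monitor_lock);
    QTAILQ_FOREACH_SAFE(mon, &mon_list, entry, next) {
        QTAILQ_REMOVE(&mon_list, mon, entry);
        /* drop the lock: flush/destroy can reenter via chardev events */
        qemu_mutex_unlock(&monitor_lock);
        monitor_flush(mon);
        monitor_data_destroy(mon);
        g_free(mon);
        qemu_mutex_lock(&monitor_lock);
    }
    qemu_mutex_unlock(&monitor_lock);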
bsdimp
pushed a commit
that referenced
this pull request
Sep 6, 2019
Both virtio-blk and virtio-scsi use virtio_queue_empty() as the loop condition in VQ handlers (virtio_blk_handle_vq, virtio_scsi_handle_cmd_vq). When a device is marked broken in virtqueue_pop, for example if a vIOMMU address translation failed, we want to break out of the loop. This fixes a hanging problem when booting a CentOS 3.10.0-862.el7.x86_64 kernel with ATS enabled: $ qemu-system-x86_64 \ ... \ -device intel-iommu,intremap=on,caching-mode=on,eim=on,device-iotlb=on \ -device virtio-scsi-pci,iommu_platform=on,ats=on,id=scsi0,bus=pci.4,addr=0x0 The dead loop happens immediately when the kernel boots and initializes the device, where virtio_scsi_data_plane_handle_cmd will not return: > ... > #13 0x00005586602b7793 in virtio_scsi_handle_cmd_vq > #14 0x00005586602b8d66 in virtio_scsi_data_plane_handle_cmd > #15 0x00005586602ddab7 in virtio_queue_notify_aio_vq > #16 0x00005586602dfc9f in virtio_queue_host_notifier_aio_poll > #17 0x00005586607885da in run_poll_handlers_once > #18 0x000055866078880e in try_poll_mode > #19 0x00005586607888eb in aio_poll > #20 0x0000558660784561 in aio_wait_bh_oneshot > #21 0x00005586602b9582 in virtio_scsi_dataplane_stop > #22 0x00005586605a7110 in virtio_bus_stop_ioeventfd > #23 0x00005586605a9426 in virtio_pci_stop_ioeventfd > #24 0x00005586605ab808 in virtio_pci_common_write > #25 0x0000558660242396 in memory_region_write_accessor > #26 0x00005586602425ab in access_with_adjusted_size > #27 0x0000558660245281 in memory_region_dispatch_write > #28 0x00005586601e008e in flatview_write_continue > #29 0x00005586601e01d8 in flatview_write > #30 0x00005586601e04de in address_space_write > #31 0x00005586601e052f in address_space_rw > #32 0x00005586602607f2 in kvm_cpu_exec > #33 0x0000558660227148 in qemu_kvm_cpu_thread_fn > #34 0x000055866078bde7 in qemu_thread_start > #35 0x00007f5784906594 in start_thread > #36 0x00007f5784639e6f in clone With this patch, virtio_queue_empty will now return 1 as soon as the vdev is marked as broken, after a "virtio: zero sized buffers are not allowed" error. To be consistent, update virtio_queue_empty_rcu as well. Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20180910145616.8598-2-famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
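The guard itself is tiny; a sketch of the early-out added at the top of virtio_queue_empty() (illustrative, assuming the vdev back-pointer and broken flag named above):

    if (unlikely(vq->vdev->broken)) {
        return 1;    /* a broken device has nothing to offer; loops end */
    }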
bsdimp
pushed a commit
that referenced
this pull request
Sep 6, 2019
Spotted by ASAN: ================================================================= ==11893==ERROR: LeakSanitizer: detected memory leaks Direct leak of 1120 byte(s) in 28 object(s) allocated from: #0 0x7fd0515b0c48 in malloc (/lib64/libasan.so.5+0xeec48) #1 0x7fd050ffa3c5 in g_malloc (/lib64/libglib-2.0.so.0+0x523c5) #2 0x559e708b56a4 in qstring_from_str /home/elmarco/src/qq/qobject/qstring.c:66 #3 0x559e708b4fe0 in qstring_new /home/elmarco/src/qq/qobject/qstring.c:23 #4 0x559e708bda7d in parse_string /home/elmarco/src/qq/qobject/json-parser.c:143 #5 0x559e708c1009 in parse_literal /home/elmarco/src/qq/qobject/json-parser.c:484 #6 0x559e708c1627 in parse_value /home/elmarco/src/qq/qobject/json-parser.c:547 #7 0x559e708c1c67 in json_parser_parse /home/elmarco/src/qq/qobject/json-parser.c:573 #8 0x559e708bc0ff in json_message_process_token /home/elmarco/src/qq/qobject/json-streamer.c:92 #9 0x559e708d1655 in json_lexer_feed_char /home/elmarco/src/qq/qobject/json-lexer.c:292 #10 0x559e708d1fe1 in json_lexer_feed /home/elmarco/src/qq/qobject/json-lexer.c:339 #11 0x559e708bc856 in json_message_parser_feed /home/elmarco/src/qq/qobject/json-streamer.c:121 #12 0x559e708b8b4b in qobject_from_jsonv /home/elmarco/src/qq/qobject/qjson.c:69 #13 0x559e708b8d02 in qobject_from_json /home/elmarco/src/qq/qobject/qjson.c:83 #14 0x559e708a74ae in from_json_str /home/elmarco/src/qq/tests/check-qjson.c:30 #15 0x559e708a9f83 in utf8_string /home/elmarco/src/qq/tests/check-qjson.c:781 #16 0x7fd05101bc49 in test_case_run gtestutils.c:2255 #17 0x7fd05101bc49 in g_test_run_suite_internal gtestutils.c:2339 Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20180901211917.10372-1-marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
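The test-side pattern the leak points at, sketched against the QObject API named in the trace: everything the JSON parser hands back is an owned reference the test must drop:

    QObject *obj = qobject_from_json(json, &error_abort);
    /* ... assertions on obj ... */
    qobject_unref(obj);    /* the reference that was being leaked */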