qemu-block

virtio-scsi and another complex AioContext issue


From: Sergio Lopez
Subject: virtio-scsi and another complex AioContext issue
Date: Thu, 11 Jun 2020 10:36:22 +0200

Hi,

While debugging BZ#1844343, I managed to reproduce the issue, which
leads to a crash with a backtrace like this one:

<---- snip ---->
Thread 2 (Thread 0x7fe208463f00 (LWP 1659571)):
#0  0x00007fe2033b78ed in __lll_lock_wait () at /lib64/libpthread.so.0
#1  0x00007fe2033b0bd4 in pthread_mutex_lock () at /lib64/libpthread.so.0
#2  0x0000560caa8f1e6d in qemu_mutex_lock_impl (mutex=0x560cacc68a10, file=0x560caaa9797f "util/async.c", line=521) at util/qemu-thread-posix.c:78
#3  0x0000560caa82414d in bdrv_set_aio_context_ignore (bs=bs@entry=0x560cacc73570, new_context=new_context@entry=0x560cacc5fed0, ignore=ignore@entry=0x7ffe388b1cc0) at block.c:6192
#4  0x0000560caa824503 in bdrv_child_try_set_aio_context (bs=bs@entry=0x560cacc73570, ctx=0x560cacc5fed0, ignore_child=<optimized out>, errp=<optimized out>) at block.c:6272
#5  0x0000560caa859e6b in blk_do_set_aio_context (blk=0x560cacecf370, new_context=0x560cacc5fed0, update_root_node=update_root_node@entry=true, errp=errp@entry=0x0) at block/block-backend.c:1989
#6  0x0000560caa85c501 in blk_set_aio_context (blk=<optimized out>, new_context=<optimized out>, errp=errp@entry=0x0) at block/block-backend.c:2010
#7  0x0000560caa61db30 in virtio_scsi_hotunplug (hotplug_dev=0x560cadaafbd0, dev=0x560cacec1210, errp=0x7ffe388b1d80) at /usr/src/debug/qemu-kvm-4.2.0-22.module+el8.2.1+6758+cb8d64c2.x86_64/hw/scsi/virtio-scsi.c:869
#8  0x0000560caa6ccd1e in qdev_unplug (dev=0x560cacec1210, errp=errp@entry=0x7ffe388b1db8) at qdev-monitor.c:872
#9  0x0000560caa6ccd9e in qmp_device_del (id=<optimized out>, errp=errp@entry=0x7ffe388b1db8) at qdev-monitor.c:884
#10 0x0000560caa7ec4d3 in qmp_marshal_device_del (args=<optimized out>, ret=<optimized out>, errp=0x7ffe388b1e18) at qapi/qapi-commands-qdev.c:99
#11 0x0000560caa8a45ec in do_qmp_dispatch (errp=0x7ffe388b1e10, allow_oob=<optimized out>, request=<optimized out>, cmds=0x560cab1928a0 <qmp_commands>) at qapi/qmp-dispatch.c:132
#12 0x0000560caa8a45ec in qmp_dispatch (cmds=0x560cab1928a0 <qmp_commands>, request=<optimized out>, allow_oob=<optimized out>) at qapi/qmp-dispatch.c:175
#13 0x0000560caa7c2521 in monitor_qmp_dispatch (mon=0x560cacca2f00, req=<optimized out>) at monitor/qmp.c:145
#14 0x0000560caa7c2bba in monitor_qmp_bh_dispatcher (data=<optimized out>) at monitor/qmp.c:234
#15 0x0000560caa8ec716 in aio_bh_call (bh=0x560cacbd80e0) at util/async.c:117
#16 0x0000560caa8ec716 in aio_bh_poll (ctx=ctx@entry=0x560cacbd6da0) at util/async.c:117
#17 0x0000560caa8efb04 in aio_dispatch (ctx=0x560cacbd6da0) at util/aio-posix.c:459
#18 0x0000560caa8ec5f2 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:260
#19 0x00007fe2078d167d in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#20 0x0000560caa8eebb8 in glib_pollfds_poll () at util/main-loop.c:219
#21 0x0000560caa8eebb8 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
#22 0x0000560caa8eebb8 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518
#23 0x0000560caa6cfe51 in main_loop () at vl.c:1828
#24 0x0000560caa57b322 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4504

Thread 1 (Thread 0x7fe1fb059700 (LWP 1659573)):
#0  0x00007fe20301b70f in raise () at /lib64/libc.so.6
#1  0x00007fe203005b25 in abort () at /lib64/libc.so.6
#2  0x00007fe2030059f9 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
#3  0x00007fe203013cc6 in .annobin_assert.c_end () at /lib64/libc.so.6
#4  0x0000560caa85bfe4 in blk_get_aio_context (blk=0x560cacecf370) at block/block-backend.c:1968
#5  0x0000560caa85bfe4 in blk_get_aio_context (blk=0x560cacecf370) at block/block-backend.c:1962
#6  0x0000560caa61d79c in virtio_scsi_ctx_check (s=0x560cadaafbd0, s=0x560cadaafbd0, d=0x560cacec1210) at /usr/src/debug/qemu-kvm-4.2.0-22.module+el8.2.1+6758+cb8d64c2.x86_64/hw/scsi/virtio-scsi.c:250
#7  0x0000560caa61d79c in virtio_scsi_handle_cmd_req_prepare (req=0x7fe1ec013880, s=0x560cadaafbd0) at /usr/src/debug/qemu-kvm-4.2.0-22.module+el8.2.1+6758+cb8d64c2.x86_64/hw/scsi/virtio-scsi.c:569
#8  0x0000560caa61d79c in virtio_scsi_handle_cmd_vq (s=s@entry=0x560cadaafbd0, vq=vq@entry=0x7fe1f82ac140) at /usr/src/debug/qemu-kvm-4.2.0-22.module+el8.2.1+6758+cb8d64c2.x86_64/hw/scsi/virtio-scsi.c:612
#9  0x0000560caa61e48e in virtio_scsi_data_plane_handle_cmd (vdev=<optimized out>, vq=0x7fe1f82ac140) at /usr/src/debug/qemu-kvm-4.2.0-22.module+el8.2.1+6758+cb8d64c2.x86_64/hw/scsi/virtio-scsi-dataplane.c:60
#10 0x0000560caa62bfbe in virtio_queue_notify_aio_vq (vq=<optimized out>) at /usr/src/debug/qemu-kvm-4.2.0-22.module+el8.2.1+6758+cb8d64c2.x86_64/hw/virtio/virtio.c:2243
#11 0x0000560caa8ef046 in run_poll_handlers_once (ctx=ctx@entry=0x560cacc689b0, timeout=timeout@entry=0x7fe1fb058658) at util/aio-posix.c:517
#12 0x0000560caa8efbc5 in try_poll_mode (timeout=0x7fe1fb058658, ctx=0x560cacc689b0) at util/aio-posix.c:607
#13 0x0000560caa8efbc5 in aio_poll (ctx=0x560cacc689b0, blocking=blocking@entry=true) at util/aio-posix.c:639
#14 0x0000560caa6ca7f4 in iothread_run (opaque=0x560cacc1f000) at iothread.c:75
#15 0x0000560caa8f1d84 in qemu_thread_start (args=0x560cacc666f0) at util/qemu-thread-posix.c:519
#16 0x00007fe2033ae2de in start_thread () at /lib64/libpthread.so.0
#17 0x00007fe2030dfe83 in clone () at /lib64/libc.so.6
<---- snip ---->
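
For context, the abort in Thread 1 comes from the AioContext
consistency assertion in blk_get_aio_context(). Sketched from memory
of the QEMU 4.2-era block/block-backend.c (the exact shape may differ
slightly), it boils down to:

    AioContext *blk_get_aio_context(BlockBackend *blk)
    {
        BlockDriverState *bs = blk_bs(blk);

        if (bs) {
            AioContext *ctx = bdrv_get_aio_context(bs);
            /* Fires when the iothread looks at blk while the main thread
             * is still moving the node tree to another AioContext. */
            assert(ctx == blk->ctx);
        }

        return blk->ctx;
    }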

Both the code path initiated by virtio_scsi_data_plane_handle_cmd() and
the one coming down from virtio_scsi_hotunplug() should be protected by
virtio_scsi_acquire(). But the race can still happen, because the latter
works by acquiring the AioContext pointed to by s->ctx, and we have this
in bdrv_set_aio_context_ignore():

   6140 void bdrv_set_aio_context_ignore(BlockDriverState *bs,
   6141                                  AioContext *new_context, GSList **ignore)
   6142 {
(...)
   6179     /*
   6180      * If this function was recursively called from
   6181      * bdrv_set_aio_context_ignore(), there may be nodes in the
   6182      * subtree that have not yet been moved to the new AioContext.
   6183      * Release the old one so bdrv_drained_end() can poll them.
   6184      */
   6185     if (qemu_get_aio_context() != old_context) {
   6186         aio_context_release(old_context);
   6187     }
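
For reference, virtio_scsi_acquire() and virtio_scsi_release() are thin
wrappers around the AioContext lock (roughly as defined in
hw/scsi/virtio-scsi.h), so the aio_context_release(old_context) above
drops exactly the lock that the command path in Thread 1 relies on:

    static inline void virtio_scsi_acquire(VirtIOSCSI *s)
    {
        if (s->ctx) {
            aio_context_acquire(s->ctx);
        }
    }

    static inline void virtio_scsi_release(VirtIOSCSI *s)
    {
        if (s->ctx) {
            aio_context_release(s->ctx);
        }
    }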

My first thought is that, given this situation and the apparent
expectations about where and how virtio_scsi_acquire() is used, we
should probably protect VirtIOSCSI with an independent lock instead of
acquiring the AioContext.
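
To make the idea concrete, here is a hypothetical sketch (the field and
helper names are invented for illustration, this is not a patch): add a
QemuMutex to VirtIOSCSI, initialize it in virtio_scsi_device_realize(),
and take it in both paths instead of relying on the AioContext lock:

    /* Hypothetical sketch only; names are illustrative. */
    struct VirtIOSCSI {
        VirtIOSCSICommon parent_obj;
        /* ... existing fields ... */
        AioContext *ctx;
        QemuMutex vs_lock;   /* serializes hot(un)plug against the dataplane */
    };

    static inline void virtio_scsi_lock(VirtIOSCSI *s)
    {
        qemu_mutex_lock(&s->vs_lock);
    }

    static inline void virtio_scsi_unlock(VirtIOSCSI *s)
    {
        qemu_mutex_unlock(&s->vs_lock);
    }

virtio_scsi_hotunplug() and virtio_scsi_data_plane_handle_cmd() would
bracket their critical sections with these helpers; unlike the
AioContext lock, nothing inside bdrv_set_aio_context_ignore() would
release this lock behind our backs.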

What do you think?

Thanks,
Sergio.

