qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 3/3] virtio-scsi: reset SCSI devices from main loop thread


From: Kevin Wolf
Subject: Re: [PATCH v2 3/3] virtio-scsi: reset SCSI devices from main loop thread
Date: Fri, 17 Feb 2023 11:22:02 +0100

Am 10.02.2023 um 15:32 hat Stefan Hajnoczi geschrieben:
> When an IOThread is configured, the ctrl virtqueue is processed in the
> IOThread. TMFs that reset SCSI devices are currently called directly
> from the IOThread and trigger an assertion failure in blk_drain():
> 
>   ../block/block-backend.c:1780: void blk_drain(BlockBackend *): Assertion 
> `qemu_in_main_thread()' failed.
> 
> The blk_drain() function is not designed to be called from an IOThread
> because it needs the Big QEMU Lock (BQL).
> 
> This patch defers TMFs that reset SCSI devices to a Bottom Half (BH)
> that runs in the main loop thread under the BQL. This way it's safe to
> call blk_drain() and the assertion failure is avoided.

It's not entirely obvious what is the call path that leads to
blk_drain(). Do we somehow call into virtio_scsi_dataplane_stop()?

> Introduce s->tmf_bh_list for tracking TMF requests that have been
> deferred to the BH. When the BH runs it will grab the entire list and
> process all requests. Care must be taken to clear the list when the
> virtio-scsi device is reset or unrealized. Otherwise deferred TMF
> requests could execute later and lead to use-after-free or other
> undefined behavior.

Why don't we already need the same for other asynchronously processed
requests? Certainly having a read request write to guest memory after
the device has been reset or unplugged isn't what we want either.

I see that we assert(!s->dataplane_started) in virtio_scsi_reset(),
which may be part of the reason. If we're not processing any requests,
then we're safe. virtio_scsi_dataplane_stop() calls blk_drain_all()
(which is really a too big hammer) in order to make sure that in-flight
requests are completed before dataplane_started becomes false.

I was wondering if we couldn't just blk_inc_in_flight() while a TMF
request is in flight and then use the same draining logic to be covered.
You could use oneshot BHs then and do away with the list because you
don't need to cancel anything any more, you just wait until the BHs have
completed.

The practical problem may be that we don't have a blk here (which is
probably also why blk_drain_all() is used). We could have our own
AIO_WAIT_WHILE() instead. I feel waiting instead of cancelling BHs would
simplify the code.

In fact, I think technically, you may not need any of that because
blk_drain_all() also executes all BHs in the main loop before it
returns, but that might be a bit subtle...

> The s->resetting counter that's used by TMFs that reset SCSI devices is
> accessed from multiple threads. This patch makes that explicit by using
> atomic accessor functions. With this patch applied the counter is only
> modified by the main loop thread under the BQL but can be read by any
> thread.
> 
> Reported-by: Qing Wang <qinwang@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Kevin




reply via email to

[Prev in Thread] Current Thread [Next in Thread]