From: Hanna Czenczek
Subject: Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread
Date: Thu, 25 Jan 2024 18:32:46 +0100
User-agent: Mozilla Thunderbird
On 23.01.24 18:10, Kevin Wolf wrote:
On 23.01.2024 at 17:40, Hanna Czenczek wrote:
On 21.12.23 22:23, Kevin Wolf wrote:
From: Stefan Hajnoczi <stefanha@redhat.com>

Stop depending on the AioContext lock and instead access SCSIDevice->requests from only one thread at a time:

- When the VM is running only the BlockBackend's AioContext may access the requests list.
- When the VM is stopped only the main loop may access the requests list.

These constraints protect the requests list without the need for locking in the I/O code path.

Note that multiple IOThreads are not supported yet because the code assumes all SCSIRequests are executed from a single AioContext. Leave that as future work.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-ID: <20231204164259.1515217-2-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/hw/scsi/scsi.h |   7 +-
 hw/scsi/scsi-bus.c     | 181 ++++++++++++++++++++++++++++-------------
 2 files changed, 131 insertions(+), 57 deletions(-)

My reproducer for https://issues.redhat.com/browse/RHEL-3934 now breaks more often because of this commit than because of the original bug, i.e. when repeatedly hot-plugging and unplugging a virtio-scsi and a scsi-hd device, this tends to happen when unplugging the scsi-hd:

{"execute":"device_del","arguments":{"id":"stg0"}}
{"return": {}}

qemu-system-x86_64: ../block/block-backend.c:2429: blk_get_aio_context: Assertion `ctx == blk->ctx' failed.
[...]
I don’t know anything about the problem yet, but as usual, I like speculation and discovering how wrong I was later on, so one thing I came across that’s funny about virtio-scsi is that requests can happen even while a disk is being attached or detached. That is, Linux seems to probe all LUNs when a new virtio-scsi device is being attached, and it won’t stop just because a disk is being attached or removed. So maybe that’s part of the problem, that we get a request while the BB is being detached, and temporarily in an inconsistent state (BDS context differs from BB context).

I don't know anything about the problem either, but since you already speculated about the cause, let me speculate about the solution: Can we hold the graph writer lock for the tran_commit() call in bdrv_try_change_aio_context()? And of course take the reader lock for blk_get_aio_context(), but that should be completely unproblematic.
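If I understand the proposal right, reduced to a stand-alone model it would look roughly like this (a pthread rwlock standing in for the graph lock; all of this is just an illustration I made up, not actual QEMU code):

#include <pthread.h>

/* Toy stand-ins; not the real QEMU structures. */
typedef struct AioContext AioContext;
typedef struct {
    AioContext *ctx;
} BlockBackend;

/* Stand-in for the block graph lock. */
static pthread_rwlock_t graph_lock = PTHREAD_RWLOCK_INITIALIZER;

/* "Writer" side: corresponds to the tran_commit() call in
 * bdrv_try_change_aio_context() that switches the context. */
static void change_aio_context(BlockBackend *blk, AioContext *new_ctx)
{
    pthread_rwlock_wrlock(&graph_lock);   /* exclusive with all readers */
    blk->ctx = new_ctx;
    pthread_rwlock_unlock(&graph_lock);
}

/* "Reader" side: corresponds to blk_get_aio_context(). */
static AioContext *get_aio_context(BlockBackend *blk)
{
    pthread_rwlock_rdlock(&graph_lock);
    AioContext *ctx = blk->ctx;
    pthread_rwlock_unlock(&graph_lock);
    return ctx;   /* may already be stale once the lock is dropped */
}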
Actually, now that "completely unproblematic" part is giving me trouble. I wanted to just put a graph lock into blk_get_aio_context() (making it a coroutine with a wrapper), but callers of blk_get_aio_context() generally assume the context is going to stay the BB’s context for as long as their AioContext * variable is in scope. I was tempted to think callers know what happens to the BB they pass to blk_get_aio_context() and that it won’t change contexts so easily, but then I remembered that this is exactly what happens in this case: we run scsi_device_for_each_req_async_bh() in one thread (which calls blk_get_aio_context()), while the other thread changes the BB’s context.
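In terms of that little model, the pattern that worries me is this (use() being a made-up stand-in for whatever the caller does with the context):

/* Hypothetical caller, illustrative only. */
static void use(AioContext *ctx) { (void)ctx; }   /* stand-in */

static void some_caller(BlockBackend *blk)
{
    AioContext *ctx = get_aio_context(blk);   /* read lock taken and dropped */

    /*
     * Nothing stops another thread from calling
     * change_aio_context(blk, new_ctx) right here, because we no longer
     * hold the read lock.
     */

    use(ctx);   /* ctx may no longer be blk's context at this point */
}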
It seems like there are very few blk_* functions right now that require taking a graph lock around them, so I’m hesitant to go that route. But if we’re protecting a BB’s context via the graph write lock, I can’t think of a way around having to take a read lock whenever reading a BB’s context, and holding it for as long as we assume that context to remain the BB’s context. It’s also hard to figure out how long that is, case by case; for example, dma_blk_read() schedules an AIO function in the BB context, but we probably don’t care whether that context remains the BB’s context until the request is done. In the case of scsi_device_for_each_req_async_bh(), we already take care to re-schedule it when it turns out the context is outdated, so it does seem quite important here, and we probably want to keep the lock until after the QTAILQ_FOREACH_SAFE() loop.
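Sticking with the toy model, that would mean widening the reader’s critical section so it covers the re-check and the list walk, something like this (current_context(), reschedule_into() and walk_request_list() are made-up stubs, only there to keep the sketch self-contained):

/* Hypothetical stubs with empty bodies, just for illustration. */
static AioContext *current_context(void) { return NULL; }
static void reschedule_into(AioContext *ctx) { (void)ctx; }
static void walk_request_list(void) { }

/* Reader side with a widened critical section: the lock now covers the
 * re-check of the context and the request-list walk, not just the load. */
static void for_each_req_bh(BlockBackend *blk)
{
    pthread_rwlock_rdlock(&graph_lock);

    if (blk->ctx != current_context()) {
        /* Outdated: reschedule into the new context and retry there. */
        AioContext *ctx = blk->ctx;
        pthread_rwlock_unlock(&graph_lock);
        reschedule_into(ctx);
        return;
    }

    /*
     * Still under the read lock: a concurrent context switch (the writer
     * in change_aio_context()) is excluded while we walk the list, which
     * is what the QTAILQ_FOREACH_SAFE() loop relies on.
     */
    walk_request_list();

    pthread_rwlock_unlock(&graph_lock);
}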
On a tangent, this TOCTTOU problem makes me wary of other blk_* functions that query information. For example, fuse_read() (in block/export/fuse.c) truncates requests to the BB length. But what if the BB length changes concurrently between blk_getlength() and blk_pread()? While we can justify using the graph lock for a BB’s AioContext, we can’t use it for other metadata like its length.
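To spell out the pattern I mean (QEMU-flavoured pseudo-code from memory, not the literal code in block/export/fuse.c; blk_getlength() and blk_pread() are the real accessors, everything else, including the exact signatures, should be taken with a grain of salt):

/* Schematic only. */
static void fuse_read_sketch(BlockBackend *blk, void *buf,
                             uint64_t offset, uint64_t size)
{
    int64_t length = blk_getlength(blk);            /* check */
    if (length < 0) {
        return;                                     /* error handling omitted */
    }

    if (offset + size > (uint64_t)length) {
        /* Truncate the request to the sampled BB length. */
        size = offset < (uint64_t)length ? (uint64_t)length - offset : 0;
    }

    /* A concurrent truncate can shrink the BB right here. */

    blk_pread(blk, offset, size, buf, 0);           /* use: clamp may be stale */
}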
Hanna