From: Kevin Wolf
Subject: Re: [PATCH v2 2/3] dma-helpers: prevent dma_blk_cb() vs dma_aio_cancel() race
Date: Thu, 16 Feb 2023 16:27:42 +0100

Am 10.02.2023 um 15:32 hat Stefan Hajnoczi geschrieben:
> dma_blk_cb() only takes the AioContext lock around ->io_func(). That
> means the rest of dma_blk_cb() is not protected. In particular, the
> DMAAIOCB field accesses happen outside the lock.
> 
> There is a race when the main loop thread holds the AioContext lock and
> invokes scsi_device_purge_requests() -> bdrv_aio_cancel() ->
> dma_aio_cancel() while an IOThread executes dma_blk_cb(). The dbs->acb
> field determines how cancellation proceeds. If dma_aio_cancel() sees
> dbs->acb == NULL while dma_blk_cb() is still running, the request can be
> completed twice (-ECANCELED and the actual return value).
> 
> The following assertion can occur with virtio-scsi when an IOThread is
> used:
> 
>   ../hw/scsi/scsi-disk.c:368: scsi_dma_complete: Assertion `r->req.aiocb != NULL' failed.
> 
> Fix the race by holding the AioContext lock across dma_blk_cb(). Now
> dma_aio_cancel() under the AioContext lock will not see
> inconsistent/intermediate states.
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Two things that seem to need attention in the review:

> diff --git a/softmmu/dma-helpers.c b/softmmu/dma-helpers.c
> index 7820fec54c..2463964805 100644
> --- a/softmmu/dma-helpers.c
> +++ b/softmmu/dma-helpers.c
> @@ -113,17 +113,19 @@ static void dma_complete(DMAAIOCB *dbs, int ret)
>  static void dma_blk_cb(void *opaque, int ret)
>  {
>      DMAAIOCB *dbs = (DMAAIOCB *)opaque;
> +    AioContext *ctx = dbs->ctx;
>      dma_addr_t cur_addr, cur_len;
>      void *mem;
>  
>      trace_dma_blk_cb(dbs, ret);
>  
> +    aio_context_acquire(ctx);

During the first iteration, the caller may already hold the AioContext
lock. In the case of scsi-disk, it does. Locking a second time is fine
in principle because it's a recursive lock, but we have to be careful
not to call AIO_WAIT_WHILE() indirectly then because it could deadlock.
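
To make this concrete, the problematic pattern would be something like
the following (just a sketch, 'ctx' and 'done' are placeholders and not
code from the patch):

    /* hypothetical completion callback reached from dma_blk_cb() */
    static void some_cb(void *opaque, int ret)
    {
        /*
         * With this patch, the callback runs under the AioContext lock,
         * possibly acquired twice (once by the original caller, once by
         * dma_blk_cb()). AIO_WAIT_WHILE() drops the lock only once
         * before polling, so reaching something like this would hang.
         */
        AIO_WAIT_WHILE(ctx, !done);
    }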

Except for dbs->common.cb (see below), I don't see any functions that
would be problematic in this respect. In fact, the one I would be most
worried about is dbs->io_func, but it was already called under the lock
before this patch.

>      dbs->acb = NULL;
>      dbs->offset += dbs->iov.size;
>  
>      if (dbs->sg_cur_index == dbs->sg->nsg || ret < 0) {
>          dma_complete(dbs, ret);

All request callbacks hold the AioContext lock now when they didn't
before. I wonder if it's worth documenting the locking rules for
dma_blk_io() in a comment. Could be a separate patch, though.
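
Something roughly like this is what I have in mind (only a sketch, the
exact wording and contract are up for discussion):

    /*
     * Locking rules for dma_blk_io():
     *
     * - The caller may hold the AioContext lock; it is a recursive
     *   lock, so dma_blk_cb() acquiring it again is fine.
     * - The completion callback is invoked with the AioContext lock
     *   held (possibly acquired twice), so it must not call
     *   AIO_WAIT_WHILE() or anything else that drains or polls.
     */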

You remove the locking in scsi_dma_complete() to compensate. All the
other callers come from IDE and nvme, which don't take the lock
internally. Taking the main AioContext lock once is fine, so this looks
good.

If it is possible that we already complete on the first iteration, that
case could also be affected by the scenario above, i.e. the AioContext
would be locked twice. The invoked callback then must not call
AIO_WAIT_WHILE(), and we would need to audit IDE and nvme.

Is it possible? In other words, can dbs->sg->nsg be 0? If not, can we
assert and document it?
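
If it can't be 0, something as simple as this, e.g. in dma_blk_io()
before the first dma_blk_cb() call, would document the assumption (just
a suggestion):

    /* An empty sg list would make the first dma_blk_cb() call complete
     * the request immediately, i.e. invoke the callback while the
     * caller may still hold the AioContext lock. */
    assert(dbs->sg->nsg > 0);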

> -        return;
> +        goto out;
>      }
>      dma_blk_unmap(dbs);
>  
> @@ -164,9 +166,9 @@ static void dma_blk_cb(void *opaque, int ret)
>  
>      if (dbs->iov.size == 0) {
>          trace_dma_map_wait(dbs);
> -        dbs->bh = aio_bh_new(dbs->ctx, reschedule_dma, dbs);
> +        dbs->bh = aio_bh_new(ctx, reschedule_dma, dbs);

Does copying dbs->ctx to a local variable imply that it may change
during the function? I didn't think so, but if it can, then why is it
correct to call aio_bh_new() with the old value?

>          cpu_register_map_client(dbs->bh);
> -        return;
> +        goto out;
>      }
>  
>      if (!QEMU_IS_ALIGNED(dbs->iov.size, dbs->align)) {
> @@ -174,11 +176,11 @@ static void dma_blk_cb(void *opaque, int ret)
>                                  QEMU_ALIGN_DOWN(dbs->iov.size, dbs->align));
>      }
>  
> -    aio_context_acquire(dbs->ctx);
>      dbs->acb = dbs->io_func(dbs->offset, &dbs->iov,
>                              dma_blk_cb, dbs, dbs->io_func_opaque);
> -    aio_context_release(dbs->ctx);
>      assert(dbs->acb);
> +out:
> +    aio_context_release(ctx);
>  }

Kevin



