[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v2 2/3] dma-helpers: prevent dma_blk_cb() vs dma_aio_cancel()
From: |
Stefan Hajnoczi |
Subject: |
Re: [PATCH v2 2/3] dma-helpers: prevent dma_blk_cb() vs dma_aio_cancel() race |
Date: |
Thu, 16 Feb 2023 16:27:54 -0500 |
On Thu, Feb 16, 2023 at 04:27:42PM +0100, Kevin Wolf wrote:
> Am 10.02.2023 um 15:32 hat Stefan Hajnoczi geschrieben:
> > dma_blk_cb() only takes the AioContext lock around ->io_func(). That
> > means the rest of dma_blk_cb() is not protected. In particular, the
> > DMAAIOCB field accesses happen outside the lock.
> >
> > There is a race when the main loop thread holds the AioContext lock and
> > invokes scsi_device_purge_requests() -> bdrv_aio_cancel() ->
> > dma_aio_cancel() while an IOThread executes dma_blk_cb(). The dbs->acb
> > field determines how cancellation proceeds. If dma_aio_cancel() see
> > dbs->acb == NULL while dma_blk_cb() is still running, the request can be
> > completed twice (-ECANCELED and the actual return value).
> >
> > The following assertion can occur with virtio-scsi when an IOThread is
> > used:
> >
> > ../hw/scsi/scsi-disk.c:368: scsi_dma_complete: Assertion `r->req.aiocb !=
> > NULL' failed.
> >
> > Fix the race by holding the AioContext across dma_blk_cb(). Now
> > dma_aio_cancel() under the AioContext lock will not see
> > inconsistent/intermediate states.
> >
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>
> Two things that seem to need attention in the review:
>
> > diff --git a/softmmu/dma-helpers.c b/softmmu/dma-helpers.c
> > index 7820fec54c..2463964805 100644
> > --- a/softmmu/dma-helpers.c
> > +++ b/softmmu/dma-helpers.c
> > @@ -113,17 +113,19 @@ static void dma_complete(DMAAIOCB *dbs, int ret)
> > static void dma_blk_cb(void *opaque, int ret)
> > {
> > DMAAIOCB *dbs = (DMAAIOCB *)opaque;
> > + AioContext *ctx = dbs->ctx;
> > dma_addr_t cur_addr, cur_len;
> > void *mem;
> >
> > trace_dma_blk_cb(dbs, ret);
> >
> > + aio_context_acquire(ctx);
>
> During the first iteration, the caller may already hold the AioContext
> lock. In the case of scsi-disk, it does. Locking a second time is fine
> in principle because it's a recursive lock, but we have to be careful
> not to call AIO_WAIT_WHILE() indirectly then because it could deadlock.
>
> Except for the dbs->common.cb (see below) I don't see any functions that
> would be problematic in this respect. In fact, the one I would be most
> worried about is dbs->io_func, but it was already locked before.
>
> > dbs->acb = NULL;
> > dbs->offset += dbs->iov.size;
> >
> > if (dbs->sg_cur_index == dbs->sg->nsg || ret < 0) {
> > dma_complete(dbs, ret);
>
> All request callbacks hold the AioContext lock now when they didn't
> before. I wonder if it's worth documenting the locking rules for
> dma_blk_io() in a comment. Could be a separate patch, though.
>
> You remove the locking in scsi_dma_complete() to compensate. All the
> other callers come from IDE and nvme, which don't take the lock
> internally. Taking the main AioContext lock once is fine, so this looks
> good.
>
> If it is possible that we already complete on the first iteration, this
> could however also be affected by the case above so that the AioContext
> is locked twice. In this case, the invoked callback must not call
> AIO_WAIT_WHILE() and we would need to audit IDE and nvme.
>
> Is it possible? In other words, can dbs->sg->nsg be 0? If not, can we
> assert and document it?
In the nsg == 0 case there's another problem: the completion callback
function is invoked and the AIOCB is unref'd before dma_blk_io() returns
the stale AIOCB pointer.
That would lead to problems later because the pattern is typically:
r->aiocb = dma_blk_io(...);
...
use r and r->aiocb later
So I don't think nsg = 0 makes sense.
Functions I looked at invoke dma_blk_io() only when there are still
bytes to transfer. I think we're safe but I'll admit I'm not 100% sure.
> > - return;
> > + goto out;
> > }
> > dma_blk_unmap(dbs);
> >
> > @@ -164,9 +166,9 @@ static void dma_blk_cb(void *opaque, int ret)
> >
> > if (dbs->iov.size == 0) {
> > trace_dma_map_wait(dbs);
> > - dbs->bh = aio_bh_new(dbs->ctx, reschedule_dma, dbs);
> > + dbs->bh = aio_bh_new(ctx, reschedule_dma, dbs);
>
> Does copying dbs->ctx to a local variable imply that it may change
> during the function? I didn't think so, but if it may, then why is
> calling aio_bh_new() with the old value right?
I changed this line for consistency, not to change behavior or fix a bug.
Regarding AioContext changes, they can't happen because no function that
changes the AioContext is called between this line and the earlier
aio_context_acquire().
(Having to worry about AioContext changes is a pain though and I look
forward to when we can get rid of this lock.)
Stefan
signature.asc
Description: PGP signature