[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] aio context ownership during bdrv_close()
From: |
Kevin Wolf |
Subject: |
Re: [Qemu-devel] aio context ownership during bdrv_close() |
Date: |
Mon, 6 May 2019 15:57:03 +0200 |
User-agent: |
Mutt/1.11.3 (2019-02-01) |
Am 06.05.2019 um 14:47 hat Anton Kuchin geschrieben:
> On 29/04/2019 13:38, Kevin Wolf wrote:
> > Am 26.04.2019 um 14:24 hat Anton Kuchin geschrieben:
> > > I can't figure out ownership of aio context during bdrv_close().
> > >
> > > As far as I understand bdrv_unref() shold be called with acquired aio
> > > context to prevent concurrent operations (at least most usages in
> > > blockdev.c
> > > explicitly acquire and release context, but not all).
> > I think the theory is like this:
> >
> > 1. bdrv_unref() can only be called from the main thread
> >
> > 2. A block node for which bdrv_close() is called has no references. If
> > there are no more parents that keep it in a non-default iothread,
> > they should be in the main AioContext. So no locks need to be taken
> > during bdrv_close().
> >
> > In practice, 2. is not fully true today, even though block devices do
> > stop their dataplane and move the block nodes back to the main
> > AioContext on shutdown. I am currently working on some fixes related to
> > this, afterwards the situation should be better.
> >
> > > But if refcount reaches zero and bs is going to be deleted in bdrv_close()
> > > we need to ensure that drain is finished data is flushed and there are no
> > > more pending coroutines and bottomhalves, so drain and flush functions can
> > > enter coroutine and perform yield in several places. As a result control
> > > returns to coroutine caller that will release aio context and when
> > > completion bh will continue cleanup process it will be executed without
> > > ownership of context. Is this a valid situation?
> > Do you have an example where this happens?
> >
> > Normally, leaving the coroutine means that the AioContext lock is
> > released, but it is later reentered from the same AioContext, so the
> > lock will be taken again.
> I was wrong. Coroutines do acquire aio context when reentered.
> >
> > > Moreover if yield happens bs that is being deleted has zero refcount but
> > > is
> > > still present in lists graph_bdrv_states and all_bdrv_states and can be
> > > accidentally accessed. Shouldn't we remove it from these lists ASAP when
> > > deletion process starts as we do from monitor_bdrv_states?
> > Hm, I think it should only disappear when the image file is actually
> > closed. But in practice, it probably makes little difference either way.
>
> I'm still worried about a period of time since coroutine yields and until it
> is reentered, looks like aio context can be grabbed by other code and bs can
> be treated as valid.
>
> I have no example of such situation too but I see there bdrv_aio_flush and
> bdrv_co_flush_to_disk callbacks which are called during flush and can take
> unpredicable time to complete (e.g. raw_aio_flush in file-win32 uses thread
> pool inside to process request and RBD can also be affected but I didn't dig
> deep enough to be sure).
>
> If main loop starts processing next qmp command before completion is called
> lists will be in inconsistent state and hard to debug use-after-free bugs
> and crashes can happen.
Hm, at the point of flush, bs is actually still valid, so e.g.
query-block would just work. But I think we would indeed have a problem
if a new reference to the node is created.
> Fix seems trivial and shouldn't break any existing code.
>
> ---
>
> diff --git a/block.c b/block.c
> index 16615bc876..25c3b72520 100644
> --- a/block.c
> +++ b/block.c
> @@ -4083,14 +4083,14 @@ static void bdrv_delete(BlockDriverState *bs)
> assert(bdrv_op_blocker_is_empty(bs));
> assert(!bs->refcnt);
>
> - bdrv_close(bs);
> -
> /* remove from list, if necessary */
> if (bs->node_name[0] != '\0') {
> QTAILQ_REMOVE(&graph_bdrv_states, bs, node_list);
> }
> QTAILQ_REMOVE(&all_bdrv_states, bs, bs_list);
>
> + bdrv_close(bs);
> +
> g_free(bs);
> }
This looks reasonable enough to me.
Kevin