
Re: [Qemu-devel] aio context ownership during bdrv_close()


From: Anton Kuchin
Subject: Re: [Qemu-devel] aio context ownership during bdrv_close()
Date: Mon, 6 May 2019 15:47:33 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1

On 29/04/2019 13:38, Kevin Wolf wrote:
> Am 26.04.2019 um 14:24 hat Anton Kuchin geschrieben:
>> I can't figure out ownership of the AioContext during bdrv_close().
>>
>> As far as I understand, bdrv_unref() should be called with the AioContext
>> acquired to prevent concurrent operations (at least most usages in
>> blockdev.c explicitly acquire and release the context, but not all).
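
(For reference, the usual pattern in blockdev.c looks roughly like the sketch
below; this is only an illustration of the convention, not code copied from
the tree, and assumes bs is the node whose last reference is dropped.)

/* Sketch: drop the last reference while holding the node's AioContext, so
 * that bdrv_unref() -> bdrv_delete() -> bdrv_close() cannot race with I/O
 * still running in an iothread. */
AioContext *ctx = bdrv_get_aio_context(bs);

aio_context_acquire(ctx);
bdrv_unref(bs);
aio_context_release(ctx);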
> I think the theory is like this:
>
> 1. bdrv_unref() can only be called from the main thread
>
> 2. A block node for which bdrv_close() is called has no references. If
>     there are no more parents that keep it in a non-default iothread,
>     they should be in the main AioContext. So no locks need to be taken
>     during bdrv_close().
>
> In practice, 2. is not fully true today, even though block devices do
> stop their dataplane and move the block nodes back to the main
> AioContext on shutdown. I am currently working on some fixes related to
> this; afterwards the situation should be better.

>> But if the refcount reaches zero and the bs is going to be deleted in
>> bdrv_close(), we need to ensure that the drain has finished, data is
>> flushed, and there are no more pending coroutines and bottom halves, so
>> the drain and flush functions can enter a coroutine and yield in several
>> places. As a result, control returns to the coroutine caller, which will
>> release the AioContext, and when the completion BH continues the cleanup
>> process it will be executed without ownership of the context. Is this a
>> valid situation?
> Do you have an example where this happens?
>
> Normally, leaving the coroutine means that the AioContext lock is
> released, but it is later reentered from the same AioContext, so the
> lock will be taken again.
I was wrong. Coroutines do acquire aio context when reentered.
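
Roughly what convinced me (paraphrasing the idea rather than quoting
util/async.c, so details may differ): re-entering a yielded coroutine from
outside coroutine context wraps the enter in acquire/release, so the code
after the yield runs with the context held again.

/* Hedged sketch: how a completion re-enters a yielded coroutine in its home
 * AioContext. The acquire/release pair is what gives the resumed coroutine
 * ownership of the context again. */
static void reenter_with_context(AioContext *ctx, Coroutine *co)
{
    aio_context_acquire(ctx);
    qemu_aio_coroutine_enter(ctx, co);
    aio_context_release(ctx);
}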

>> Moreover, if a yield happens, the bs that is being deleted has zero
>> refcount but is still present in the graph_bdrv_states and
>> all_bdrv_states lists and can be accidentally accessed. Shouldn't we
>> remove it from these lists as soon as the deletion process starts, as we
>> do for monitor_bdrv_states?
> Hm, I think it should only disappear when the image file is actually
> closed. But in practice, it probably makes little difference either way.

I'm still worried about the period of time between the coroutine yielding and being reentered: it looks like the AioContext can be grabbed by other code during that window and the bs can be treated as valid.

I have no example of such a situation either, but I see the bdrv_aio_flush and bdrv_co_flush_to_disk callbacks, which are called during flush and can take an unpredictable time to complete (e.g. raw_aio_flush in file-win32 uses a thread pool internally to process the request, and RBD may also be affected, but I didn't dig deep enough to be sure).
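
The shape of that path is roughly the following (a simplified sketch of the
.bdrv_aio_flush case, written from memory, so the real code in block/io.c
differs in details and error handling). Everything between the yield and the
completion callback re-entering the coroutine is the window I mean:

/* Sketch: the coroutine kicks off the driver's AIO flush and yields until
 * the completion callback wakes it up again. While it is parked here, the
 * thread pool (file-win32) or librbd does the actual work. */
static void flush_done(void *opaque, int ret)
{
    Coroutine *co = opaque;

    /* ret handling omitted in this sketch */
    aio_co_wake(co);            /* re-enters the coroutine in its context */
}

static int coroutine_fn flush_via_aio(BlockDriverState *bs)
{
    BlockAIOCB *acb;

    acb = bs->drv->bdrv_aio_flush(bs, flush_done, qemu_coroutine_self());
    if (!acb) {
        return -EIO;
    }
    qemu_coroutine_yield();     /* the window in question opens here */
    return 0;
}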

If the main loop starts processing the next QMP command before the completion is called, the lists will be in an inconsistent state, and hard-to-debug use-after-free bugs and crashes can happen.
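
For example, a lookup by node name (the shape of bdrv_find_node() in block.c,
paraphrased here as a hypothetical helper) still walks graph_bdrv_states and
would happily hand the half-deleted node to whatever command runs in that
window:

/* Sketch of the hazard: graph_bdrv_states still contains the node whose
 * refcount has already dropped to zero, so a lookup by node name can return
 * it while bdrv_close() is parked in a yielded coroutine. */
static BlockDriverState *lookup_by_node_name(const char *node_name)
{
    BlockDriverState *bs;

    QTAILQ_FOREACH(bs, &graph_bdrv_states, node_list) {
        if (!strcmp(bs->node_name, node_name)) {
            return bs;          /* may be the node that is being deleted */
        }
    }
    return NULL;
}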

The fix seems trivial and shouldn't break any existing code.

---

diff --git a/block.c b/block.c
index 16615bc876..25c3b72520 100644
--- a/block.c
+++ b/block.c
@@ -4083,14 +4083,14 @@ static void bdrv_delete(BlockDriverState *bs)
     assert(bdrv_op_blocker_is_empty(bs));
     assert(!bs->refcnt);

-    bdrv_close(bs);
-
     /* remove from list, if necessary */
     if (bs->node_name[0] != '\0') {
         QTAILQ_REMOVE(&graph_bdrv_states, bs, node_list);
     }
     QTAILQ_REMOVE(&all_bdrv_states, bs, bs_list);

+    bdrv_close(bs);
+
     g_free(bs);
 }

--



> Kevin


