On 26.04.2019 at 14:24, Anton Kuchin wrote:
I can't figure out ownership of aio context during bdrv_close().
As far as I understand, bdrv_unref() should be called with the aio context
acquired to prevent concurrent operations (at least, most usages in blockdev.c
explicitly acquire and release the context, but not all).
I think the theory is like this:
1. bdrv_unref() can only be called from the main thread
2. A block node for which bdrv_close() is called has no references. If
there are no more parents that keep it in a non-default iothread,
it should be in the main AioContext. So no locks need to be taken
during bdrv_close().
In practice, 2. is not fully true today, even though block devices do
stop their dataplane and move the block nodes back to the main
AioContext on shutdown. I am currently working on some fixes related to
this; afterwards, the situation should be better.
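
As a rough illustration of the theory above, here is a toy model in C. It
does not use QEMU's real types: ToyAioContext, ToyNode, toy_unref() and the
other toy_* names are hypothetical stand-ins, and the "lock" is only a
counter. It merely sketches the blockdev.c-style caller pattern (acquire the
node's context, drop the reference, release) and the assumption from point 2
that a node losing its last reference already sits in the main context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an AioContext: the "lock" is only a depth counter. */
typedef struct ToyAioContext {
    const char *name;
    int lock_depth;
} ToyAioContext;

/* Toy stand-in for a block node. */
typedef struct ToyNode {
    int refcnt;
    ToyAioContext *ctx;
    bool closed;
} ToyNode;

static ToyAioContext toy_main_ctx = { "main", 0 };

static void toy_ctx_acquire(ToyAioContext *ctx)
{
    ctx->lock_depth++;
}

static void toy_ctx_release(ToyAioContext *ctx)
{
    assert(ctx->lock_depth > 0);
    ctx->lock_depth--;
}

/* Called once the last reference is gone. */
static void toy_close(ToyNode *node)
{
    /* Point 2 of the theory: no parent keeps the node in an iothread. */
    assert(node->ctx == &toy_main_ctx);
    node->closed = true;
}

/* Drop one reference; close the node when the count reaches zero. */
static void toy_unref(ToyNode *node)
{
    assert(node->refcnt > 0);
    if (--node->refcnt == 0) {
        toy_close(node);
    }
}

/* Caller pattern: hold the node's context across the unref. */
static void toy_unref_locked(ToyNode *node)
{
    /* Read the context first: a real node could be freed by the unref. */
    ToyAioContext *ctx = node->ctx;

    toy_ctx_acquire(ctx);
    toy_unref(node);
    toy_ctx_release(ctx);
}
```

With a node created with refcnt 2 in toy_main_ctx, the first
toy_unref_locked() call only drops the count, and the second one closes the
node; the lock counter is back at zero after each call.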
But if the refcount reaches zero and bs is going to be deleted in bdrv_close(),
we need to ensure that the drain is finished, data is flushed, and there are no
more pending coroutines and bottom halves, so the drain and flush functions can
enter a coroutine and yield in several places. As a result, control
returns to the coroutine's caller, which will release the aio context, and when
the completion bh continues the cleanup process, it will be executed without
ownership of the context. Is this a valid situation?
Do you have an example where this happens?
Normally, leaving the coroutine means that the AioContext lock is
released, but it is later reentered from the same AioContext, so the
lock will be taken again.
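
A toy sketch of that re-entry behaviour, under the assumption that the
context's event loop takes the lock around every callback it dispatches.
All names here (ToyCtx, toy_schedule_bh(), toy_poll(), ToyFlush) are
hypothetical, and the coroutine is reduced to a two-step state machine; this
is not QEMU's coroutine or AioContext code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void ToyBHFunc(void *opaque);

/* Toy context: a lock counter plus one pending bottom-half slot. */
typedef struct ToyCtx {
    int lock_depth;
    ToyBHFunc *pending;
    void *pending_opaque;
} ToyCtx;

/* Toy "coroutine": records whether each step saw the lock held. */
typedef struct ToyFlush {
    ToyCtx *ctx;
    bool locked_before_yield;
    bool locked_after_reenter;
    bool done;
} ToyFlush;

/* Queue a callback for the next event loop iteration. */
static void toy_schedule_bh(ToyCtx *ctx, ToyBHFunc *fn, void *opaque)
{
    assert(ctx->pending == NULL);
    ctx->pending = fn;
    ctx->pending_opaque = opaque;
}

/* Second half: runs when the event loop re-enters the "coroutine". */
static void toy_flush_resume(void *opaque)
{
    ToyFlush *f = opaque;

    f->locked_after_reenter = f->ctx->lock_depth > 0;
    f->done = true;
}

/* First half: does some work, then "yields" by queuing its continuation. */
static void toy_flush_start(ToyFlush *f)
{
    f->locked_before_yield = f->ctx->lock_depth > 0;
    toy_schedule_bh(f->ctx, toy_flush_resume, f);
}

/* One event loop iteration: take the lock, dispatch, drop the lock. */
static bool toy_poll(ToyCtx *ctx)
{
    ToyBHFunc *fn = ctx->pending;

    if (!fn) {
        return false;
    }
    ctx->pending = NULL;
    ctx->lock_depth++;          /* assumed: the loop locks around dispatch */
    fn(ctx->pending_opaque);
    ctx->lock_depth--;
    return true;
}
```

If the caller holds the lock while calling toy_flush_start() and then
releases it, the continuation still observes the lock as held when a later
toy_poll() dispatches it, because the loop retakes it around the callback.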
Moreover, if a yield happens, the bs that is being deleted has zero refcount
but is still present in the lists graph_bdrv_states and all_bdrv_states and can
be accidentally accessed. Shouldn't we remove it from these lists ASAP when
the deletion process starts, as we do with monitor_bdrv_states?
Hm, I think it should only disappear when the image file is actually
closed. But in practice, it probably makes little difference either way.