[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [Qemu-block] [PATCH 1/7] block/nbd: Fix hang in .bdrv_c
From: |
Maxim Levitsky |
Subject: |
Re: [Qemu-devel] [Qemu-block] [PATCH 1/7] block/nbd: Fix hang in .bdrv_close() |
Date: |
Tue, 16 Jul 2019 16:08:33 +0300 |
On Fri, 2019-07-12 at 19:35 +0200, Max Reitz wrote:
> When nbd_close() is called from a coroutine, the connection_co never
> gets to run, and thus nbd_teardown_connection() hangs.
>
> This is because aio_co_enter() only puts the connection_co into the main
> coroutine's wake-up queue, so this main coroutine needs to yield and
> wait for connection_co to terminate.
After diving into NBD's co-routines (this is 2nd time I do this) and speaking
about this
with Max Reitz on IRC, could I suggest to extend the explanation a bit,
something like that:
When nbd_close() is called from a coroutine, the connection_co never
gets to run, and thus nbd_teardown_connection() hangs.
This happens because the connection_co is woken up by nbd_teardown_connection()
by closing the socket,
which wakes up the IO channel on which the connection_co is blocked.
However connection_co is waken up by aio_co_wake, which has an assumption that
if the caller is already in
a coroutine, the caller doesn't switch immediately to the woken coroutine, but
rather it adds the coroutine to list
of coroutines to wake immediately after the current co-routine
yields/terminates (self->co_queue_wakeup)
Since we instead do aio_poll, that never happens.
The patch itself looks fine, so
Reviewed-by: Maxim Levitsky <address@hidden>
Best regards,
Maxim Levitsky
>
> Suggested-by: Kevin Wolf <address@hidden>
> Signed-off-by: Max Reitz <address@hidden>
> ---
> block/nbd.c | 14 +++++++++++++-
> 1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/block/nbd.c b/block/nbd.c
> index 81edabbf35..8f5ee86842 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -61,6 +61,7 @@ typedef struct BDRVNBDState {
> CoMutex send_mutex;
> CoQueue free_sema;
> Coroutine *connection_co;
> + Coroutine *teardown_co;
> int in_flight;
>
> NBDClientRequest requests[MAX_NBD_REQUESTS];
> @@ -135,7 +136,15 @@ static void nbd_teardown_connection(BlockDriverState *bs)
> qio_channel_shutdown(s->ioc,
> QIO_CHANNEL_SHUTDOWN_BOTH,
> NULL);
> - BDRV_POLL_WHILE(bs, s->connection_co);
> + if (qemu_in_coroutine()) {
> + s->teardown_co = qemu_coroutine_self();
> + /* connection_co resumes us when it terminates */
> + qemu_coroutine_yield();
> + s->teardown_co = NULL;
> + } else {
> + BDRV_POLL_WHILE(bs, s->connection_co);
> + }
> + assert(!s->connection_co);
>
> nbd_client_detach_aio_context(bs);
> object_unref(OBJECT(s->sioc));
> @@ -207,6 +216,9 @@ static coroutine_fn void nbd_connection_entry(void
> *opaque)
> bdrv_dec_in_flight(s->bs);
>
> s->connection_co = NULL;
> + if (s->teardown_co) {
> + aio_co_wake(s->teardown_co);
> + }
> aio_wait_kick();
> }
>