[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v2 3/3] linux-aio: fix re-entrant completion pro
From: |
Fam Zheng |
Subject: |
Re: [Qemu-devel] [PATCH v2 3/3] linux-aio: fix re-entrant completion processing |
Date: |
Wed, 28 Sep 2016 10:40:07 +0800 |
User-agent: |
Mutt/1.7.0 (2016-08-17) |
On Tue, 09/27 16:18, Stefan Hajnoczi wrote:
> Commit 0ed93d84edabc7656f5c998ae1a346fe8b94ca54 ("linux-aio: process
> completions from ioq_submit()") added an optimization that processes
> completions each time ioq_submit() returns with requests in flight.
> This commit introduces a "Co-routine re-entered recursively" error which
> can be triggered with -drive format=qcow2,aio=native.
>
> Fam Zheng <address@hidden>, Kevin Wolf <address@hidden>, and I
> debugged the following backtrace:
>
> (gdb) bt
> #0 0x00007ffff0a046f5 in raise () at /lib64/libc.so.6
> #1 0x00007ffff0a062fa in abort () at /lib64/libc.so.6
> #2 0x0000555555ac0013 in qemu_coroutine_enter (co=0x5555583464d0) at
> util/qemu-coroutine.c:113
> #3 0x0000555555a4b663 in qemu_laio_process_completions (address@hidden) at
> block/linux-aio.c:218
> #4 0x0000555555a4b874 in ioq_submit (address@hidden) at
> block/linux-aio.c:331
> #5 0x0000555555a4ba12 in laio_do_submit (address@hidden, address@hidden,
> address@hidden, address@hidden) at block/linux-aio.c:383
> #6 0x0000555555a4bbd3 in laio_co_submit (bs=<optimized out>,
> s=0x555557e2f7f0, fd=13, offset=2932727808, qiov=0x555559d38e20, type=1) at
> block/linux-aio.c:402
> #7 0x0000555555a4fd23 in bdrv_driver_preadv (address@hidden,
> address@hidden, address@hidden, address@hidden, flags=0) at block/io.c:804
> #8 0x0000555555a52b34 in bdrv_aligned_preadv (address@hidden,
> address@hidden, address@hidden, address@hidden, address@hidden,
> address@hidden, flags=0) at block/io.c:1041
> #9 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>,
> offset=2932727808, bytes=8192, address@hidden, address@hidden) at
> block/io.c:1133
> #10 0x0000555555a29629 in qcow2_co_preadv (bs=0x555556635890,
> offset=6178725888, bytes=8192, qiov=0x555557527840, flags=<optimized out>) at
> block/qcow2.c:1509
> #11 0x0000555555a4fd23 in bdrv_driver_preadv (address@hidden,
> address@hidden, address@hidden, address@hidden, flags=0) at block/io.c:804
> #12 0x0000555555a52b34 in bdrv_aligned_preadv (address@hidden,
> address@hidden, address@hidden, address@hidden, address@hidden,
> address@hidden, flags=0) at block/io.c:1041
> #13 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>,
> address@hidden, address@hidden, address@hidden, address@hidden) at
> block/io.c:1133
> #14 0x0000555555a4515a in blk_co_preadv (blk=0x5555566356d0,
> offset=6178725888, bytes=8192, qiov=0x555557527840, flags=0) at
> block/block-backend.c:783
> #15 0x0000555555a45266 in blk_aio_read_entry (opaque=0x5555577025e0) at
> block/block-backend.c:991
> #16 0x0000555555ac0cfa in coroutine_trampoline (i0=<optimized out>,
> i1=<optimized out>) at util/coroutine-ucontext.c:78
>
> It turned out that re-entrant ioq_submit() and completion processing
> between three requests caused this error. The following check is not
> sufficient to prevent recursively entering coroutines:
>
> if (laiocb->co != qemu_coroutine_self()) {
> qemu_coroutine_enter(laiocb->co);
> }
>
> As the following coroutine backtrace shows, not just the current
> coroutine (self) can be entered. There might also be other coroutines
> that are currently entered and transferred control due to the qcow2 lock
> (CoMutex):
>
> (gdb) qemu coroutine 0x5555583464d0
> #0 0x0000555555ac0c90 in qemu_coroutine_switch (address@hidden,
> address@hidden, address@hidden) at util/coroutine-ucontext.c:175
> #1 0x0000555555abfe54 in qemu_coroutine_enter (co=0x5555572f9890) at
> util/qemu-coroutine.c:117
> #2 0x0000555555ac031c in qemu_co_queue_run_restart (address@hidden) at
> util/qemu-coroutine-lock.c:60
> #3 0x0000555555abfe5e in qemu_coroutine_enter (co=0x5555583462c0) at
> util/qemu-coroutine.c:119
> #4 0x0000555555a4b663 in qemu_laio_process_completions (address@hidden) at
> block/linux-aio.c:218
> #5 0x0000555555a4b874 in ioq_submit (address@hidden) at
> block/linux-aio.c:331
> #6 0x0000555555a4ba12 in laio_do_submit (address@hidden, address@hidden,
> address@hidden, address@hidden) at block/linux-aio.c:383
> #7 0x0000555555a4bbd3 in laio_co_submit (bs=<optimized out>,
> s=0x555557e2f7f0, fd=13, offset=2911477760, qiov=0x55555a338e80, type=1) at
> block/linux-aio.c:402
> #8 0x0000555555a4fd23 in bdrv_driver_preadv (address@hidden,
> address@hidden, address@hidden, address@hidden, flags=0) at block/io.c:804
> #9 0x0000555555a52b34 in bdrv_aligned_preadv (address@hidden,
> address@hidden, address@hidden, address@hidden, address@hidden,
> address@hidden, flags=0) at block/io.c:1041
> #10 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>,
> offset=2911477760, bytes=8192, address@hidden, address@hidden) at
> block/io.c:1133
> #11 0x0000555555a29629 in qcow2_co_preadv (bs=0x555556635890,
> offset=6157475840, bytes=8192, qiov=0x5555575df720, flags=<optimized out>) at
> block/qcow2.c:1509
> #12 0x0000555555a4fd23 in bdrv_driver_preadv (address@hidden,
> address@hidden, address@hidden, address@hidden, flags=0) at block/io.c:804
> #13 0x0000555555a52b34 in bdrv_aligned_preadv (address@hidden,
> address@hidden, address@hidden, address@hidden, address@hidden,
> address@hidden, flags=0) at block/io.c:1041
> #14 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>,
> address@hidden, address@hidden, address@hidden, address@hidden) at
> block/io.c:1133
> #15 0x0000555555a4515a in blk_co_preadv (blk=0x5555566356d0,
> offset=6157475840, bytes=8192, qiov=0x5555575df720, flags=0) at
> block/block-backend.c:783
> #16 0x0000555555a45266 in blk_aio_read_entry (opaque=0x555557231aa0) at
> block/block-backend.c:991
> #17 0x0000555555ac0cfa in coroutine_trampoline (i0=<optimized out>,
> i1=<optimized out>) at util/coroutine-ucontext.c:78
>
> Use the new qemu_coroutine_entered() function instead of comparing
> against qemu_coroutine_self(). This is correct because:
>
> 1. If a coroutine is not entered then it must have yielded to wait for
> I/O completion. It is therefore safe to enter.
>
> 2. If a coroutine is entered then it must be in
> ioq_submit()/qemu_laio_process_completions() because otherwise it
> would be yielded while waiting for I/O completion. Therefore it will
> check laio->ret and return from ioq_submit() instead of yielding,
> i.e. it's guaranteed not to hang.
>
> Reported-by: Fam Zheng <address@hidden>
> Tested-by: Fam Zheng <address@hidden>
> Signed-off-by: Stefan Hajnoczi <address@hidden>
> ---
> block/linux-aio.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/block/linux-aio.c b/block/linux-aio.c
> index d4e19d4..1685ec2 100644
> --- a/block/linux-aio.c
> +++ b/block/linux-aio.c
> @@ -94,9 +94,12 @@ static void qemu_laio_process_completion(struct
> qemu_laiocb *laiocb)
>
> laiocb->ret = ret;
> if (laiocb->co) {
> - /* Jump and continue completion for foreign requests, don't do
> - * anything for current request, it will be completed shortly. */
> - if (laiocb->co != qemu_coroutine_self()) {
> + /* If the coroutine is already entered it must be in ioq_submit() and
> + * will notice laio->ret has been filled in when it eventually runs
> + * later. Coroutines cannot be entered recursively so avoid doing
> + * that!
> + */
> + if (!qemu_coroutine_entered(laiocb->co)) {
> qemu_coroutine_enter(laiocb->co);
> }
> } else {
> --
> 2.7.4
>
Reviewed-by: Fam Zheng <address@hidden>