qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-block] [PATCH 3/3] linux-aio: fix re-entrant completion proces


From: Roman Penyaev
Subject: Re: [Qemu-block] [PATCH 3/3] linux-aio: fix re-entrant completion processing
Date: Tue, 27 Sep 2016 16:29:55 +0200

Hey Stefan,

On Tue, Sep 27, 2016 at 4:06 PM, Stefan Hajnoczi <address@hidden> wrote:
> Commit 0ed93d84edabc7656f5c998ae1a346fe8b94ca54 ("linux-aio: process
> completions from ioq_submit()") added an optimization that processes
> completions each time ioq_submit() returns with requests in flight.
> This commit introduces a "Co-routine re-entered recursively" error which
> can be triggered with -drive format=qcow2,aio=native.
>
> Fam Zheng <address@hidden>, Kevin Wolf <address@hidden>, and I
> debugged the following backtrace:
>
>   (gdb) bt
>   #0  0x00007ffff0a046f5 in raise () at /lib64/libc.so.6
>   #1  0x00007ffff0a062fa in abort () at /lib64/libc.so.6
>   #2  0x0000555555ac0013 in qemu_coroutine_enter (co=0x5555583464d0) at 
> util/qemu-coroutine.c:113
>   #3  0x0000555555a4b663 in qemu_laio_process_completions (address@hidden) at 
> block/linux-aio.c:218
>   #4  0x0000555555a4b874 in ioq_submit (address@hidden) at 
> block/linux-aio.c:331
>   #5  0x0000555555a4ba12 in laio_do_submit (address@hidden, address@hidden, 
> address@hidden, address@hidden) at block/linux-aio.c:383
>   #6  0x0000555555a4bbd3 in laio_co_submit (bs=<optimized out>, 
> s=0x555557e2f7f0, fd=13, offset=2932727808, qiov=0x555559d38e20, type=1) at 
> block/linux-aio.c:402
>   #7  0x0000555555a4fd23 in bdrv_driver_preadv (address@hidden, 
> address@hidden, address@hidden, address@hidden, flags=0) at block/io.c:804
>   #8  0x0000555555a52b34 in bdrv_aligned_preadv (address@hidden, 
> address@hidden, address@hidden, address@hidden, address@hidden, 
> address@hidden, flags=0) at block/io.c:1041
>   #9  0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, 
> offset=2932727808, bytes=8192, address@hidden, address@hidden) at 
> block/io.c:1133
>   #10 0x0000555555a29629 in qcow2_co_preadv (bs=0x555556635890, 
> offset=6178725888, bytes=8192, qiov=0x555557527840, flags=<optimized out>) at 
> block/qcow2.c:1509
>   #11 0x0000555555a4fd23 in bdrv_driver_preadv (address@hidden, 
> address@hidden, address@hidden, address@hidden, flags=0) at block/io.c:804
>   #12 0x0000555555a52b34 in bdrv_aligned_preadv (address@hidden, 
> address@hidden, address@hidden, address@hidden, address@hidden, 
> address@hidden, flags=0) at block/io.c:1041
>   #13 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, 
> address@hidden, address@hidden, address@hidden, address@hidden) at 
> block/io.c:1133
>   #14 0x0000555555a4515a in blk_co_preadv (blk=0x5555566356d0, 
> offset=6178725888, bytes=8192, qiov=0x555557527840, flags=0) at 
> block/block-backend.c:783
>   #15 0x0000555555a45266 in blk_aio_read_entry (opaque=0x5555577025e0) at 
> block/block-backend.c:991
>   #16 0x0000555555ac0cfa in coroutine_trampoline (i0=<optimized out>, 
> i1=<optimized out>) at util/coroutine-ucontext.c:78
>
> It turned out that re-entrant ioq_submit() and completion processing
> between three requests caused this error.  The following check is not
> sufficient to prevent recursively entering coroutines:
>
>   if (laiocb->co != qemu_coroutine_self()) {
>       qemu_coroutine_enter(laiocb->co);
>   }
>
> As the following coroutine backtrace shows, not just the current
> coroutine (self) can be entered.  There might also be other coroutines
> that are currently entered and transferred control due to the qcow2 lock
> (CoMutex):

I doubt that that was introduced by the commit you've specified:
0ed93d84edab.

Before my patch coroutine was unconditionally entered.  The following
is what was changed by 0ed93d84edab:

     if (laiocb->co) {
-        qemu_coroutine_enter(laiocb->co);
+        /* Jump and continue completion for foreign requests, don't do
+         * anything for current request, it will be completed shortly. */
+        if (laiocb->co != qemu_coroutine_self()) {
+            qemu_coroutine_enter(laiocb->co);
+        }

If you have a strong reproduction, could you please verify that.

What worries me is the following:

 1. Issue was introduced before and was unnoticed (ok).
 2. Issue - is something else and/or was really introduced by commit
    0ed93d84edab (not ok).

Of course the 2. is not nice.
Thanks.

--
Roman



reply via email to

[Prev in Thread] Current Thread [Next in Thread]