From: Roman Penyaev
Subject: Re: [Qemu-block] [PATCH 3/3] linux-aio: fix re-entrant completion processing
Date: Tue, 27 Sep 2016 16:29:55 +0200

Hey Stefan,
On Tue, Sep 27, 2016 at 4:06 PM, Stefan Hajnoczi <address@hidden> wrote:
> Commit 0ed93d84edabc7656f5c998ae1a346fe8b94ca54 ("linux-aio: process
> completions from ioq_submit()") added an optimization that processes
> completions each time ioq_submit() returns with requests in flight.
> This commit introduces a "Co-routine re-entered recursively" error which
> can be triggered with -drive format=qcow2,aio=native.
>
> Fam Zheng <address@hidden>, Kevin Wolf <address@hidden>, and I
> debugged the following backtrace:
>
> (gdb) bt
> #0 0x00007ffff0a046f5 in raise () at /lib64/libc.so.6
> #1 0x00007ffff0a062fa in abort () at /lib64/libc.so.6
> #2 0x0000555555ac0013 in qemu_coroutine_enter (co=0x5555583464d0) at
> util/qemu-coroutine.c:113
> #3 0x0000555555a4b663 in qemu_laio_process_completions (address@hidden) at
> block/linux-aio.c:218
> #4 0x0000555555a4b874 in ioq_submit (address@hidden) at
> block/linux-aio.c:331
> #5 0x0000555555a4ba12 in laio_do_submit (address@hidden, address@hidden,
> address@hidden, address@hidden) at block/linux-aio.c:383
> #6 0x0000555555a4bbd3 in laio_co_submit (bs=<optimized out>,
> s=0x555557e2f7f0, fd=13, offset=2932727808, qiov=0x555559d38e20, type=1) at
> block/linux-aio.c:402
> #7 0x0000555555a4fd23 in bdrv_driver_preadv (address@hidden,
> address@hidden, address@hidden, address@hidden, flags=0) at block/io.c:804
> #8 0x0000555555a52b34 in bdrv_aligned_preadv (address@hidden,
> address@hidden, address@hidden, address@hidden, address@hidden,
> address@hidden, flags=0) at block/io.c:1041
> #9 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>,
> offset=2932727808, bytes=8192, address@hidden, address@hidden) at
> block/io.c:1133
> #10 0x0000555555a29629 in qcow2_co_preadv (bs=0x555556635890,
> offset=6178725888, bytes=8192, qiov=0x555557527840, flags=<optimized out>) at
> block/qcow2.c:1509
> #11 0x0000555555a4fd23 in bdrv_driver_preadv (address@hidden,
> address@hidden, address@hidden, address@hidden, flags=0) at block/io.c:804
> #12 0x0000555555a52b34 in bdrv_aligned_preadv (address@hidden,
> address@hidden, address@hidden, address@hidden, address@hidden,
> address@hidden, flags=0) at block/io.c:1041
> #13 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>,
> address@hidden, address@hidden, address@hidden, address@hidden) at
> block/io.c:1133
> #14 0x0000555555a4515a in blk_co_preadv (blk=0x5555566356d0,
> offset=6178725888, bytes=8192, qiov=0x555557527840, flags=0) at
> block/block-backend.c:783
> #15 0x0000555555a45266 in blk_aio_read_entry (opaque=0x5555577025e0) at
> block/block-backend.c:991
> #16 0x0000555555ac0cfa in coroutine_trampoline (i0=<optimized out>,
> i1=<optimized out>) at util/coroutine-ucontext.c:78
>
> It turned out that re-entrant ioq_submit() and completion processing
> between three requests caused this error. The following check is not
> sufficient to prevent recursively entering coroutines:
>
>     if (laiocb->co != qemu_coroutine_self()) {
>         qemu_coroutine_enter(laiocb->co);
>     }
>
> As the following coroutine backtrace shows, not just the current
> coroutine (self) can be entered. There might also be other coroutines
> that are currently entered and transferred control due to the qcow2 lock
> (CoMutex):

I doubt that this was introduced by the commit you mention,
0ed93d84edab.

Before my patch the coroutine was entered unconditionally. This is
what 0ed93d84edab changed:

     if (laiocb->co) {
-        qemu_coroutine_enter(laiocb->co);
+        /* Jump and continue completion for foreign requests, don't do
+         * anything for current request, it will be completed shortly. */
+        if (laiocb->co != qemu_coroutine_self()) {
+            qemu_coroutine_enter(laiocb->co);
+        }

If you have a reliable reproducer, could you please verify that?
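
For reference, the state under discussion can be sketched with a toy
model. This is not QEMU code: ToyCo, toy_enter() and the other toy_*
names are hypothetical, and the only behaviour borrowed from QEMU is
that entering an already-entered coroutine is fatal ("Co-routine
re-entered recursively"). The sketch assumes one coroutine has entered
another, as with the CoMutex hand-off in the quoted description, and
then checks what a "!= self" test says about a completion that belongs
to the outer coroutine. Whether the code before 0ed93d84edab could
reach this state at all is the open question here; the sketch does not
answer it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a coroutine: it only records who entered it. */
typedef struct ToyCo {
    struct ToyCo *caller;   /* non-NULL while the coroutine is entered */
} ToyCo;

static ToyCo leader, co1, co2;  /* hypothetical coroutines for the demo */
static ToyCo *toy_self;         /* coroutine that is currently running */

/* Models entering co from the current coroutine. Returns false in the
 * case where the real code would abort on recursive re-entry. */
static bool toy_enter(ToyCo *co)
{
    if (co->caller) {
        return false;
    }
    co->caller = toy_self;
    toy_self = co;
    return true;
}

/* True if co is somewhere in the chain of entered coroutines. */
static bool toy_entered(const ToyCo *co)
{
    return co->caller != NULL;
}

/* The check from the diff above: only the running coroutine is skipped. */
static bool self_check_allows_enter(const ToyCo *co)
{
    return co != toy_self;
}

/* A stricter check (assumed here): skip any coroutine that is entered. */
static bool entered_check_allows_enter(const ToyCo *co)
{
    return !toy_entered(co);
}

/* Build the nested state: leader entered co1, co1 entered co2. */
static void toy_setup_nested(void)
{
    leader.caller = co1.caller = co2.caller = NULL;
    toy_self = &leader;
    bool ok = toy_enter(&co1) && toy_enter(&co2);
    assert(ok);
    (void)ok;
}
```

After toy_setup_nested(), co2 is running and co1 is still entered. A
completion for co1 passes self_check_allows_enter(), yet toy_enter(&co1)
fails, which is the modelled abort. An unconditional enter fails in the
same way, while entered_check_allows_enter() rejects co1 up front.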

What worries me is which of these two cases we are in:

1. The issue was introduced earlier and went unnoticed (ok).
2. The issue is something else and/or really was introduced by commit
   0ed93d84edab (not ok).

Of course, case 2 is not nice.

Thanks.
--
Roman