Re: bdrv_drained_begin deadlock with io-threads
From: Kevin Wolf
Subject: Re: bdrv_drained_begin deadlock with io-threads
Date: Wed, 1 Apr 2020 12:37:48 +0200
User-agent: Mutt/1.12.1 (2019-06-15)
On 31.03.2020 at 18:18, Dietmar Maurer wrote:
> > > Looks like bdrv_parent_drained_poll_single() calls
> > > blk_root_drained_poll(), which returns true in my case (in_flight > 5).
> >
> > Can you identify which BlockBackend is this? Specifically if it's the
> > one attached to a guest device or whether it belongs to the block job.
>
> This can trigger from various different places, but the simplest case is when
> it's called from drive_backup_prepare
>
> > bdrv_drained_begin(bs);
>
> which is the backup source drive.
I mean the BlockBackend for which blk_root_drained_poll() is called.
> > Maybe have a look at the job coroutine, too. You can probably easiest
> > find it in the 'jobs' list, and then print the coroutine backtrace for
> > job->co.
>
> This is in drive_backup_prepare(), before the job gets created.
Oh, I see. Then it can't be the job's BlockBackend, of course.
> > > Looks like I am losing poll events somewhere?
> >
> > I don't think we've lost any event if in_flight > 0. It means that
> > something is still supposedly active. Maybe the job deadlocked.
>
> This is a simple call to bdrv_drained_begin(bs) (before we setup the job).
>
> Is really nobody else able to reproduce this (somebody already tried to
> reproduce)?
I can get hangs, but that's for job_completed(), not for starting the
job. Also, my hangs have a non-empty bs->tracked_requests, so it looks
like a different case to me.
In my case, the hanging request looks like this:
(gdb) qemu coroutine 0x556e055750e0
#0 0x0000556e03999150 in qemu_coroutine_switch
    (from_=from_@entry=0x556e055750e0, to_=to_@entry=0x7fd34bbeb5b8,
    action=action@entry=COROUTINE_YIELD) at util/coroutine-ucontext.c:218
#1 0x0000556e03997e31 in qemu_coroutine_yield () at util/qemu-coroutine.c:193
#2 0x0000556e0397fc88 in thread_pool_submit_co (pool=0x7fd33c003120,
    func=func@entry=0x556e038d59a0 <handle_aiocb_rw>, arg=arg@entry=0x7fd2d2b96440)
    at util/thread-pool.c:289
#3 0x0000556e038d511d in raw_thread_pool_submit (bs=bs@entry=0x556e04e459b0,
    func=func@entry=0x556e038d59a0 <handle_aiocb_rw>, arg=arg@entry=0x7fd2d2b96440)
    at block/file-posix.c:1894
#4 0x0000556e038d58c3 in raw_co_prw (bs=0x556e04e459b0, offset=230957056,
    bytes=4096, qiov=0x7fd33c006fe0, type=1) at block/file-posix.c:1941
Checking the thread pool request:
(gdb) p *((ThreadPool*)0x7fd33c003120).head .lh_first
$9 = {common = {aiocb_info = 0x556e03f43f80 <thread_pool_aiocb_info>, bs = 0x0,
    cb = 0x556e0397f670 <thread_pool_co_cb>, opaque = 0x7fd2d2b96400, refcnt = 1},
  pool = 0x7fd33c003120,
  func = 0x556e038d59a0 <handle_aiocb_rw>, arg = 0x7fd2d2b96440,
  state = THREAD_DONE, ret = 0,
  reqs = {tqe_next = 0x0, tqe_circ = {tql_next = 0x0, tql_prev = 0x0}},
  all = {le_next = 0x0, le_prev = 0x7fd33c0031d0}}
So apparently the request is THREAD_DONE, but the coroutine was never
reentered. I saw one case where ctx.bh_list was empty, but I also have a
case where a BH sits there scheduled and apparently just doesn't get
run:
(gdb) p *((ThreadPool*)0x7fd33c003120).ctx.bh_list .slh_first
$13 = {ctx = 0x556e04e41a10, cb = 0x556e0397f8e0 <thread_pool_completion_bh>,
opaque = 0x7fd33c003120, next = {sle_next = 0x0}, flags = 3}
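The suspected failure mode (request finished, completion scheduled, waiter never resumed) can be sketched with a small toy model in C. This is not QEMU code: every name in it (toy_loop, toy_request, toy_submit, toy_worker_finish, toy_loop_poll, toy_drain) is hypothetical, there are no real threads or coroutines, and a bounded loop stands in for the polling that a drain would do. It only illustrates why in_flight stays above zero if a scheduled completion callback is never run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these names exist in QEMU. */
enum req_state { REQ_QUEUED, REQ_DONE };

struct toy_request {
    enum req_state state;
    bool waiter_resumed;      /* stands in for re-entering the coroutine */
};

struct toy_loop {
    bool bh_scheduled;        /* a completion callback is pending */
    struct toy_request *req;  /* the single request tracked by this model */
    int in_flight;            /* the counter a drain would poll on */
};

/* Submitting a request makes it count as in flight. */
static void toy_submit(struct toy_loop *loop, struct toy_request *req)
{
    req->state = REQ_QUEUED;
    req->waiter_resumed = false;
    loop->req = req;
    loop->in_flight++;
}

/* Worker side: mark the request done, then only schedule the completion. */
static void toy_worker_finish(struct toy_loop *loop)
{
    assert(loop->req != NULL);
    loop->req->state = REQ_DONE;
    loop->bh_scheduled = true;
}

/* Event-loop side: run the pending callback; returns true on progress. */
static bool toy_loop_poll(struct toy_loop *loop)
{
    if (!loop->bh_scheduled) {
        return false;
    }
    loop->bh_scheduled = false;
    if (loop->req->state == REQ_DONE && !loop->req->waiter_resumed) {
        loop->req->waiter_resumed = true;   /* waiter continues */
        loop->in_flight--;                  /* request no longer in flight */
    }
    return true;
}

/*
 * Bounded stand-in for a drain: poll until in_flight reaches zero.
 * If loop_runs_callbacks is false, scheduled callbacks are never run,
 * which models the hang. Returns true if the drain completed.
 */
static bool toy_drain(struct toy_loop *loop, bool loop_runs_callbacks,
                      int max_iterations)
{
    for (int i = 0; i < max_iterations && loop->in_flight > 0; i++) {
        if (loop_runs_callbacks) {
            toy_loop_poll(loop);
        }
    }
    return loop->in_flight == 0;
}
```

In this model, calling toy_submit() and toy_worker_finish() and then toy_drain() with loop_runs_callbacks set to false leaves the request in REQ_DONE with waiter_resumed still false and in_flight at 1, which mirrors the THREAD_DONE request and the scheduled but unexecuted BH above. Running toy_drain() again with callbacks enabled lets it finish.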
Stefan, I wonder if this is related to the recent changes to the BH
implementation.
Kevin