qemu-block

Re: bdrv_drained_begin deadlock with io-threads


From: Kevin Wolf
Subject: Re: bdrv_drained_begin deadlock with io-threads
Date: Thu, 2 Apr 2020 19:10:07 +0200
User-agent: Mutt/1.12.1 (2019-06-15)

On 02.04.2020 at 18:47, Kevin Wolf wrote:
> On 02.04.2020 at 17:40, Dietmar Maurer wrote:
> > > Can you reproduce the problem with my script, but pointing it to your
> > > Debian image and running stress-ng instead of dd? 
> > 
> > yes
> > 
> > > If so, how long does
> > > it take to reproduce for you?
> > 
> > I sometimes need up to 130 iterations ...
> > 
> > Worse, several times I thought the bug was gone, but then it triggered again 
> > (sigh).
> > 
> > But most times it takes below 30 iterations (if you run "stress-ng -d 5").
> > 
> > Also note that it can happen at different places, but always inside a 
> > drained section ...
> 
> I got a stack trace of a hanging coroutine:
> 
> (gdb) qemu coroutine 0x7fd8c00132c0
> #0  0x00005630e16e9840 in qemu_coroutine_switch (from_=from_@entry=0x7fd8c00132c0, to_=to_@entry=0x7fd8cda865b8, action=action@entry=COROUTINE_YIELD) at util/coroutine-ucontext.c:218
> #1  0x00005630e16e8521 in qemu_coroutine_yield () at util/qemu-coroutine.c:193
> #2  0x00005630e16e8ba5 in qemu_co_queue_wait_impl (queue=queue@entry=0x5630e48ab580, lock=lock@entry=0x0) at util/qemu-coroutine-lock.c:56
> #3  0x00005630e161f1ae in blk_wait_while_drained (blk=blk@entry=0x5630e48ab260) at /home/kwolf/source/qemu/include/qemu/lockable.h:46
> #4  0x00005630e1620600 in blk_wait_while_drained (blk=0x5630e48ab260) at block/block-backend.c:1189
> #5  0x00005630e1620600 in blk_co_pwritev_part (blk=0x5630e48ab260, offset=2922381312, bytes=856064, qiov=qiov@entry=0x7fd8c002cd70, qiov_offset=qiov_offset@entry=0, flags=0) at block/block-backend.c:1189
> #6  0x00005630e16207ce in blk_co_pwritev (flags=<optimized out>, qiov=0x7fd8c002cd70, bytes=<optimized out>, offset=<optimized out>, blk=<optimized out>) at block/block-backend.c:1429
> #7  0x00005630e16207ce in blk_aio_write_entry (opaque=0x7fd8c002cdc0) at block/block-backend.c:1429
> #8  0x00005630e16e98bb in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:115
> 
> So I think this is the bug: Calling blk_wait_while_drained() from
> anywhere between blk_inc_in_flight() and blk_dec_in_flight() is wrong
> because it will deadlock the drain operation.
> 
> blk_aio_read/write_entry() take care of this and drop their reference
> around blk_wait_while_drained(). But if we hit the race condition that
> drain hasn't yet started there, but it has when we get to
> blk_co_preadv() or blk_co_pwritev_part(), then we're in a buggy code
> path.

With the following patch, it seems to survive for now. I'll give it some
more testing tomorrow (also qemu-iotests, to check that I didn't
accidentally break something else).

Kevin


diff --git a/block/block-backend.c b/block/block-backend.c
index 8b8f2a80a0..634acd1541 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1143,7 +1143,9 @@ static int blk_check_byte_request(BlockBackend *blk, int64_t offset,
 static void coroutine_fn blk_wait_while_drained(BlockBackend *blk)
 {
     if (blk->quiesce_counter && !blk->disable_request_queuing) {
+        blk_dec_in_flight(blk);
         qemu_co_queue_wait(&blk->queued_requests, NULL);
+        blk_inc_in_flight(blk);
     }
 }
 
@@ -1154,8 +1156,6 @@ int coroutine_fn blk_co_preadv(BlockBackend *blk, int64_t offset,
     int ret;
     BlockDriverState *bs;
 
-    blk_wait_while_drained(blk);
-
     /* Call blk_bs() only after waiting, the graph may have changed */
     bs = blk_bs(blk);
     trace_blk_co_preadv(blk, bs, offset, bytes, flags);
@@ -1186,8 +1186,6 @@ int coroutine_fn blk_co_pwritev_part(BlockBackend *blk, int64_t offset,
     int ret;
     BlockDriverState *bs;
 
-    blk_wait_while_drained(blk);
-
     /* Call blk_bs() only after waiting, the graph may have changed */
     bs = blk_bs(blk);
     trace_blk_co_pwritev(blk, bs, offset, bytes, flags);
@@ -1262,6 +1260,7 @@ static int blk_prw(BlockBackend *blk, int64_t offset, uint8_t *buf,
         .ret    = NOT_DONE,
     };
 
+    blk_inc_in_flight(blk);
     if (qemu_in_coroutine()) {
         /* Fast-path if already in coroutine context */
         co_entry(&rwco);
@@ -1270,6 +1269,7 @@ static int blk_prw(BlockBackend *blk, int64_t offset, uint8_t *buf,
         bdrv_coroutine_enter(blk_bs(blk), co);
         BDRV_POLL_WHILE(blk_bs(blk), rwco.ret == NOT_DONE);
     }
+    blk_dec_in_flight(blk);
 
     return rwco.ret;
 }
@@ -1388,9 +1388,7 @@ static void blk_aio_read_entry(void *opaque)
     QEMUIOVector *qiov = rwco->iobuf;
 
     if (rwco->blk->quiesce_counter) {
-        blk_dec_in_flight(rwco->blk);
         blk_wait_while_drained(rwco->blk);
-        blk_inc_in_flight(rwco->blk);
     }
 
     assert(qiov->size == acb->bytes);
@@ -1406,9 +1404,7 @@ static void blk_aio_write_entry(void *opaque)
     QEMUIOVector *qiov = rwco->iobuf;
 
     if (rwco->blk->quiesce_counter) {
-        blk_dec_in_flight(rwco->blk);
         blk_wait_while_drained(rwco->blk);
-        blk_inc_in_flight(rwco->blk);
     }
 
     assert(!qiov || qiov->size == acb->bytes);



