[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v2 2/8] block: protect write threshold QMP commands from conc
From: |
Paolo Bonzini |
Subject: |
Re: [PATCH v2 2/8] block: protect write threshold QMP commands from concurrent requests |
Date: |
Wed, 5 May 2021 13:29:26 +0200 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.1 |
On 05/05/21 10:55, Stefan Hajnoczi wrote:
@@ -119,10 +118,8 @@ void qmp_block_set_write_threshold(const char *node_name,
return;
}
- aio_context = bdrv_get_aio_context(bs);
- aio_context_acquire(aio_context);
-
+ /* Avoid a concurrent write_threshold_disable. */
+ bdrv_drained_begin(bs);
Is there documentation that says it's safe to call
bdrv_drained_begin(bs) without the AioContext acquired? AIO_WAIT_WHILE()
contradicts this:
The caller's thread must be the IOThread that owns @ctx or the main loop
thread (with @ctx acquired exactly once).
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Well, I cannot remember why this patch was correct at the time I was
working on them. Patches 7/8 on the other hand were okay because back
then bdrv_reopen_multiple() called bdrv_drain_all_begin().
Overall, what survives of this series is patches 5 and 6, plus Vladimir's
take on the write threshold code.
Anyway, things have gotten _more_ broken in the meanwhile, and this is
probably what causes the deadlocks that Emanuele has seen with the
remainder of the branch. Since this patch:
commit aa1361d54aac43094b98024b8b6c804eb6e41661
Author: Kevin Wolf <kwolf@redhat.com>
Date: Fri Aug 17 18:54:18 2018 +0200
block: Add missing locking in bdrv_co_drain_bh_cb()
bdrv_do_drained_begin/end() assume that they are called with the
AioContext lock of bs held. If we call drain functions from a coroutine
with the AioContext lock held, we yield and schedule a BH to move out of
coroutine context. This means that the lock for the home context of the
coroutine is released and must be re-acquired in the bottom half.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
AIO_WAIT_WHILE is going down a path that does not do the release/acquire of
the AioContext, which can and will cause deadlocks when the main thread
tries to acquire the AioContext and the I/O thread is in bdrv_co_drain.
The message that introduced it does not help very much
(https://mail.gnu.org/archive/html/qemu-devel/2018-09/msg01687.html)
but I think this must be reverted.
The next steps however should have less problems with bitrot, in particular
the snapshots have changed very little. Block job code is very different
but it is all in the main thread.
Paolo