qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 2/8] block: protect write threshold QMP commands from conc


From: Paolo Bonzini
Subject: Re: [PATCH v2 2/8] block: protect write threshold QMP commands from concurrent requests
Date: Wed, 5 May 2021 13:29:26 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.1

On 05/05/21 10:55, Stefan Hajnoczi wrote:
@@ -119,10 +118,8 @@ void qmp_block_set_write_threshold(const char *node_name,
          return;
      }
- aio_context = bdrv_get_aio_context(bs);
-    aio_context_acquire(aio_context);
-
+    /* Avoid a concurrent write_threshold_disable.  */
+    bdrv_drained_begin(bs);

Is there documentation that says it's safe to call
bdrv_drained_begin(bs) without the AioContext acquired? AIO_WAIT_WHILE()
contradicts this:

   The caller's thread must be the IOThread that owns @ctx or the main loop
   thread (with @ctx acquired exactly once).
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


Well, I cannot remember why this patch was correct at the time I was
working on them.  Patches 7/8 on the other hand were okay because back
then bdrv_reopen_multiple() called bdrv_drain_all_begin().

Overall, what survives of this series is patches 5 and 6, plus Vladimir's
take on the write threshold code.

Anyway, things have gotten _more_ broken in the meanwhile, and this is
probably what causes the deadlocks that Emanuele has seen with the
remainder of the branch.  Since this patch:

    commit aa1361d54aac43094b98024b8b6c804eb6e41661
    Author: Kevin Wolf <kwolf@redhat.com>
    Date:   Fri Aug 17 18:54:18 2018 +0200

    block: Add missing locking in bdrv_co_drain_bh_cb()
bdrv_do_drained_begin/end() assume that they are called with the
    AioContext lock of bs held. If we call drain functions from a coroutine
    with the AioContext lock held, we yield and schedule a BH to move out of
    coroutine context. This means that the lock for the home context of the
    coroutine is released and must be re-acquired in the bottom half.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Reviewed-by: Max Reitz <mreitz@redhat.com>

AIO_WAIT_WHILE  is going down a path that does not do the release/acquire of
the AioContext, which can and will cause deadlocks when the main thread
tries to acquire the AioContext and the I/O thread is in bdrv_co_drain.

The message that introduced it does not help very much
(https://mail.gnu.org/archive/html/qemu-devel/2018-09/msg01687.html)
but I think this must be reverted.

The next steps however should have less problems with bitrot, in particular
the snapshots have changed very little.  Block job code is very different
but it is all in the main thread.

Paolo




reply via email to

[Prev in Thread] Current Thread [Next in Thread]