From: Paolo Bonzini
Subject: Re: [Qemu-devel] [Qemu-block] [PATCH for-2.9-rc5 v2] block: Drain BH in bdrv_drained_begin
Date: Sat, 15 Apr 2017 01:10:02 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0


On 14/04/2017 16:51, Stefan Hajnoczi wrote:
> On Fri, Apr 14, 2017 at 9:02 AM, Fam Zheng <address@hidden> wrote:
>> @@ -398,11 +399,15 @@ void bdrv_drain_all(void);
>>           */                                                \
>>          assert(!bs_->wakeup);                              \
>>          bs_->wakeup = true;                                \
>> -        while ((cond)) {                                   \
>> -            aio_context_release(ctx_);                     \
>> -            aio_poll(qemu_get_aio_context(), true);        \
>> -            aio_context_acquire(ctx_);                     \
>> -            waited_ = true;                                \
>> +        while (busy_) {                                    \
>> +            if ((cond)) {                                  \
>> +                waited_ = busy_ = true;                    \
>> +                aio_context_release(ctx_);                 \
>> +                aio_poll(qemu_get_aio_context(), true);    \
>> +                aio_context_acquire(ctx_);                 \
>> +            } else {                                       \
>> +                busy_ = aio_poll(ctx_, false);             \
>> +            }                                              \
> 
> Wait, I'm confused.  The current thread is not in the BDS AioContext.
> We're not allowed to call aio_poll(ctx_, false).

It's pretty ugly indeed.  Strictly from a thread-safety point of view,
everything that aio_poll calls will acquire the AioContext, so that is
safe, and in fact the release/acquire pair can even be hoisted outside
the "if".
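For illustration, the hoisted loop might look like this (an untested
sketch, not a proposed patch; `in_main_` is a name I'm introducing, and
whether (cond) may instead be evaluated after the release is a separate
question):

```c
        while (busy_) {                                    \
            bool in_main_ = (cond);  /* still holding ctx_ */ \
            aio_context_release(ctx_);  /* hoisted */      \
            if (in_main_) {                                \
                waited_ = busy_ = true;                    \
                aio_poll(qemu_get_aio_context(), true);    \
            } else {                                       \
                busy_ = aio_poll(ctx_, false);             \
            }                                              \
            aio_context_acquire(ctx_);  /* hoisted */      \
        }                                                  \
```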

If we did that for blocking=true in both I/O and main thread, then that
would be racy.  This is the scenario mentioned in the commit message of
c9d1a56 ("block: only call aio_poll on the current thread's AioContext",
2016-10-28).

If only one thread has blocking=true, it's subject to races too.  In
this case, the I/O thread may fail to be woken by iothread_stop's
aio_notify.  However, by the time iothread_stop is called there should
be no BlockDriverStates (and thus no BDRV_POLL_WHILE running the above
code) for the I/O thread's AioContext.

The root cause of the bug is simple: due to the main thread having the
I/O thread's AioContext, the main thread is not letting the I/O thread
run.  This is what RFifoLock and the contention callbacks were designed
to fix, though they had other issues related to recursive locking (if
you acquired an AioContext twice, the contention callback failed to
release it).  The right way to fix it is just not to hold _any_ lock
across BDRV_POLL_WHILE.  This means getting rid of
aio_context_acquire/release, and being somewhat careful about
bdrv_drained_begin, but overall it's not hard.

Fam's patch, while ugly, is safe, which makes it the right thing to do
for the 2.9 branch---possibly with a FIXME comment explaining the above.
bdrv_coroutine_enter can be removed too once BDS can do fine-grained
locking.

So, the ramifications of the current partial conversion are pretty
complex (though many older QEMU releases had other similar
dataplane+blockjob bugs due to the above recursive locking issue), but
it looks like it is more or less manageable to fix them either now or in
a quick 2.9.1 release a few weeks after 2.9.0.  (Fam and I also tried
another way to fix it today, but it led to deadlocks not related at all
to the partial conversion, so it seemed like a dead end).

Regarding the other code path mentioned in the cover letter:

>> 
>>     qmp_block_commit
>>       commit_active_start
>>         bdrv_reopen
>>           bdrv_reopen_multiple
>>             bdrv_reopen_prepare
>>               bdrv_flush
>>                 aio_poll
>>                   aio_bh_poll
>>                     aio_bh_call
>>                       block_job_defer_to_main_loop_bh
>>                         stream_complete
>>                           bdrv_reopen

it's possible that it's just a missing bdrv_ref/unref pair, respectively
in bdrv_reopen_queue_child and bdrv_reopen_multiple, so not related to
this patch.  We didn't test adding the ref/unref though.
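If that theory is right, the fix would follow the usual pattern of
pinning the BDS across the window in which the nested reopen can run.
Purely illustrative and untested (bdrv_ref/bdrv_unref are the real
helpers, but the exact placement is a guess):

```c
    /* in bdrv_reopen_queue_child(): keep the BDS alive while queued */
    bdrv_ref(bs);

    /* in bdrv_reopen_multiple(): drop it once the reopen has finished */
    bdrv_unref(bs_entry->state.bs);
```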

Thanks,

Paolo

> Did you mean aio_poll(qemu_get_aio_context(), false) in order to
> process defer to main loop BHs?
