
Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()


From: Paolo Bonzini
Subject: Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()
Date: Thu, 13 Apr 2017 13:45:55 +0800


On 13/04/2017 09:11, Jeff Cody wrote:
>> It didn't make it into 2.9-rc4 because of limited time. :(
>>
>> Looks like there is no -rc5, we'll have to document this as a known issue.
>> Users should "block-job-complete/cancel" as soon as possible to avoid such a
>> hang.
>
> I'd argue for including a fix for 2.9, since this is both a regression, and
> a hard lock without possible recovery short of restarting the QEMU process.

It is a bit of a corner case (and jobs on an I/O thread are relatively rare
too), so maybe it's not worth delaying 2.9.  It has been delayed quite a bit
already.  Another reason I prefer to wait is to make sure we have a test in
qemu-iotests that guards against future regressions.

Fam explained to me what happens, and the root cause is that bdrv_drain
never does a release/acquire pair in this case, so the I/O thread remains
stuck in a callback that tries to acquire the AioContext.  Ironically,
reintroducing RFifoLock would probably fix this (not 100% sure).  Oops.
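
To make the failure mode concrete, here is a minimal sketch in plain C with
pthreads.  It is not QEMU code: ctx_lock, io_thread_cb and drain_sketch are
hypothetical names, and a pthread mutex stands in for the AioContext lock.
The waiter holds the lock while it polls for a job to finish; the worker
needs that same lock to finish the job.  The real callback would block in
the acquire; the sketch uses trylock and a bounded poll count only so that
it can terminate and report what happened.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the AioContext lock; not QEMU's real type. */
static pthread_mutex_t ctx_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_bool job_done;   /* set by the worker once it got the lock */
static atomic_bool give_up;    /* lets the sketch stop the worker */

/* Plays the I/O thread callback: it must take the lock to finish the job. */
static void *io_thread_cb(void *opaque)
{
    struct timespec tick = { 0, 1000000 };   /* 1 ms */

    (void)opaque;
    while (!atomic_load(&give_up)) {
        if (pthread_mutex_trylock(&ctx_lock) == 0) {
            atomic_store(&job_done, true);
            pthread_mutex_unlock(&ctx_lock);
            break;
        }
        nanosleep(&tick, NULL);
    }
    return NULL;
}

/*
 * Plays the drain loop.  With yield_lock == false the lock is held for the
 * whole wait, so the worker can never run its critical section.  With
 * yield_lock == true each poll does a release/acquire pair.
 * Returns true if the job completed within max_polls polls.
 */
static bool drain_sketch(bool yield_lock, int max_polls)
{
    struct timespec tick = { 0, 1000000 };   /* 1 ms */
    pthread_t tid;
    bool done;
    int rc;

    atomic_store(&job_done, false);
    atomic_store(&give_up, false);

    pthread_mutex_lock(&ctx_lock);           /* caller owns the context */
    rc = pthread_create(&tid, NULL, io_thread_cb, NULL);
    assert(rc == 0);
    (void)rc;

    for (int i = 0; i < max_polls && !atomic_load(&job_done); i++) {
        if (yield_lock) {
            pthread_mutex_unlock(&ctx_lock); /* give the worker a window */
            nanosleep(&tick, NULL);
            pthread_mutex_lock(&ctx_lock);
        } else {
            nanosleep(&tick, NULL);          /* poll while still holding it */
        }
    }

    done = atomic_load(&job_done);           /* sample before letting go */
    atomic_store(&give_up, true);
    pthread_mutex_unlock(&ctx_lock);
    pthread_join(tid, NULL);
    return done;
}
```

Under these assumptions, drain_sketch(false, 50) never sees the job complete,
because the worker cannot get the lock while the waiter holds it, whereas
drain_sketch(true, 200) is expected to see it complete after the first
release.  How the actual fix arranges the release/acquire is not shown here.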

His solution is a bit hacky, but we will hopefully be able to revert it
in 2.10, or whenever aio_context_acquire/release goes away.

Thanks,

Paolo


