qemu-devel

Re: [Qemu-devel] thread-pool.c race condition?


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] thread-pool.c race condition?
Date: Thu, 2 Apr 2015 17:47:18 +0100

On Thu, Apr 2, 2015 at 5:43 PM, Paolo Bonzini <address@hidden> wrote:
> On 02/04/2015 18:26, Stefan Hajnoczi wrote:
>> John Snow has reported that qemu-io can hang when the host is under
>> heavy load.  He made the following observations in gdb:
>>
>> 1. The program is sitting in aio_poll() (called by bdrv_prwv_co())
>> waiting for request completion.
>>
>> 2. The thread pool has a ThreadPoolElement with ->state == THREAD_DONE.
>>
>> The ThreadPoolElement should have been reaped by
>> thread_pool_completion_bh() and its callback invoked.  For some reason
>> this didn't happen and the program is blocked in poll(2) waiting.
>>
>> This suggests a race condition in thread-pool.c or qemu_bh_schedule()
>> (used to complete ThreadPoolElement from a QEMU event loop).
>>
>> I don't have a good theory why this happens yet.  Just wanted to share
>> in case someone else hits this problem.
>
> Laszlo hit something very similar fairly easily with virtio-scsi (but
> not virtio-blk!) on aarch64 hosts.  Any attempt to debug it (ranging
> from compilation with -O0 to tracing) made it disappear.  A reliable
> reproducer with qemu-io would be a dream...

My initial speculation was that this check in qemu_bh_schedule():

if (bh->scheduled)
    return;

is causing us to skip BH invocations.

Looking at the code, the lack of explicit barriers or atomic
operations on bh->scheduled itself is a little suspicious.
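
To make the concern concrete, here is a minimal sketch of a
"schedule once" flag built on a C11 atomic exchange instead of a plain
load followed by a store.  It is not QEMU's code: SketchBH,
sketch_bh_schedule(), sketch_bh_poll() and sketch_count_cb() are
hypothetical names made up for illustration, and the notify counter
only stands in for whatever wakes the event loop.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a bottom half; not QEMU's QEMUBH. */
typedef struct {
    atomic_bool scheduled;   /* true while an invocation is pending */
    atomic_int notify_count; /* placeholder for event loop wakeups */
} SketchBH;

/*
 * Mark the BH as pending.  Returns true if this call did the
 * scheduling, false if an invocation was already pending.
 */
static bool sketch_bh_schedule(SketchBH *bh)
{
    /*
     * atomic_exchange() is a sequentially consistent read-modify-write,
     * so among concurrent callers exactly one sees the old value false.
     * With a plain "if (bh->scheduled) return;" there is no such
     * guarantee and no ordering against the caller's earlier stores.
     */
    if (atomic_exchange(&bh->scheduled, true)) {
        return false;
    }
    atomic_fetch_add(&bh->notify_count, 1); /* assumed: wake the loop */
    return true;
}

/*
 * Event loop side.  The flag is cleared before the callback runs, so a
 * schedule request arriving while the callback executes sets it again
 * and triggers another invocation instead of being dropped.
 */
static bool sketch_bh_poll(SketchBH *bh, void (*cb)(void *), void *opaque)
{
    if (!atomic_exchange(&bh->scheduled, false)) {
        return false; /* nothing pending */
    }
    cb(opaque);
    return true;
}

/* Example callback: counts how often it has been invoked. */
static void sketch_count_cb(void *opaque)
{
    (*(int *)opaque)++;
}
```

In this sketch a worker thread would publish its result (for example
the element's state) before calling sketch_bh_schedule(), and the
event loop would call sketch_bh_poll() on each iteration.  Whether the
real code needs the same ordering is exactly what I have not
established yet.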

But now I'm focussing more on thread-pool.c since that has its own
threading constraints.

Stefan
