[Qemu-devel] Hanging VM: Fix block I/O hang patch could help?


From: Daniel Dickinson
Subject: [Qemu-devel] Hanging VM: Fix block I/O hang patch could help?
Date: Wed, 10 Dec 2008 00:51:31 -0500

Hello,
        
        I have been experiencing VM hangs for some time (with anything after
0.8.2 on Debian) with a Windows 98 SE guest (which I can consistently
hang at a certain part of the install procedure) as well as with various
Ubuntu guests (which also hang, but seemingly at random).

From Debian bug #474386, quoting myself:

I have done the bisect and narrowed it down to three commits.  Two of
the commits do not compile, so either could be the culprit; the third is
the first bad commit that does compile.

The last good revision is 2072 (git
d62ca2669be6d6af2c0cbda47abd7e51548060bf)

The first known bad commit is 2075 (git
83f640910acd7cd13ff8a603f29c46033c4fb00)

I have attached the diff between these two revisions (three commits
worth).

The first bad commit is large.  It is the switch to asynchronous file
I/O for the disk images.  I suspect a race condition in that code, which
is unpleasant.  The main reason I have doubts is that if that were the
problem I would expect random VM hangs across the board (though possibly
rarely), and not just for me.  I experience hangs in Ubuntu (two
versions) as well as Windows 98, so that is consistent, ...

I have noticed in the user forums some indication that this is
happening to other people too.

I have also seen the message series titled

Re: [Qemu-devel] [patch] Fix block I/O hang.

and was wondering if that could be the problem.  I include the last
message in that series below.

From:   Gerd Hoffmann
Subject:        Re: [Qemu-devel] [patch] Fix block I/O hang.
Date:   Thu, 13 Nov 2008 10:14:31 +0100
User-agent:     Thunderbird 2.0.0.16 (X11/20080723)

Anthony Liguori wrote:
> Gerd Hoffmann wrote:
>>  
>>> Under what circumstances?  posix_aio_read() is only invoked from a
>>> select callback.  This means there should be data available to be read.
>>>     
>>
>> Well, there are *two* select loops:  main_loop_wait() and
>> qemu_aio_wait().  Calling sync block i/o functions from an i/o handler
>> causes the two select loops to run nested => boom.
> 
> Yeah, qemu_aio_wait needs to die.  Can you resubmit your patch with a
> better description, and change the read() loop in posix_aio_read() to
> consume as much data as possible before hitting EAGAIN?

I've fixed my problem by changing xen_disk to use a bottom half for
actual work, so the block read/write calls are moved out of the select
loop anyway.  Which turned out to be useful for aio support too.

So I'm fine again with the current state.  I can still create such a
patch, though.

cheers,
  Gerd
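
To illustrate the bottom-half approach Gerd describes, here is a minimal
sketch of the general pattern, not the xen_disk code and not QEMU's own
bottom-half API.  All names (struct bh, bh_register, bh_schedule, bh_poll,
io_handler, do_block_io) are hypothetical.  The idea is that the i/o handler
invoked from the select loop only marks work as pending; the main loop runs
the pending work after the handlers have returned, so a synchronous block
call never starts a second select loop inside the first.

```c
#include <stddef.h>

/* Hypothetical deferred-work ("bottom half") record; not QEMU's type. */
typedef void (*bh_func)(void *opaque);

struct bh {
    bh_func cb;        /* work to run later */
    void *opaque;      /* argument passed to cb */
    int scheduled;     /* non-zero while the work is pending */
    struct bh *next;
};

static struct bh *bh_list;

/* Add a bottom half to the global list; it starts out not scheduled. */
static void bh_register(struct bh *bh, bh_func cb, void *opaque)
{
    bh->cb = cb;
    bh->opaque = opaque;
    bh->scheduled = 0;
    bh->next = bh_list;
    bh_list = bh;
}

/* Cheap and safe to call from inside a select callback. */
static void bh_schedule(struct bh *bh)
{
    bh->scheduled = 1;
}

/*
 * Meant to be called by the main loop once all select callbacks have
 * returned.  Runs every pending bottom half once and returns how many ran.
 */
static int bh_poll(void)
{
    struct bh *bh;
    int ran = 0;

    for (bh = bh_list; bh != NULL; bh = bh->next) {
        if (bh->scheduled) {
            bh->scheduled = 0;
            bh->cb(bh->opaque);
            ran++;
        }
    }
    return ran;
}

/* Stand-in for the real block read/write work: just counts invocations. */
static void do_block_io(void *opaque)
{
    int *count = opaque;
    (*count)++;
}

/* Stand-in for an i/o handler run from the select loop: it only defers. */
static void io_handler(struct bh *work)
{
    bh_schedule(work);
}
```

A caller would register the bottom half once, let io_handler() schedule it
from the select callback, and call bh_poll() from the main loop afterwards.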


If this isn't xen-specific I'd like to try a build with this patch to
see if it works.
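
For anyone else following along, the change Anthony asks for above (consume
as much data as possible before hitting EAGAIN) is a common pattern for
non-blocking descriptors.  The following is only a sketch of that pattern as
I understand it, not the actual posix_aio_read() code or the patch in
question; set_nonblocking() and drain_fd() are names made up for this
example.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper, not QEMU code: switch fd to non-blocking mode. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper: read from a non-blocking fd until nothing is left.
 * Returns the number of bytes consumed, or -1 on a real error.
 */
static long drain_fd(int fd)
{
    char buf[64];
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));

        if (n > 0) {
            total += n;          /* got data, keep reading */
            continue;
        }
        if (n == 0)
            break;               /* end of file: the writer closed */
        if (errno == EINTR)
            continue;            /* interrupted by a signal, retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;               /* nothing more to read right now */
        return -1;               /* genuine error */
    }
    return total;
}
```

The point of looping until EAGAIN is that a single select() wakeup may cover
several pending notifications; reading only once can leave data behind.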

Regards,

Daniel

-- 
And that's my crabbing done for the day.  Got it out of the way early, 
now I have the rest of the afternoon to sniff fragrant tea-roses or 
strangle cute bunnies or something.   -- Michael Devore
GnuPG Key Fingerprint 86 F5 81 A5 D4 2E 1F 1C      http://gnupg.org
The C Shore: http://www.wightman.ca/~cshore
