qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] qemu-ga: possible race while suspending the guest


From: Luiz Capitulino
Subject: Re: [Qemu-devel] qemu-ga: possible race while suspending the guest
Date: Tue, 10 Apr 2012 16:32:43 -0300

On Tue, 10 Apr 2012 16:27:45 -0300
Luiz Capitulino <address@hidden> wrote:

> On Tue, 10 Apr 2012 13:50:28 -0500
> Michael Roth <address@hidden> wrote:
> 
> > On Tue, Apr 10, 2012 at 01:45:15PM -0300, Luiz Capitulino wrote:
> > > On Mon, 9 Apr 2012 17:29:51 -0500
> > > Michael Roth <address@hidden> wrote:
> > > 
> > > > On Mon, Apr 09, 2012 at 04:57:48PM -0300, Luiz Capitulino wrote:
> > > > > Hi Michael,
> > > > > 
> > > > > There's a possible race condition in the bios_supports_mode() 
> > > > > function used
> > > > > by all suspend commands in qemu-ga: if the parent process receives a 
> > > > > SIGCHLD
> > > > > while executing:
> > > > > 
> > > > >     close(pipefds[1]);
> > > > >     g_free(pmutils_path);
> > > > > 
> > > > > Or:
> > > > > 
> > > > >     ret = read(pipefds[0], &status, sizeof(status));
> > > > > 
> > > > > Then bad things can happen, like reporting that the guest doesn't 
> > > > > support
> > > > > suspend or a resource leak (a segfault may be possible too).
> > > > > 
> > > > > The easy & obvious way to fix this is just to block SIGCLHD while in 
> > > > > close()
> > > > > and g_free() and to loop read() on EINTR.
> > > > > 
> > > > > However, the real problem here is that the double fork schema used in 
> > > > > the
> > > > > suspend commands got complex. It's this way because I was trying to 
> > > > > avoid
> > > > > causing a conflict with your series that adds support for running 
> > > > > commands
> > > > > in the guest.
> > > > > 
> > > > > The ideal solution is to add an internal API to qemu-ga to create & 
> > > > > wait for
> > > > > child processes, like we discussed during the suspend commands review.
> > > > > 
> > > > > Now, what it's best way to go with this bug? Should I fix it the easy 
> > > > > way
> > > > > or should I wait for the new API (which I believe you're going to 
> > > > > work on)?
> > > > 
> > > > Was initially planning on using qemu_add_child_watch(), but this may end
> > > > up not being workable for w32 so I'm prefer to hold off on trying to
> > > > come up with a general solution till I start looking at that. I'm hoping
> > > > g_spawn* can save the day there but haven't looked at it too much 
> > > > detail.
> > > > 
> > > > In the meantime let's go with the "easy way". We should take care to
> > > > restore the default SIGCHLD in the 2nd child though to avoid causing
> > > > anything we exec() run to with SIGCHLD blocked though, since that may
> > > > hang us in certain situations.
> > > 
> > > Another option would be to drop the double fork. This would require us to
> > > remove the SIGCHLD handler and call waitpid() in the suspend functions. 
> > > This
> > > would simplify the code a lot and would make the bug go away, but any new
> > > command that executes new processes would have to change that.
> > 
> > Unfortunately shutdown relies on it too. It's not so much an issue
> > there given qga tends not to exist for much longer afterward, but
> > technically the command can fail and lead to accumulated zombies.
> 
> Right. I've considered shutdown too but thought that it would be so unlikely
> that it wouldn't matter. But maybe it's better to keep it complex but correct
> (instead of simple with the possibility of a bug).
> 
> Actually, it's not so unlikely: suspend wouldn't work correctly after a
> failed shutdown.

I'm wrong here, we obviously can wait for specific children. So the worse
case would be what you said above: zombies.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]