[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] qemu-ga: possible race while suspending the guest
From: |
Luiz Capitulino |
Subject: |
Re: [Qemu-devel] qemu-ga: possible race while suspending the guest |
Date: |
Tue, 10 Apr 2012 16:32:43 -0300 |
On Tue, 10 Apr 2012 16:27:45 -0300
Luiz Capitulino <address@hidden> wrote:
> On Tue, 10 Apr 2012 13:50:28 -0500
> Michael Roth <address@hidden> wrote:
>
> > On Tue, Apr 10, 2012 at 01:45:15PM -0300, Luiz Capitulino wrote:
> > > On Mon, 9 Apr 2012 17:29:51 -0500
> > > Michael Roth <address@hidden> wrote:
> > >
> > > > On Mon, Apr 09, 2012 at 04:57:48PM -0300, Luiz Capitulino wrote:
> > > > > Hi Michael,
> > > > >
> > > > > There's a possible race condition in the bios_supports_mode()
> > > > > function used
> > > > > by all suspend commands in qemu-ga: if the parent process receives a
> > > > > SIGCHLD
> > > > > while executing:
> > > > >
> > > > > close(pipefds[1]);
> > > > > g_free(pmutils_path);
> > > > >
> > > > > Or:
> > > > >
> > > > > ret = read(pipefds[0], &status, sizeof(status));
> > > > >
> > > > > Then bad things can happen, like reporting that the guest doesn't
> > > > > support
> > > > > suspend or a resource leak (a segfault may be possible too).
> > > > >
> > > > > The easy & obvious way to fix this is just to block SIGCLHD while in
> > > > > close()
> > > > > and g_free() and to loop read() on EINTR.
> > > > >
> > > > > However, the real problem here is that the double fork schema used in
> > > > > the
> > > > > suspend commands got complex. It's this way because I was trying to
> > > > > avoid
> > > > > causing a conflict with your series that adds support for running
> > > > > commands
> > > > > in the guest.
> > > > >
> > > > > The ideal solution is to add an internal API to qemu-ga to create &
> > > > > wait for
> > > > > child processes, like we discussed during the suspend commands review.
> > > > >
> > > > > Now, what it's best way to go with this bug? Should I fix it the easy
> > > > > way
> > > > > or should I wait for the new API (which I believe you're going to
> > > > > work on)?
> > > >
> > > > Was initially planning on using qemu_add_child_watch(), but this may end
> > > > up not being workable for w32 so I'm prefer to hold off on trying to
> > > > come up with a general solution till I start looking at that. I'm hoping
> > > > g_spawn* can save the day there but haven't looked at it too much
> > > > detail.
> > > >
> > > > In the meantime let's go with the "easy way". We should take care to
> > > > restore the default SIGCHLD in the 2nd child though to avoid causing
> > > > anything we exec() run to with SIGCHLD blocked though, since that may
> > > > hang us in certain situations.
> > >
> > > Another option would be to drop the double fork. This would require us to
> > > remove the SIGCHLD handler and call waitpid() in the suspend functions.
> > > This
> > > would simplify the code a lot and would make the bug go away, but any new
> > > command that executes new processes would have to change that.
> >
> > Unfortunately shutdown relies on it too. It's not so much an issue
> > there given qga tends not to exist for much longer afterward, but
> > technically the command can fail and lead to accumulated zombies.
>
> Right. I've considered shutdown too but thought that it would be so unlikely
> that it wouldn't matter. But maybe it's better to keep it complex but correct
> (instead of simple with the possibility of a bug).
>
> Actually, it's not so unlikely: suspend wouldn't work correctly after a
> failed shutdown.
I'm wrong here, we obviously can wait for specific children. So the worse
case would be what you said above: zombies.