qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v2] qemu-ga: Add the guest-suspend command


From: Luiz Capitulino
Subject: Re: [Qemu-devel] [PATCH v2] qemu-ga: Add the guest-suspend command
Date: Wed, 14 Dec 2011 21:44:00 -0200

On Wed, 14 Dec 2011 12:06:13 -0600
Michael Roth <address@hidden> wrote:

> On 12/14/2011 10:38 AM, Luiz Capitulino wrote:
> > On Wed, 14 Dec 2011 09:54:29 -0600
> > Michael Roth<address@hidden>  wrote:
> >
> >> On 12/14/2011 07:00 AM, Luiz Capitulino wrote:
> >>> On Tue, 13 Dec 2011 14:03:08 -0600
> >>> Michael Roth<address@hidden>   wrote:
> >>>
> >>>> On 12/13/2011 12:28 PM, Luiz Capitulino wrote:
> >>>>> It supports two modes: "hibernate" (which corresponds to S4) and
> >>>>> "sleep" (which corresponds to S3). It will try to execute the
> >>>>> pm-hibernate or pm-suspend scripts, if the scripts don't exist
> >>>>> the command will try to suspend by directly writing to the
> >>>>> "/sys/power/state" file.
> >>>>>
> >>>>> An interesting implementation detail is how to cleanup the child's
> >>>>> status on termination, so that we don't create zombies. I've
> >>>>> choosen to ignore the SIGCHLD signal. This will cause the kernel to
> >>>>> automatically cleanup the child's status on its termination.
> >>>>
> >>>> One downside to blocking SIGCHLD is it can screw with child processes
> >>>> that utilize it. `sudo` for instance will just hang indefinitely because
> >>>> it handles it's own cleanup via SIGCHLD, we might run into similar cases
> >>>> with various pm-hibernate/pm-suspend implementations as well.
> >>>>
> >>>> This will also screw with anything we launch via guest-exec as well,
> >>>> once that goes in.
> >>>>
> >>>> I wonder if we can tie the qemu_add_child_watch() stuff into qemu-ga's
> >>>> main loop, or maybe just implement something similar...
> >>>>
> >>>> Basically:
> >>>>
> >>>>     - add a qemu-ga.c:signal_channel_add() that creates a non-blocking
> >>>> pipe and associate the read-side with a GIOChannel, then ties the
> >>>> channel into the main loop via g_io_add_watch() on qemu-ga startup, with
> >>>> an associated callback that reads everything off the pipe, then iterates
> >>>> through a list of registered pids and does a non-blocking wait(pid, ...)
> >>>> on each.
> >>>>
> >>>>     - add a SIGCHLD handler that writes a 1 to the write side of the pipe
> >>>> whenever it gets called
> >>>
> >>> Is the pipe really needed? Is there any side effect if we call waitpid()
> >>> from the signal handler considering it won't block?
> >>
> >> In the current state of things, I believe that might actually be
> >> sufficient. I was thinking of the eventual guest-exec case though: we
> >> need to be able to poll for a guest-exec'd process' return status
> >> asynchronously by calling a guest-exec-status command that does the
> >> wait(), so if we use a waitpid(-1, ...) SIGCHLD handler, or block
> >> SIGCHLD, the return status would be intercepted/lost.
> >
> > What I had in mind was to do what qemu_add_child_watch() does, ie. it
> > iterates through a list of child pids calling waitpid() on them instead
> > of doing waitpid(-1,...). The only difference is that we would do it from
> > a signal handler.
> 
> Ah, yah that might work. I'm not sure how well our list functions will 
> behave when accessed via an interrupt handler. We'll have race 
> conditions at least:
> 
> #define QTAILQ_INSERT_TAIL(head, elm, field) do {                      \
>          (elm)->field.tqe_next = NULL;                                  \
>          (elm)->field.tqe_prev = (head)->tqh_last;                      \
>          /* interrupt */
>          *(head)->tqh_last = (elm);                                     \
>          (head)->tqh_last = &(elm)->field.tqe_next;
> 
> So if we fork() and the process completes and triggers the interrupt 
> before we add the pid to the list, it will remain a zombie until 
> something else triggers the handler. It's not a huge deal...we at least 
> avoid accumulating zombies, but I'd be weary of other side-effects as 
> well, especially if we also remove entries from the list if they've been 
> waited on successfully.

We can block SIGCHLD temporarily while inserting. If a new signal arrives
while it's blocked, it will be queued and will be delivered when unblocked.

> > Maybe that could work for guest-exec too: we could let the signal handler
> > collect the exit status and store them. Then guest-status could retrieve
> > the status from there.
> 
> Yah, I think we can tie all this together rather nicely with a little work.
> 
> >
> > We have several options here. Maybe I should do the simplest solution for
> > the guest-suspend command and you work on improving it for guest-exec.
> 
> That's fine, I need to look at this more. But if we're gonna take the 
> simplest approach for now, let's not even bother with the pid 
> registration (and potential races), and just do a waitpid(-1, ...) in a 
> SIGCHLD handler like QEMU did before the child watch stuff was added, 
> and I'll make the changes necessary for the guest-exec stuff before that 
> gets added.

Okay, that works for me.

> 
> >
> >>>
> >>> I'm also wondering if we could use g_child_watch_add(), but it's not clear
> >>> to me if it works with processes not created with g_spawn_*() functions.
> >>
> >> GPid's map to something other than PIDs on Windows, so I think we'd have
> >> issues there. But our fork() approach probably wouldn't work at all on
> >> Windows except maybe under cygwin, so at some point we'd probably want
> >> to switch over to g_spawn for this kind of stuff anyway...
> >>
> >> So this might be a good point to switch over to using the glib functions.
> >>
> >> Would you mind trying to do the hibernate/zombie reaping stuff using
> >> g_spawn+g_child_watch_add()? It might end up being the easiest route.
> >> Otherwise I can take a look at it later today.
> >
> > Well, there are two problems with g_spawn wrt to the manual method of
> > writing to the sysfs file. The first one is that I'm not sure if g_spawn()
> > reports the file not found error synchronously. The other problem is that,
> 
> "error can be NULL to ignore errors, or non-NULL to report errors. If an 
> error is set, the function returns FALSE. Errors are reported even if 
> they occur in the child (for example if the executable in argv[0] is not 
> found). Typically the message field of returned errors should be 
> displayed to users. Possible errors are those from the G_SPAWN_ERROR 
> domain."
> 
> So I'd think we'd be able to report the FNF synchronously even if we 
> launch into the background.

Yeah, from my testing looks like it does.

> 
> > I'd have to fork() anyway to write to the sysfs file (unless we decide that
> > it's ok to do this synchronously, which seems ok to me).
> 
> I think what we'd have to do is open the file, then pass it in as stdout 
> for a g_spawn'd `echo "disk"` command. I'm not sure how we'd manage 
> cleaning up the FD though, if we're doing it asynchronously. And 
> synchronous means every guest-suspend call will eventually time out, so 
> I don't think that's doable right now.
> 
> So I'd say keep it the way it is for now. Probably not worth spending 
> too much time on until we see what the Windows stuff looks like anyway.

Ok.

> 
> >
> >>
> >>>
> >>>>
> >>>> Then, when creating a child, you just register the child by adding the
> >>>> pid to the list that the signal channel callback checks.
> >>>>
> >>>>>
> >>>>> Signed-off-by: Luiz Capitulino<address@hidden>
> >>>>> ---
> >>>>>
> >>>>> I've tested this w/o any virtio driver, as they don't support S4 yet. 
> >>>>> For
> >>>>> S4 it seems to work ok. I couldn't fully test S3 because we lack a way 
> >>>>> to
> >>>>> resume from it, but by checking the logs it seems to work fine.
> >>>>>
> >>>>> changelog
> >>>>> ---------
> >>>>>
> >>>>> v2
> >>>>>
> >>>>> o Rename the command to 'guest-suspend'
> >>>>> o Add 'mode' parameter
> >>>>> o Use pm-utils scripts
> >>>>> o Cleanup child termination status
> >>>>>
> >>>>>     qapi-schema-guest.json     |   17 +++++++++++
> >>>>>     qemu-ga.c                  |   11 +++++++-
> >>>>>     qga/guest-agent-commands.c |   64 
> >>>>> ++++++++++++++++++++++++++++++++++++++++++++
> >>>>>     3 files changed, 91 insertions(+), 1 deletions(-)
> >>>>>
> >>>>> diff --git a/qapi-schema-guest.json b/qapi-schema-guest.json
> >>>>> index 29989fe..656bde9 100644
> >>>>> --- a/qapi-schema-guest.json
> >>>>> +++ b/qapi-schema-guest.json
> >>>>> @@ -219,3 +219,20 @@
> >>>>>     ##
> >>>>>     { 'command': 'guest-fsfreeze-thaw',
> >>>>>       'returns': 'int' }
> >>>>> +
> >>>>> +##
> >>>>> +# @guest-suspend
> >>>>> +#
> >>>>> +# Suspend guest execution by entering ACPI power state S3 or S4.
> >>>>> +#
> >>>>> +# @mode: 'hibernate' RAM content is saved in the disk and the guest is
> >>>>> +#                    powered down (this corresponds to ACPI S4)
> >>>>> +#        'sleep'     execution is suspended but the RAM retains its 
> >>>>> contents
> >>>>> +#                    (this corresponds to ACPI S3)
> >>>>> +#
> >>>>> +# Notes: This is an asynchronous request. There's no guarantee it will
> >>>>> +# succeed. Errors will be logged to guest's syslog.
> >>>>> +#
> >>>>> +# Since: 1.1
> >>>>> +##
> >>>>> +{ 'command': 'guest-suspend', 'data': { 'mode': 'str' } }
> >>>>> diff --git a/qemu-ga.c b/qemu-ga.c
> >>>>> index 60d4972..b32e96c 100644
> >>>>> --- a/qemu-ga.c
> >>>>> +++ b/qemu-ga.c
> >>>>> @@ -61,7 +61,7 @@ static void quit_handler(int sig)
> >>>>>
> >>>>>     static void register_signal_handlers(void)
> >>>>>     {
> >>>>> -    struct sigaction sigact;
> >>>>> +    struct sigaction sigact, sigact_chld;
> >>>>>         int ret;
> >>>>>
> >>>>>         memset(&sigact, 0, sizeof(struct sigaction));
> >>>>> @@ -76,6 +76,15 @@ static void register_signal_handlers(void)
> >>>>>         if (ret == -1) {
> >>>>>             g_error("error configuring signal handler: %s", 
> >>>>> strerror(errno));
> >>>>>         }
> >>>>> +
> >>>>> +    /* This should cause the kernel to automatically cleanup child
> >>>>> +       termination status */
> >>>>> +    memset(&sigact_chld, 0, sizeof(struct sigaction));
> >>>>> +    sigact_chld.sa_handler = SIG_IGN;
> >>>>> +    ret = sigaction(SIGCHLD,&sigact_chld, NULL);
> >>>>> +    if (ret == -1) {
> >>>>> +        g_error("error configuring signal handler: %s", 
> >>>>> strerror(errno));
> >>>>> +    }
> >>>>>     }
> >>>>>
> >>>>>     static void usage(const char *cmd)
> >>>>> diff --git a/qga/guest-agent-commands.c b/qga/guest-agent-commands.c
> >>>>> index a09c8ca..4799638 100644
> >>>>> --- a/qga/guest-agent-commands.c
> >>>>> +++ b/qga/guest-agent-commands.c
> >>>>> @@ -574,6 +574,70 @@ int64_t qmp_guest_fsfreeze_thaw(Error **err)
> >>>>>     }
> >>>>>     #endif
> >>>>>
> >>>>> +#define LINUX_PM_UTILS_PATH "/usr/sbin"
> >>>>> +#define LINUX_SYS_STATE_FILE "/sys/power/state"
> >>>>> +
> >>>>> +void qmp_guest_suspend(const char *mode, Error **err)
> >>>>> +{
> >>>>> +    int ret, fd = -1;
> >>>>> +    const char *pmutils_bin;
> >>>>> +    char pmutils_bin_path[PATH_MAX];
> >>>>> +
> >>>>> +    if (strcmp(mode, "hibernate") == 0) {
> >>>>> +        pmutils_bin = "pm-hibernate";
> >>>>> +    } else if (strcmp(mode, "sleep") == 0) {
> >>>>> +        pmutils_bin = "pm-suspend";
> >>>>> +    } else {
> >>>>> +        error_set(err, QERR_INVALID_PARAMETER, "mode");
> >>>>> +        return;
> >>>>> +    }
> >>>>> +
> >>>>> +    snprintf(pmutils_bin_path, sizeof(pmutils_bin_path), "%s/%s",
> >>>>> +             LINUX_PM_UTILS_PATH, pmutils_bin);
> >>>>> +
> >>>>> +    if (access(pmutils_bin_path, X_OK) != 0) {
> >>>>> +        pmutils_bin = NULL;
> >>>>> +        fd = open(LINUX_SYS_STATE_FILE, O_WRONLY);
> >>>>> +        if (fd<    0) {
> >>>>> +            error_set(err, QERR_OPEN_FILE_FAILED, 
> >>>>> LINUX_SYS_STATE_FILE);
> >>>>> +            return;
> >>>>> +        }
> >>>>> +    }
> >>>>> +
> >>>>> +    ret = fork();
> >>>>> +    if (ret == 0) {
> >>>>> +        /* child */
> >>>>> +        setsid();
> >>>>> +        fclose(stdin);
> >>>>> +        fclose(stdout);
> >>>>> +        fclose(stderr);
> >>>>> +
> >>>>> +        if (pmutils_bin) {
> >>>>> +            ret = execl(pmutils_bin_path, pmutils_bin, NULL);
> >>>>> +            if (ret) {
> >>>>> +                 slog("%s failed: %s", pmutils_bin_path, 
> >>>>> strerror(errno));
> >>>>> +             }
> >>>>> +        } else {
> >>>>> +            const char *cmd = strcmp(mode, "sleep") == 0 ? "mem" : 
> >>>>> "disk";
> >>>>> +            ret = write(fd, cmd, strlen(cmd));
> >>>>> +            if (ret<    0) {
> >>>>> +                slog("can't write to %s: %s\n", LINUX_SYS_STATE_FILE,
> >>>>> +                     strerror(errno));
> >>>>> +            }
> >>>>> +            close(fd);
> >>>>> +        }
> >>>>> +
> >>>>> +        exit(!!ret);
> >>>>> +    } else if (ret<    0) {
> >>>>> +        error_set(err, QERR_UNDEFINED_ERROR);
> >>>>> +        return;
> >>>>> +    }
> >>>>> +
> >>>>> +    if (!pmutils_bin) {
> >>>>> +        close(fd);
> >>>>> +    }
> >>>>> +}
> >>>>> +
> >>>>>     /* register init/cleanup routines for stateful command groups */
> >>>>>     void ga_command_state_init(GAState *s, GACommandState *cs)
> >>>>>     {
> >>>>
> >>>
> >>
> >
> 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]