qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC 0/2]: qemu-ga: drop automatic reaper


From: Michal Privoznik
Subject: Re: [Qemu-devel] [RFC 0/2]: qemu-ga: drop automatic reaper
Date: Mon, 23 Apr 2012 14:20:53 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:11.0) Gecko/20120330 Thunderbird/11.0.1

On 20.04.2012 15:36, Luiz Capitulino wrote:
> On Fri, 20 Apr 2012 14:07:16 +0200
> Michal Privoznik <address@hidden> wrote:
> 
>>>> But, I think if we tell users we'll *only* send response on error,
>>>> we should do our part to *not* send the responses, rather than relying
>>>> on them having implemented the reset mechanism to throw them away after
>>>> guest wake-up. What we could do is allow a command to set a flag, after
>>>> it reaps it's child (in the case of suspend this would be after
>>>> wake-up, for shutdown it'd basically be a no-op, but worth adding
>>>> for readability sake), to have qemu-ga not send a response. We'd
>>>> implement it similarly to how we did ga_set_response_delimited().
>>>
>>> Fine with me. Stating that "do not wait for an OK response, because none
>>> will be sent" sounds clearer than "an OK response may, or may not be
>>> emitted. Or it may be emitted when the VM resumes".
>>
>>
>> Just to make this clear: this "report-only-error" behavior concerns only
>> guest-suspend-* and guest-shutdown commands, right? Because otherwise,
>> if we enable such behavior for all commands (e.g. fsfreeze) I think we
>> are entering the world of pain.
> 
> Exactly, this would only concern the suspend and shutdown commands.
> 
>> From user POV there is a huge difference between those 2 sets of
>> commands (suspend/shutdown on one side, the others on the other side):
>> - the first emits an event on qemu monitor, so libvirt can catch that
>> and confirm suspend/shutdown has succeeded
> 
> Oops, this is a different subject but there's a problem here. Events are just
> hints, they shouldn't be your definitive source of information.
> 
> For shutdown and suspend-disk, I think that the best indication that
> the command has succeeded is that the VM will successfully exit. We could
> also have a special exit status code for suspend-to-disk, because the
> command could run in parallel with the user powering off the VM and libvirt
> wouldn't know that (and would think the VM is suspended, while it's really
> powered-off).
> 
> For suspend-ram and suspend-hybrid, we're missing a 'suspended' RunState.
> The event serve as a good hint and you can use it, but if it's lost for some
> reason (eg. libvirt crashes before it's received) then libvirt can learn the
> VM state by issuing query-status.
> 
> Now, going back to the original subject. I have to admit that I'm not sure
> what's the best way to go here.
> 
> I'll try to recapitulate (for myself and for those that may be confused) I'll 
> be
> verbose a bit.
> 
> We have two qemu-ga commands that are special: guest-shutdown and the 
> guest-suspend
> family. They are special because they shut down the VM or suspend its 
> execution
> (meaning that the world of qemu-ga is gone or gets completely frozen).
> 
> Today, shutdown is an asynchronous operation: qemu-ga gets the shutdown 
> process
> started and returns to accept new commands. For qemu-ga clients, this implies:
> 
>  1. errors in the shutdown operation are not reported back
>  2. qemu-ga doesn't block
> 
> For qemu-ga this implies having a way to automatically reap terminated 
> children
> processes.
> 
> The guest-suspend commands do the same when executing the suspend operation,
> but before they do that they need to query the VM for suspend support and
> this is done by executing pm-is-supported (if available). This fact shouldn't
> be visible to qemu-ga clients, but it has two internal implications:
> 
>  1. The operation is half synchronous and half asynchronous
>  2. In order to bypass the automatic process reaper in qemu-ga when executing
>     pm-is-supported, we have to play tricks that makes the suspend code
>     more complex than it should be
> 
> We have two options:
> 
>  1. Keep the current behavior (explained above, shutdown is async, suspend
>     is half sync half async). For libvirt this means nothing changes, for
>     qemu-ga this means more complex code
> 
>  2. Change everything to be synchronous (this series). This essentially means:
> 
>      A. errors are going to reported back
>      B. qemu-ga will block
>      C. we avoid all the dirty tricks, and qemu-ga code becomes simpler
>      D. In theory, this should be a compatible change due to the end of world
>         nature of the commands involved
> 
> A third possible option would be to have asynchronous support. But I'm not
> sure whether this would fit well and how complex this would be (specially
> because of fork()).
> 

Okay, thanks for recap.
One thing that I am sure will not play nicely is old libvirt with new
qemu-ga. Here's the flow:

1. User issues virDomainPMSuspend*()
2. Libvirt chews this and calls guest-suspend-* holding up return from
API until an qemu-ga answer is read
3. Guest gets suspended
... Eventually ...
4. Guest gets resumed and qemu-ga returns "{'return':{}}"
5. Libvirt reads response and returns from API

So, I think if we are going to change these commands, we need to spawn a
new ones so user can distinguish if he's talking to fixed or broken
qemu-ga. Or, we can introduce query-* commands family like we already
have for the monitor, so user can guess it from qemu-ga version;

Having said how libvirt use qemu-suspend-*; what should be the new
libvirt behavior? I mean - how long should libvirt wait until it can
return from API? IOW - after what period of time/what event user can be
sure no error will be returned? Because as Luiz said above - STOP event
is just hint, not confirmation. I guess API should wait until qemu
process dies. This may, however, take ages (esp. in case of
guest-shutdown).

Michal



reply via email to

[Prev in Thread] Current Thread [Next in Thread]