qemu-stable

Re: [Qemu-stable] [PATCH] vl: allow "cont" from panicked state


From: Michael S. Tsirkin
Subject: Re: [Qemu-stable] [PATCH] vl: allow "cont" from panicked state
Date: Wed, 21 Aug 2013 18:44:53 +0300

On Wed, Aug 21, 2013 at 05:32:27PM +0200, Paolo Bonzini wrote:
> Il 21/08/2013 17:23, Eric Blake ha scritto:
> >> Upon learning of a panic, management (if configured to do so) can pick a
> >> variety of behaviors: leave the VM paused, reset it, destroy it.  In
> >> addition to all of these behaviors, it is possible dumping the VM core
> >> from the host.
> > 
> > s/possible dumping/possible to dump/
> > 
> > and yes, libvirt wants to do just that, as one of its <on_crash>
> > mappings, since it could do the same for Xen.
> > 
> >>
> >> However, right now, the panicked state is irreversible, and can only be
> >> exited by resetting the machine.  This means that any policy decision
> >> is entirely in the hands of the host.  In particular there is no way to
> >> use the "reboot on panic" option together with pvpanic.
> >>
> >> This patch makes the panicked state reversible (and removes various
> >> workarounds that were there because of the state being irreversible).
> >> With this change, management has a wider set of possible policies: it
> >> can just log the crash and leave policy to the guest, or it can leave
> >> the VM paused.  In particular, "log the crash and continue" is
> >> implemented simply by sending "cont" as soon as management learns about the panic.
> >> Management could also implement the "irreversible paused state" itself.
> >> And again, all such actions can be coupled with dumping the VM core.
> > 
> > Yes, this makes sense.
> > 
> >>
> >> Unfortunately we cannot change the behavior of 1.6.0.  Thus, even if
> >> it uses "-device pvpanic", management should check for "cont" failures.
> >> If "cont" fails, management can then log that the VM remained paused
> >> and urge the administrator to update QEMU.
> > 
> > Is that the best we can do?  Is there any sort of QMP introspection that
> > libvirt can do, where we can know UP FRONT what level of panic support
> > is provided by the qemu binary and the machine type being run in that
> > binary?
> 
> No, this is not possible unfortunately.  The only possibility that comes
> to mind would be to rename the pvpanic device, e.g. to "isa-pvpanic",
> and forget about "-device pvpanic" on 1.6.x.  A hack, I know.
> 
> To support 1.5, libvirt should simply be ready to react to unanticipated
> GUEST_PANICKED events.  reboot-on-panic will simply be broken for 1.5
> and Linux 3.10+ guests. :(

Let's just fix the bugs in 1.6.X.
I don't think libvirt needs to work around all qemu bugs.

For 1.5.X, it might be possible to backport -device pvpanic, though
we would need to make sure cross-version migration works.

> >> +++ b/vl.c
> >> @@ -637,9 +637,8 @@ static const RunStateTransition runstate_transitions_def[] = {
> >>      { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
> >>      { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
> >>  
> >> -    { RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED },
> >> +    { RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING },
> > 
> > Is 'cont' the only viable way to escape PANICKED, or is it also
> > reasonable to support 'stop' as a way to transition from PANICKED to
> > PAUSED?  That is, management may want to make the state reversible but
> > still leave the guest paused, so this patch may be incomplete.
> 
> No, there is no way to move from PANICKED to PAUSED.  Libvirt has its
> own statuses (PAUSED, CRASHED etc.) and substatuses.  You don't really
> care about the QEMU state: both the PAUSED_PANICKED and CRASHED_PANICKED
> substatuses map to QEMU's GUEST_PANICKED state.  Libvirt will simply
> not allow a "virsh resume" for <on_crash>preserve</on_crash>, and will
> allow it for a hypothetical new <on_crash>pause</on_crash> element.
> 
> BTW, any chance "coredump-destroy" and "coredump-restart" can be
> preserved just for backwards compatibility, and a new coredump='yes/no'
> attribute introduced instead?  After all, coredump-pause and
> coredump-preserve would make just as much sense.
