Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and


From: Eduardo Habkost
Subject: Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
Date: Mon, 23 Nov 2009 15:08:58 -0200
User-agent: Sup/git

Excerpts from Anthony Liguori's message of Mon Nov 23 14:16:39 -0200 2009:
> Eduardo Habkost wrote:
> > Excerpts from Anthony Liguori's message of Mon Nov 23 12:49:23 -0200 2009:
<snip>
> >>>       
> >> In our own stable branch, we do not introduce any savevm changes.  I 
> >> would recommend the same policy for RHEL :-)
> >>     
> >
> > But what if you need to add a savevm change to make migration work
> > properly on the stable branch?
> 
> Define "properly".

It depends on many factors: user expectations, written specifications,
documentation. In the pvclock MSR case, it means guest OSes and users
expect the MSR values to be kept by the virtual machine, because that's
how pvclock is expected to work.


> 
> If we have to introduce a new version in VMstate, there are two 
> possibilities.  The first is that we have to blacklist the old version 
> because it was fundamentally broken.  This is rare but it happens.  In 
> this case, we would not be able to support migrating from that stable 
> release to any other stable release.  Really unfortunate for users but 
> we would have no other choice.

Well, we may have an option (described below).


> 
> The second is that we introduce a new version but don't blacklist the 
> old.  This means the old version wasn't fundamentally broken.  It also 
> means that the "fix" is a feature.  It makes things better but isn't 
> strictly required.  That gets deferred to the next release.

Unfortunately sometimes you can't defer to the next release.


However, I think there is another possibility to handle the format
change: you can support migration from the old version to a newer
version, but the machine type (or maybe some other internal field) is
set to indicate that we are running a machine that has the old behavior
(e.g. "this is a machine that doesn't keep the pvclock MSR values").

Qemu version x.y.1 would support only the old machine type because it
doesn't have the new fix/feature. Qemu version x.y.2 would support both
machine types, because it has the fix but must still accept migration
from x.y.1.

If you have a running guest and you want the pvclock (or other
guest-visible) behavior to change, you have to "move it to new virtual
hardware", meaning you should restart the guest using the new machine
type. Migrating guests would never change their machine type (or the
internal field used just for that), because you can't change the
definition of "guest visible state" of a running virtual machine.

(All of the above is just to keep the ability to fix bugs in
guest-visible behavior while keeping the ability to migrate between
different versions. I am not arguing it is worth all the work, but I am
starting to think it is the only sane solution if we want to keep both
abilities.)
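
To make the idea concrete, here is a rough toy model in Python. It is not
Qemu code and uses no Qemu API: the machine type names, the
`migrate_pvclock` flag, the dict-based "stream" and the choice to reset
the MSR to 0 under the old behavior are all assumptions made up for
illustration. It only shows the flag travelling with the machine type,
so that x.y.2 accepts both types and never changes the type of a
migrating guest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineType:
    name: str
    # Hypothetical flag: is the pvclock MSR part of the migrated state?
    migrate_pvclock: bool


# Hypothetical machine types, one per guest-visible behavior.
OLD = MachineType("pc-x.y.1", migrate_pvclock=False)
NEW = MachineType("pc-x.y.2", migrate_pvclock=True)

# Machine types known to each (hypothetical) Qemu binary.
SUPPORTED = {
    "x.y.1": {OLD.name: OLD},
    "x.y.2": {OLD.name: OLD, NEW.name: NEW},
}


def save(machine, cpu_state):
    """Serialize only the state this machine type defines as guest visible."""
    stream = {"machine": machine.name, "regs": list(cpu_state["regs"])}
    if machine.migrate_pvclock:
        stream["pvclock_msr"] = cpu_state["pvclock_msr"]
    return stream


def load(qemu_version, stream):
    """Restore a guest; the machine type always comes from the stream."""
    known = SUPPORTED[qemu_version]
    if stream["machine"] not in known:
        raise ValueError("unsupported machine type: " + stream["machine"])
    machine = known[stream["machine"]]
    state = {"regs": list(stream["regs"])}
    if machine.migrate_pvclock:
        state["pvclock_msr"] = stream["pvclock_msr"]
    else:
        # Assumed old behavior: the MSR value is not kept across migration.
        state["pvclock_msr"] = 0
    return machine, state


if __name__ == "__main__":
    guest = {"regs": [1, 2, 3], "pvclock_msr": 0xABC}
    machine, state = load("x.y.2", save(OLD, guest))
    print(machine.name, hex(state["pvclock_msr"]))
```

A guest started on the old type keeps the old behavior even after it
lands on x.y.2; getting the fix means restarting it with the new type.
In this model, x.y.1 rejects a stream from the new type, so backward
migration of such a guest is simply unsupported.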


The above addresses one point where I think you are right: changing the
definition of "guest visible state" of a running VM isn't something
desirable. We may still disagree about the policy of a stable branch,
but I agree about not changing the savevm format of a running VM.



> 
> >  You can't just tell users "migration is
> > known to be broken on the stable branch, please don't run migrations
> > when using the stable branch". That's the case for the pvclock MSR
> > migration fix.
> >   
> 
> You're assuming that backporting the pvclock change is a bug fix.  It's 
> a new feature as far as I'm concerned and doesn't belong in stable.

It is a bug fix because the definition of "guest visible state" is
buggy. Guests expect pvclock state to be kept, and it was not being
kept.

If you consider that every savevm change is a feature, you are assuming
that the definition of "guest visible state" of the current
implementation is perfect and would never be considered buggy. I don't
think it is a reasonable assumption.



> 
> > In a perfect world, the set of state data that is migrated by the
> > current implementation would always match exactly the expected behavior
> > of the virtual machine. Unfortunately sometimes the implementation
> > doesn't follow the "contract" (be it some written specification,
> > documentation, or just user expectations).  When that happens, it is a
> > bug in qemu and it needs to be fixed on the stable branch.
> >
> > Note that (right now) I am not arguing for backward migration, but just
> > arguing that we can't have a strict "no savevm changes" policy on the
> > stable branch.
> >   
> 
> That's exactly what I'm advocating: a strict savevm policy for stable 
> branch.  It's something I've always enforced in the past.  It's 
> necessary to preserve the integrity of live migration.

That may be good enough for upstream Qemu, but IMO for RHEL it is not a
realistic policy. If the definition of "guest visible state" is buggy in
the current implementation, we can't entirely rule out the possibility
of fixing it on our stable branch.

-- 
Eduardo



