[Qemu-devel] Migration issues and possible solutions (Very long)


From:	Juan Quintela
Subject:	[Qemu-devel] Migration issues and possible solutions (Very long)
Date:	Fri, 27 Nov 2009 13:12:56 +0100

Introduction
------------

Following up on Dor Laor's mail thread:
  Live migration protocol, device features, ABIs and other beasts

Several of us discussed the problem and possible solutions.  This mail
is a summary of the thread and discussions.  I am the one who summarized
the discussion, but there were lots of participants; I tried to attribute
the good ideas to their authors.

BIAS
----

I like the idea of having several sections + whitelists, and of selecting the
version of the device at start time.  I tried not to be biased in the rest of
the document, but this way you have been warned of what my bias is.

Problems with current migration
-------------------------------

Issue 1: Change of migration format inside a stable release
-------------------------------------------------------------

The qemu savevm format allows migrating from an old release to a new release,
i.e. you can migrate a machine from qemu-0.11 to qemu-0.12 if the device
versions are compatible.  What is not supported in general is migrating from
qemu-0.12 to qemu-0.11.

But inside a stable release, we are supposed to be able to migrate back and
forth,

i.e. from qemu-0.11.0 <-> qemu-0.11.1

The question we ran into is: what happens if we are in a stable release and
we find a bug in the savevm format?  A concrete example that happened to us
is that the values of two MSRs were not saved in the "cpu" state.

What to do here? We can:
- get a new savevm format, and break the assumption that inside a stable
  branch you can migrate back and forth.
- declare that the savevm format for a stable release is carved in stone
  and, if it has a bug, bad luck.

Now think that you have a cluster of machines, and that upgrading all of them
at the same time is not an option.

Notice that both solutions are bad.  And we don't think that this
is the last time that we will have a bug in the savevm format inside a stable
release.

This is not an academic problem; we are having this problem in RHEL with
time drift and pvclock.

Issue 2: Reverting to older vmstate version for compatibility
-------------------------------------------------------------

In the previous case, we would like qemu-0.11.1 to be able to migrate
to qemu-0.11.0 (i.e. have some way to disable the saving of the new fields,
the MSRs).

The problem here is that qemu currently refuses to load newer savevm sections
because it doesn't know how to interpret them.  And qemu can only migrate a
device using the latest version it knows about.
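
To make the asymmetry concrete, here is a minimal sketch of the two rules
just described.  It is a toy model, not qemu's code: the function names and
the version numbers (7 and 8 for "cpu") are assumptions made up for the
example.

```python
# Toy model of the current behaviour; names and numbers are hypothetical.
class UnsupportedVersion(Exception):
    """Raised when an incoming section is newer than what the loader knows."""


def save_section(name, latest_known_version):
    # The saver has no choice: it always emits the latest version it knows.
    return (name, latest_known_version)


def load_section(name, incoming_version, latest_known_version):
    # The loader accepts older or equal versions and refuses newer ones.
    if incoming_version > latest_known_version:
        raise UnsupportedVersion(
            f"{name}: got v{incoming_version}, "
            f"only know up to v{latest_known_version}"
        )
    return f"{name} loaded as v{incoming_version}"
```

With these two rules, a build that knows "cpu" v8 can load a v7 stream, but a
build that only knows v7 can never receive from the v8 build, because the v8
build cannot be told to save as v7.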

Issue 3: Selecting appropriate vmstate version for machine type
--------------------------------------------------------------

When launching qemu-0.12 -M pc-0.11, it's desirable to be able to live migrate
to a qemu 0.11.0 version.  Due to Issue 2, it is not possible.  This is a bug
that needs to be fixed in upstream qemu.

Issue 4: Limitations of linear versioning
-----------------------------------------

Linear versioning has a problem when you have to fix things in the stable
branch.

device "foo" has v3 in 0.11
device "foo" has v4 in 0.12

Now a problem is found in 0.11 that requires a change in the savevm
format.  What version can we use for the new "foo" device format?  If we
use v3, migration to old 0.11 will fail.  If we use v4, migration to
0.12 will fail.

This also happens all the time for kvm.  kvm needs a new field for a
device, so it adds the field and increases the version number.  But then
at some point qemu upstream adds another field and also increases the
version number.  We have a conflict, and there is no way to express
this kind of relation with linear versioning.

This is not only a downstream problem (kvm, Red Hat, Novell, ...),
it also happens for qemu stable branches.
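
The "foo" example can be played through in a few lines.  This is only a
simplified model: it assumes a destination loads a section only when the
version number maps to exactly the field layout it expects, and the field
names ("a", "b", "c", "fix") are invented for the illustration.

```python
# Field layouts each build associates with a version of device "foo".
# All field names are hypothetical.
OLD_0_11 = {3: ["a", "b"]}
QEMU_0_12 = {3: ["a", "b"], 4: ["a", "b", "c"]}

# Layout the fixed 0.11.x build has to send: old fields plus the bug fix.
FIXED_LAYOUT = ["a", "b", "fix"]


def can_load(known_layouts, version, fields):
    # A destination only understands a version whose layout matches its own.
    return known_layouts.get(version) == fields


def try_version(version):
    # Which destinations would accept the fixed layout under this number?
    return {
        "old 0.11": can_load(OLD_0_11, version, FIXED_LAYOUT),
        "0.12": can_load(QEMU_0_12, version, FIXED_LAYOUT),
    }
```

In this model neither try_version(3) nor try_version(4) yields a destination
that accepts the stream: a single integer cannot say "v3 plus one fix".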

Proposals for #1, #2, #3
------------------------

Thinking about ways of trying to solve/mitigate this problem, there are
several suggestions (not sorted in any particular order):

- Dor's email solution:
  Control *every* feature exposed to the guest by qemu command line.  Obviously
  this is very flexible, but it has the cost of adding all the knobs and
  testing that they work.

- Anthony/Juan solution:
  (I don't remember who proposed it first, but we agreed on a lot of
  points).  We already have a mechanism that does part of this:

     qemu-rhel5.4.1 -M rhel5.4

  This should launch qemu of 5.4.1 with a machine type of rhel5.4.
  But (and this is a big but) current qemu with that command line launches
  a machine with the devices of rhel5.4, but it uses the savevm protocol
  of 5.4.1.

  Ok, then we define that this is a bug that we have to fix.  Once this
  is fixed, you can fix issue 2.

- Variant of the previous solution (Michael Tsirkin).

  Do not merge machine description and savevm formats.  Use a monitor command
  to specify in what version we want the savevm format to be.  This has
  one advantage: you can save to different versions at any point.
  The reasoning for this is that the machine description should only contain
  things that are guest visible.  And the savevm format is not guest visible.

  Note from Juan: using the machine type has the advantage that at device
  creation time we know what version we are emulating.  We can change more
  things than the savevm format.

- Protocol negotiation
  (Dor, Gleb, mst and Eduardo, at least, defend part of this solution).
  The idea is to have source and target look for common versions that will
  work together.

  This fixes the problem in a completely different way.  When you
  migrate, you just negotiate between source and target what versions to
  use for each device.

  The possibility that I suggested (but there are more) was:
    the source sends all devices with the version ranges that it supports;
    the target chooses the highest version number of each device
    that it supports, and answers that to the source (if such a
    valid list exists).

  Gleb proposed that the source just send all possible formats, and that the
  target select the one that it understands.

  Anthony doesn't like this one; he thinks that this should not be part of the
  savevm protocol and that it should be done higher in the management
  stack.  Info devices should print the savevm options, and management should
  find the versions beforehand.

- Dor/mst proposal of optional features

  This came from previous discussions: Dor wants to put optional fields
  in the savevm protocol that the target can just discard.

  I am against this, because it makes the number of test cases exponential.

  mst always cites one case, which is when the driver knows that it hasn't
  used a feature.  An example is msix.  A device can know that the guest is
  not using msix on that device and just not send the msix part of the
  information.  That way you can migrate back and forth between machines
  where the only difference is msix support in devices.

  Even though I don't like the optional features, I agree that this idea
  has some merit.  I think that the majority of cases are not
  independent features like this one, but for features like this one it
  makes sense.
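
As a rough illustration of the range-based negotiation described in the
"Protocol negotiation" item, the target's choice could look like the sketch
below.  The function name, the (min, max) tuple representation and the
device versions are all assumptions for the example; no wire format is
implied.

```python
def negotiate(source_ranges, target_ranges):
    """Pick the highest common version per device.

    Both arguments map a device name to an inclusive (min, max) range.
    Returns {device: version}, or None if some device has no common
    version (i.e. no valid list exists).
    """
    chosen = {}
    for device, (src_min, src_max) in source_ranges.items():
        if device not in target_ranges:
            return None
        tgt_min, tgt_max = target_ranges[device]
        best = min(src_max, tgt_max)      # highest version both could use
        if best < max(src_min, tgt_min):  # ranges do not overlap
            return None
        chosen[device] = best
    return chosen
```

For instance, a source offering cpu 7..8 and foo 3..4 against a target
supporting cpu 5..7 and foo 3..3 would settle on cpu v7 and foo v3.  The
same function could just as well run in the management stack instead of
inside the migration stream.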

To summarize, at this point basically all proposals agree that we want a way to
select an old version of the savevm format.  But the _when_ and _where_ haven't
been agreed on yet.

1) Regarding _when_ this setting may be defined:
   a) Defining the versions at startup
   b) Defining the versions at runtime
   c) some combination of the above
   d) some other option?

2) Regarding _where_ this is defined:
   a) machine-type
   b) qdev
   c) other config option created just for the savevm version
   d) monitor "set-machine-wide-savevm-version" command
   e) monitor "set-device-savevm-version" command
   f) some combination of the above
   g) some other option?
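
One way to picture a combination of "startup" and "runtime" selection is a
per-machine-type table of savevm versions with optional per-device
overrides.  This is a sketch only: the table contents, the version numbers
and the function name are hypothetical, and the overrides merely stand in
for something like the per-device monitor command listed above.

```python
# Hypothetical table: savevm version of each device per machine type.
SAVEVM_VERSIONS = {
    "pc-0.11": {"cpu": 7, "foo": 3},
    "pc-0.12": {"cpu": 8, "foo": 4},
}


def savevm_versions(machine_type, overrides=None):
    """Versions to save with: machine-type defaults plus runtime overrides."""
    versions = dict(SAVEVM_VERSIONS[machine_type])  # copy, keep table intact
    for device, version in (overrides or {}).items():
        if device not in versions:
            raise KeyError(f"unknown device {device!r}")
        versions[device] = version
    return versions
```

Starting with machine type pc-0.11 would then save "cpu" as v7 and "foo" as
v3 even on a newer binary, while an override could lower a single device at
runtime.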

Proposals for #4: Limitations of linear versioning
-----------------------------------------

Possible solution:  Hierarchical version numbers (Anthony proposal)

It means changing the protocol to allow for two device versions, one for
qemu and another for downstream (kvm/xen/...).  This has some appeal, but also
has its problems.  What happens when a distro packages kvm?  They need yet
another version number, i.e. they made modifications to a device that was also
modified in kvm.  And it also doesn't solve the problem with updates
(see Issue 1).  A device in 0.11.0 is at version 7, and now for qemu 0.12.0 it
is at version 8.  We find a bug in 0.11.0 and we need to change the wire
format: what version do we use?

Another possible solution: Use feature (sub)sections (Avi suggestion)

Each time that we add a new set of fields to a device state, we just create a
new subsection for these fields.  That makes it easier to cherry-pick features
for backporting to stable series, and makes it easy to create backward
compatibility.  As we are very near 0.12 and we can't implement proper
subsections, Avi's suggestion is to add new sections with names like:
  "device/feature/vendor"
That is more descriptive than hierarchical versions, and gets us quite a bit
of flexibility.
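
A small sketch of how such named sections could be produced follows.  The
helper names and the example feature ("pvclock" from vendor "rhel") are
invented for the illustration; only the "device/feature/vendor" naming
pattern comes from the suggestion itself.

```python
def section_name(device, feature, vendor):
    # Follows the "device/feature/vendor" naming pattern.
    return "/".join((device, feature, vendor))


def save_device(device, base_fields, features):
    """Return the list of (section name, fields) to emit for one device.

    `features` is a list of (feature, vendor, fields) tuples; each one
    goes into its own section instead of bumping the device version.
    """
    sections = [(device, base_fields)]
    for feature, vendor, fields in features:
        sections.append((section_name(device, feature, vendor), fields))
    return sections
```

Backporting a feature to a stable branch then means adding one more named
section rather than picking a new number on a linear version scale.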

This has a problem though, and that is an exponential number of test cases,
i.e. if we have device A, with features a and b, we have the following
possible combinations:

A
A+a
A+b
A+a+b

As you can guess, as the number of features grows, the number of test
cases grows very fast.  The solution for this problem is that not all
combinations are valid.  Only the combinations listed as valid in the
driver's whitelist will be accepted (for instance, A, A+a, A+a+b).
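
The growth and the whitelist check can be sketched as below.  The data
representation (frozensets of feature names, with the empty set meaning the
bare device A) and the function names are assumptions for the example.

```python
from itertools import combinations


def all_combinations(features):
    """Every subset of the optional features (the base device is implied)."""
    feats = sorted(features)
    return [
        frozenset(combo)
        for size in range(len(feats) + 1)
        for combo in combinations(feats, size)
    ]


# Whitelist for device A from the text: A, A+a, A+a+b (A+b is not listed).
WHITELIST = {frozenset(), frozenset({"a"}), frozenset({"a", "b"})}


def is_valid(present_features, whitelist=WHITELIST):
    # Only combinations the driver explicitly lists are accepted.
    return frozenset(present_features) in whitelist
```

With n optional features there are 2**n subsets to consider (4 for two
features, 32 for five), while the whitelist only grows when somebody
deliberately adds and tests a combination.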

Once we have some kind of negotiation (be it at the savevm protocol level or
higher up, at the management stack level), and we have some way to set the
savevm version (again, any of the alternatives), we will have more flexible
migration between versions, and an easier way to maintain the stable branches.

A related problem is savevm/loadvm.  When we do a savevm, we don't know what
version will do the loadvm.  And that means that we can't do a negotiation.
One possible solution is to save all sections and the whitelist of valid
combinations at savevm time.  Then at loadvm time, we just search for
a valid combination in the whitelist and use only those sections; otherwise
we fail the loadvm.
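
A minimal sketch of that loadvm-time search is shown here.  It assumes the
saved image is a mapping from section name to data, the whitelist is a list
of frozensets of section names, and that the loader prefers the largest
combination it fully understands; that preference and all the names are
assumptions, not part of the proposal.

```python
def pick_sections(saved_sections, whitelist, understood):
    """Choose which saved sections to load, or fail the loadvm.

    saved_sections: {section name: data} written at savevm time
    whitelist:      list of frozensets of section names, saved alongside
    understood:     set of section names this binary knows how to load
    """
    available = set(saved_sections)
    candidates = [
        combo for combo in whitelist
        if combo <= understood and combo <= available
    ]
    if not candidates:
        raise ValueError("loadvm: no whitelisted combination is understood")
    best = max(candidates, key=len)  # assumption: prefer the richest state
    return {name: saved_sections[name] for name in best}
```

A loader that knows only "A" and "A/a" would then load those two sections
from an image that also contains "A/b", provided A+a is in the whitelist.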


Sorry for the long mail, but it describes a very long thread with
lots of problems/suggestions.

Later, Juan.