Re: [Qemu-devel] Live migration protocol, device features, ABIs and othe

qemu-devel

From:	Anthony Liguori
Subject:	Re: [Qemu-devel] Live migration protocol, device features, ABIs and other beasts
Date:	Sun, 22 Nov 2009 09:49:26 -0600
User-agent:	Thunderbird 2.0.0.23 (X11/20090825)

Dor Laor wrote:

In the last couple of days we discovered some issues regarding stable ABI and the robustness of the live migration protocol. Let's just jump right into it, ordered by complexity:


1. Control *every* feature exposed to the guest by qemu cmdline:

   While thinking on cross version migration, and reviewing some
   patches, I noticed that there are many times that we use feature bits
   in order to expose functionality for the guest driver - example:
   VIRTIO_BLK_F_BARRIER, but we do not control it from qemu cmdline.

   The result is that guest running on a newer qemu cannot live migrate
   into older qemu without the barrier feature.

   Like this barrier example, there are probably many cases that we
   do keep device/driver abi but forget new/old release abi.

   The solution here is simpler - Every guest visible change should
   translate into cmdline option. This is part of the machine type and
   in addition should be configurable.
   It's an issue we should all keep in the back of our heads and raise
   whenever a new capability or change is introduced.

s/cmdline/qdev/g and I agree with you. There's nothing protocol-specific about this though.
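
To make the qdev point concrete, here is a minimal sketch in Python (not QEMU code) of a guest-visible feature bit being driven by a device property, with the default coming from the machine type. The machine type names, the "barrier" property name, and the host_features() helper are assumptions made up for illustration; only VIRTIO_BLK_F_BARRIER comes from the discussion above.

```python
# Bit position of the barrier feature; treat the value as illustrative.
VIRTIO_BLK_F_BARRIER = 0

# Hypothetical per-machine-type defaults for the hypothetical "barrier" property.
MACHINE_DEFAULTS = {
    "pc-0.11": {"barrier": False},  # older machine type never exposed the bit
    "pc-0.12": {"barrier": True},   # newer machine type exposes it by default
}


def host_features(machine_type, overrides=None):
    """Return the feature bitmap the guest would see for this configuration."""
    props = dict(MACHINE_DEFAULTS[machine_type])
    props.update(overrides or {})  # explicit user setting wins over the default
    features = 0
    if props["barrier"]:
        features |= 1 << VIRTIO_BLK_F_BARRIER
    return features
```

With something like this, a guest started with the older machine type (or with the property explicitly turned off) sees the same feature bits on both old and new binaries, so the guest-visible ABI does not change underneath it.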

2. Live migration inherent problem.

   Currently, even with VMState, the protocol is not flexible enough.
   We ran into a problem when we needed to fix the pvclock migration issue.
   The fix included 2 additional fields in save/load state and thus
   needed a new version number.
   The trouble is that the load function does not accept sections with
   versions greater than the one it supports.

This is a feature, not a bug. You cannot migrate from a newer qemu to an older one. There's simply no way to support this in a sane way.
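
A rough sketch of that rule, in Python rather than the actual QEMU load code: each field records the section version that introduced it, older streams load with only the fields they carry, and a stream newer than what the loader supports is refused. The field names and the fields_to_load() helper are hypothetical; the only detail taken from the discussion is that a fix added two fields and bumped the version.

```python
class MigrationError(Exception):
    """Raised when an incoming section cannot be loaded safely."""


# (field name, section version that introduced it). Names are made up.
PVCLOCK_FIELDS = [
    ("system_time", 1),
    ("tsc_timestamp", 1),
    ("extra_field_a", 2),  # the two fields added by the hypothetical fix
    ("extra_field_b", 2),
]
SUPPORTED_VERSION = 2


def fields_to_load(incoming_version, supported_version=SUPPORTED_VERSION):
    """Return the field names to read for a section of the given version."""
    if incoming_version > supported_version:
        # Newer than we understand: refuse instead of guessing at the layout.
        raise MigrationError(
            "section version %d > supported version %d"
            % (incoming_version, supported_version)
        )
    # Old -> new works: read only the fields the sender's version contained.
    return [name for name, since in PVCLOCK_FIELDS if since <= incoming_version]
```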

   We cannot even create a new 'hack section' for new code since the
   sections are ordered and expected to be exact match on the
   destination.

   The result is that new->old migration cannot work. This is not cross
   releases even! It means that even a small bug in current release
   prevents live migration between various instances of the code.
   It forces us to decide whether to fix pvclock migration issue vs
   allow new->old migration. Another ugly hack is to add cmdline that
   will control this behavior. Still it's a pain to mgmt stack and
   users.


This is a pretty normal policy (backwards compat but not forwards compat).

   The solution here is more complex. One can claim that we should allow
   newer sections to be accepted by current code (and send the section
   size) and send optional sections. This would be a nasty work around.

   IMHO we should 'specify' the migration protocol and introduce
   capabilities, feature bits, etc. This way we'll have a robust,
   extensible protocol that will withstand any potential issue. Both
   Michael Tsirkin and I suggested it at the time vmstate was introduced.
   Vmstate is good for the code but it's not a protocol.

I don't see how this fixes anything. If you used feature bits, how do you migrate from a version that has a feature bit that an older version doesn't know about? Do you just ignore it?

Migration needs to be conservative. There should be only two possible outcomes: 1) a successful live migration or 2) graceful failure with the source VM still running correctly. Silently ignoring things that could affect the guest's behavior means that it's possible that after failure, the guest will fail in an unexpected way.
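
As a sketch of the two-outcome rule (Python, with every name here invented for illustration, not how QEMU implements it): the destination checks every section up front, and the source is only stopped once nothing was rejected. Anything the destination does not understand is an error, never something to skip.

```python
class SourceVM:
    """Stand-in for the source guest; it starts out running."""

    def __init__(self):
        self.running = True


def migrate(source, sections, dest_supported):
    """Try to migrate; return (succeeded, list of rejected section names).

    sections:       {section name: version sent by the source}
    dest_supported: {section name: highest version the destination loads}
    """
    rejected = [
        name
        for name, version in sections.items()
        if name not in dest_supported or version > dest_supported[name]
    ]
    if rejected:
        # Outcome 2: graceful failure, the source keeps running untouched.
        return False, rejected
    # Outcome 1: everything was understood, so the source can hand over.
    source.running = False
    return True, []
```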

Which protocol should we use? You're smarter than me, please suggest
one.
wrt the above guest abi issue, we should write a qemu spec with clear definitions for devices, drivers, versions, etc.

I don't think there's a problem with what we have now. The only thing I think we should add is a vendor sub-versioning mechanism. Unfortunately, we have downstreams that make lots of changes. Today, since we have a single version space, there is inevitable versioning clash because of the shared namespace. If we had a sub-versioning mechanism, it provides a way for downstreams to backport features and change the device models in such a way that the versioning doesn't clash with upstream.

It also provides a way to determine if two downstreams are compatible with each other, which is a pretty neat concept.


This could be done as a small, incremental change to the current protocol.
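
One possible shape for such a sub-version, sketched in Python. The mail does not define any semantics, so everything below is an assumption: the (upstream, vendor, sub) triple, the vendor names, and the rule that vendor changes are only loadable by the same vendor at an equal or higher sub-version.

```python
from typing import NamedTuple


class SectionVersion(NamedTuple):
    upstream: int     # the existing upstream version number
    vendor: str = ""  # hypothetical vendor tag; empty means pure upstream
    sub: int = 0      # vendor-local revision on top of the upstream version


def can_load(incoming, supported):
    """Return True if a loader at `supported` may accept `incoming`."""
    if incoming.upstream > supported.upstream:
        return False  # same rule as today: never load a newer upstream
    if incoming.sub == 0:
        return True   # no downstream changes to worry about
    # Assumed rule: vendor changes are understood only by that vendor's builds.
    return incoming.vendor == supported.vendor and incoming.sub <= supported.sub
```

Under these assumptions, two downstream builds could be compared by calling can_load() in both directions for each section they exchange.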

Regards,

Anthony Liguori
