[Qemu-devel] The State of the SaveVM format

From:	Juan Quintela
Subject:	[Qemu-devel] The State of the SaveVM format
Date:	Wed, 09 Sep 2009 10:47:27 +0200

A Sad History of a Doomed Format
--------------------------------

When the world was old, there was a V1 of savevm format (nothing more
to tell about it).

Then version 2 appeared. It was a very simple format.  The only fields were:

- instance_id
- version_id
- record_len

You can see it at savevm.c::qemu_loadvm_state_v2()

ToDo: Create an image with v2, and see if we can still read it,
      otherwise, remove support to load v2 format.

And then v3 appeared

    commit 9366f4186025e1d8fc3bebd41fb714521c170b6f
    Author: aliguori <address@hidden>
    Date:   Mon Oct 6 14:53:52 2008 +0000
    Introduce v3 of savevm protocol

Features:
     * Support for progressive save of sections (for live checkpoint/migration)
     * An asynchronous API for doing save
     * Support for interleaving multiple progressive save sections
       (for future support of memory hot-add/storage migration)
     * Fully streaming format
     * Strong section version checking

At this point, all saving and loading of images was done in plain C with
functions that did anything they wanted.  Life was nice and good
while things worked.  When they didn't work, you only knew that they
didn't work.  No info at all about why.  The Qemu SaveVM format was an
opaque thing that only a correctly configured qemu was able to read.

Fast forward to the present, and VMState appears.  What does it do?
It allows you to specify the state as a table, and then the save
function walks the table and saves all the fields.  The load function
walks the table and loads all the fields.  Save and load functions are
obviously always in sync, because they work by walking the same table.
And life was good .... Ooops, no, it was not good.
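
To make the table idea concrete, here is a minimal sketch.  It is not
the real VMState API: DemoState, DemoField, demo_save() and demo_load()
are names invented for this example, and fields are copied in host byte
order, which a real on-disk format would not do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical device state; not a real QEMU device. */
typedef struct DemoState {
    uint32_t ctrl;
    uint64_t counter;
    uint8_t  fifo[4];
} DemoState;

/* One table entry: where the field lives and how many bytes it takes. */
typedef struct DemoField {
    const char *name;
    size_t offset;
    size_t size;
} DemoField;

#define DEMO_FIELD(_f) \
    { #_f, offsetof(DemoState, _f), sizeof(((DemoState *)0)->_f) }

/* The single description shared by save and load. */
const DemoField demo_fields[] = {
    DEMO_FIELD(ctrl),
    DEMO_FIELD(counter),
    DEMO_FIELD(fifo),
};

#define DEMO_NFIELDS (sizeof(demo_fields) / sizeof(demo_fields[0]))

/* Save: walk the table and append each field to buf.
 * Returns the number of bytes written. */
size_t demo_save(const DemoState *s, uint8_t *buf, size_t cap)
{
    size_t pos = 0;

    for (size_t i = 0; i < DEMO_NFIELDS; i++) {
        const DemoField *f = &demo_fields[i];

        assert(pos + f->size <= cap);   /* caller must size buf properly */
        memcpy(buf + pos, (const uint8_t *)s + f->offset, f->size);
        pos += f->size;
    }
    return pos;
}

/* Load: walk the very same table, so the field order cannot drift.
 * Returns the number of bytes consumed, or 0 if buf is too short. */
size_t demo_load(DemoState *s, const uint8_t *buf, size_t len)
{
    size_t pos = 0;

    for (size_t i = 0; i < DEMO_NFIELDS; i++) {
        const DemoField *f = &demo_fields[i];

        if (pos + f->size > len) {
            return 0;
        }
        memcpy((uint8_t *)s + f->offset, buf + pos, f->size);
        pos += f->size;
    }
    return pos;
}
```

Adding a field to DemoState then means adding one line to demo_fields;
neither function body changes.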

The problem is what to do from here:
- We can have a very simple VMState format that only allows storing
  simple types (int32_t, uint64_t, timers, buffers of uint8_t, ...)
  Arrays of valid types
  Structs of valid types
  And that is it.  Advantage of this approach: it is very simple to
  create/test/whatever.  Disadvantage: it can't express all the things
  that were done in plain C.  Everybody agrees that we don't want to
  support everything that was done in plain C in the old way.  What we
  are discussing is "how many" things we want to support.  Notice
  that we can support _everything_ that we were doing with plain C.
  Anytime you want to do something strange, you just need to write
  your own marshaling functions and you are done.  You do
  anything you want there.

  This is where we are on how we want to develop the format.  People who
  have expressed opinions so far are:
  - Gerd: You do a very simple format, and if the old state can't be
          expressed in simple VMState, you just use the old load
          function.  This keeps VMState clean, and you can load
          everything that was done before. Eventually, we remove the
          old load state functions once we no longer support such old
          formats.
  - Anthony: If we leave the old load state functions, they will be
          around forever.  He wants to complicate^Wimprove VMState
          to be able to express everything that was done in plain C.
          Reason: It is better to only have _one_ set of functions.
  - Paul?: I think he said that testing that we can load old state
          is impossible, and it is better to just remove the ability
          to load from old versions (I think this was Paul's position,
          but the discussion was a month ago, and my memory is not
          perfect)

I guess that if I am misinterpreting anyone, they will let you know,
don't worry :) As you can see, what we are searching for here is the
least bad solution.  All have advantages and disadvantages, and none is
"perfect" or obviously better than the others.

ToDo: Port all devices (for instance of a typical pc) to current simple
      VMState and see how many things we are missing (Beware: Dragons in
      virtio)

Another day, another problem, this time called: Optional features.

How do we deal with optional features?
- We add feature bits (something like PCI does with optional features,
  the exact implementation is not important).  When we add an optional
  feature  to a driver, we just implement the save function as:
   - if we are using the feature, we add the feature bit indicating that
     we are using the feature, and we save the state for that feature.
   - at load time: If we find a feature that we don't understand, we
     just abort the load.
   - at load time: if you miss a feature that you need -> you also abort
  This has a nice advantage: if you load the state from an old qemu, you
  don't use the new feature, and you save the state -> you can still
  load the state in the old qemu (this is a nice theory, we don't know
  how it would work in practice).  Another advantage is that you can code
  and test each option separately. Michael S. Tsirkin likes this approach.

- The other position: Optional features? Such a thing doesn't exist :)
  Why?  Because if there are no optional features, you always know
  from only the version + name of a device whether you support it or not
  (with optional features, you have another failure mode: you can find
  a feature that you don't understand in the middle of loading the state;
  that can't happen if there are no optional features).

  But, we really, really want optional features (msix support comes up
  again).  No problem, you just create _another_ device:

   VMStateDescription vmstate_virtio_net = ...

   VMStateDescription vmstate_virtio_net_msix =
     VMSTATE_STRUCT(vmstate_virtio_net);
     .... msix bits

  You explicitly tell what optional features you want to use.  Notice
  that you can convince qdev to do the right thing:
    --device net,model=virtio,msix=on  (loads virtio-net-msix)
    --device net,model=virtio,msix=off  (loads plain virtio-net)

  Advantages: you only support the combinations that make sense, you
  explicitly state what they are, and VMState continues to be simple.
  Why not use optional features?  Because then the test matrix explodes
  exponentially: each optional feature multiplies by two the
  number of tests that you have to do.  The disadvantage is that obviously
  you end up having more devices (although they can be implemented in the
  same file and share almost all the code, see how vga-pci and vga-isa
  share almost all the code).

  Not having optional features has another interesting property.
  Versions of a device are linear in the sense that each new version is a
  superset of the previous one (i.e. the same fields as the previous one
  plus some more).  This makes support for loading old versions way
  easier.  This is the position of Juan (i.e. me), and I think that in
  the past Gerd liked something like this.
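
The "linear versions" property can be illustrated with a small sketch.
Everything in it is made up for the example (NicState, NicField,
nic_load(), the three version numbers); it only assumes that every
field records the version that introduced it and that the stream is a
plain array of 32-bit words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NIC_CURRENT_VERSION 3

/* Hypothetical device state: each version only appends fields. */
typedef struct NicState {
    uint32_t status;     /* since v1 */
    uint32_t rx_count;   /* since v2 */
    uint32_t tx_count;   /* since v3 */
} NicState;

typedef struct NicField {
    size_t offset;
    int since_version;   /* first version that contains this field */
} NicField;

/* Sorted by since_version, because versions are strict supersets. */
const NicField nic_fields[] = {
    { offsetof(NicState, status),   1 },
    { offsetof(NicState, rx_count), 2 },
    { offsetof(NicState, tx_count), 3 },
};

#define NIC_NFIELDS (sizeof(nic_fields) / sizeof(nic_fields[0]))

/* Load a stream saved at 'version'.  Returns 0 on success, -1 on error. */
int nic_load(NicState *s, int version, const uint32_t *words, size_t nwords)
{
    size_t used = 0;

    assert(s != NULL);
    if (version < 1 || version > NIC_CURRENT_VERSION) {
        return -1;                    /* too old or from the future */
    }
    memset(s, 0, sizeof(*s));         /* fields absent in old versions = 0 */

    for (size_t i = 0; i < NIC_NFIELDS; i++) {
        if (nic_fields[i].since_version > version) {
            break;                    /* all remaining fields are newer */
        }
        if (used == nwords) {
            return -1;                /* stream shorter than its version */
        }
        memcpy((uint8_t *)s + nic_fields[i].offset, &words[used],
               sizeof(uint32_t));
        used++;
    }
    return used == nwords ? 0 : -1;   /* reject trailing garbage */
}
```

With this layout the version number alone tells the loader exactly
which fields to expect, so there is no "unknown feature in the middle
of the stream" case to handle.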

To help make a decision here, it is a good idea to look at all the
devices and see whether, when they add more fields for a new version,
they typically:
- they add optional features
- they add them because now the simulation is better/whatever (they are
  _not_ optional)

Notice that again, both approaches have advantages and disadvantages; it
just depends on what your priorities are :)
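
For comparison, the load-time rules of the feature-bit option (abort on
a feature you don't understand, abort on a missing feature you need)
fit in a few lines.  This is only a sketch: the bit names, LoadResult,
features_in_use() and check_features() are hypothetical, and the exact
encoding of the bits is, as said above, not important.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits for one device. */
#define FEAT_MSIX   (1u << 0)
#define FEAT_TSO    (1u << 1)
#define FEAT_KNOWN  (FEAT_MSIX | FEAT_TSO)

typedef enum {
    LOAD_OK              =  0,
    LOAD_UNKNOWN_FEATURE = -1,  /* stream uses a bit we don't understand */
    LOAD_MISSING_FEATURE = -2   /* stream lacks a bit this setup needs */
} LoadResult;

/* Save side: only set bits for features that are actually in use, so a
 * guest that never used a new feature stays loadable by an older qemu. */
uint32_t features_in_use(int msix_enabled, int tso_enabled)
{
    uint32_t bits = 0;

    if (msix_enabled) {
        bits |= FEAT_MSIX;
    }
    if (tso_enabled) {
        bits |= FEAT_TSO;
    }
    return bits;
}

/* Load side: 'saved' comes from the stream, 'known' is what this binary
 * implements, 'required' is what the current configuration needs. */
LoadResult check_features(uint32_t saved, uint32_t known, uint32_t required)
{
    assert((required & ~known) == 0);  /* can't require what we don't know */

    if (saved & ~known) {
        return LOAD_UNKNOWN_FEATURE;
    }
    if (required & ~saved) {
        return LOAD_MISSING_FEATURE;
    }
    return LOAD_OK;
}
```

An "old" binary is modelled here by passing a smaller 'known' mask, for
example FEAT_MSIX alone: a stream saved with no feature bits set still
loads, while one with FEAT_TSO set is rejected.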

More problems: Going from newer versions to old versions
- I think that everybody thinks that this is a nice to have, but that it
  will take a lot of work to make it happen, and there are more urgent
  things to do.

Notice that there are plans for VMState to do more interesting things
like:
- Be able to show the values in a saved image
- See if a VM is able to load a vmstate (i.e. it has the needed devices
  at the needed versions)
- .....

Those are independent of what we decide for the previous problems.

Comments?  Things that I missed for the discussion?

Later, Juan.
