[RFC v2] VFIO Migration

From: Stefan Hajnoczi
Subject: [RFC v2] VFIO Migration
Date: Thu, 5 Nov 2020 15:09:02 +0000

v2 (big change, please reread everything):
* Replace URIs with Go-style <domain>/<path> strings
* Replace configuration parameters with migration parameters. The semantics are
  different; they only describe migration compatibility and do not capture all
  device configuration. This makes it easier to explain the purpose of
  parameters and also logically separates device instantiation from migration
  compatibility checking.
* Describe how to achieve subsection semantics
* Add hint that device internal state should be as general as possible to allow
  different device implementations
* Drop versions, they added complexity and aren't necessary for the migration
  compatibility check
* Add first draft VFIO/mdev sysfs attr interface

VFIO Migration
This document describes how to ensure migration compatibility for VFIO devices,
including VFIO/mdev and vfio-user devices.

VFIO devices can save and load a *device state*. Saving a device state produces
a snapshot of a VFIO device's state that can be loaded again at a later point
in time to resume the device from the snapshot.

The process of saving a device state and loading it later is called
*migration*. The device state may be loaded by the same device instance that
saved it or by a new instance, possibly running on a different machine.

A VFIO/mdev driver together with the physical device provides the functionality
of a device. Alternatively, a vfio-user device emulation program can provide
the functionality of a device. These are called *device implementations*.

The device implementation where a migration originates is called the *source*
and the device implementation that a migration targets is called the
*destination*.

This document describes how to establish whether or not migration compatibility
exists between the source and destination. When compatibility has been
established, the probability of migrating successfully is high and a successful
migration does not leave the device inoperable due to silent migration

Migration Parameters
*Migration parameters* are used to describe characteristics that must match
between source and destination to achieve migration compatibility.

The first implementation of a simple device may not require migration
parameters if the source and destination are always compatible. As the device
evolves, the source and destination may differ and migration parameters are
required to express this variation. More complex devices may require migration
parameters from the start due to optional functionality that is not guaranteed
to be present in both source and destination.

A migration parameter consists of a name and a value. The name is a UTF-8
string that does not contain equals ('='), forward slash ('/'), or whitespace
characters. The value is a UTF-8 string that does not contain whitespace
characters.

The meaning of the migration parameter and its possible values are specific to
the device, but values are generally based on one of the following types:
* Boolean (on/off)
* Integers (0, 1, 2, ...)
* Enumerations (red, green, blue, ...)
* Character strings

Migration parameters are conventionally formatted as <name>=<value> strings.
Examples include my-feature=on and num-queues=4.

The absence of a migration parameter must have the same effect as before the
migration parameter was introduced. For example, if my-feature=on|off is added
to control the availability of a new device feature, then my-feature=off is
equivalent to omitting the migration parameter.
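
The formatting and default rules above can be sketched in Python. This is only
an illustration: parse_param(), effective_params(), and the DEFAULTS table
(including num-queues defaulting to 1) are hypothetical names and values, not
part of any existing interface.

```python
def parse_param(text):
    """Split a '<name>=<value>' string into (name, value), validating both."""
    name, sep, value = text.partition("=")
    if not sep or not name:
        raise ValueError(f"expected <name>=<value>, got {text!r}")
    # Names may not contain '=', '/', or whitespace. partition() splits at the
    # first '=', so the name cannot contain one.
    if "/" in name or any(c.isspace() for c in name):
        raise ValueError(f"invalid parameter name {name!r}")
    # Values may not contain whitespace.
    if any(c.isspace() for c in value):
        raise ValueError(f"invalid parameter value {value!r}")
    return name, value


# Hypothetical defaults for an imaginary device model: the behavior each
# parameter had implicitly before it was introduced.
DEFAULTS = {"my-feature": "off", "num-queues": "1"}


def effective_params(param_strings, defaults=DEFAULTS):
    """Overlay explicitly listed parameters on top of the defaults."""
    params = dict(defaults)
    for text in param_strings:
        name, value = parse_param(text)
        params[name] = value
    return params
```

With this sketch an empty list and ["my-feature=off"] produce the same
effective parameters, which is the property that lets old and new parameter
lists be compared.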

Hardware Interface Compatibility
VFIO devices have a *hardware interface* consisting of device regions and
interrupts. Aspects of the hardware interface can vary between device
implementations and require migration parameters to express migration
compatibility requirements.

Examples of migration parameters include:
* Feature availability - feature bitmasks, hardware revision numbers, etc. If
  the destination may lack support for optional features or hardware interface
  revisions, then migration parameters are required.
* Functionality - hardware register blocks that are only present on certain
  device instances. If there are multiple device sub-models that have
  different hardware interfaces then migration parameters are required.

These examples demonstrate aspects of the hardware interface that must not
change unexpectedly. Were they to differ between source and destination, the
chance of device driver malfunction would be high because the layout of the
hardware interface would change or assumptions the device driver makes about
available functionality would be violated. Migration parameters are used to
preserve the hardware interface across migration and explicitly represent
variations between device implementations.

Hardware interfaces sometimes support reporting an event when a change occurs.
In those cases it may be possible to support visible changes in the hardware
interface across migration. In most other cases migration must not result in a
visible change in the hardware interface.

Migration parameters are not necessary for read-only values exposed through the
hardware interface, such as MAC address EEPROMs or serial numbers, so long as
all device implementations can be configured with the same range of input
values for these read-only values. This is possible because migration
parameters do not capture the full configuration of the device, only aspects
that affect migration compatibility.

Device configuration that is not visible through the hardware interface, such
as a host file system path of a disk image file or the physical network port
assigned to a network card, usually does not require migration parameters
because those values are not visible through the hardware interface and can be
changed without breaking migration compatibility.

The disk image file may indirectly affect the hardware interface, for example
by constraining the device's block size. In this case a block-size=N migration
parameter is required to ensure migration compatibility, but the host file
system path of the disk image file still does not require a migration
parameter.

Device State Representation
Device state contains both data accessible through the device's hardware
interface and device-internal state needed to restore device operation.

The contents of hardware registers are usually included in the device state if
they can change at runtime. Hardware registers with constant or computed data
may not need to be part of the device state provided that device
implementations can produce the necessary data.

Device-internal state includes the portion of the device's state that cannot be
reconstructed from the hardware interface alone. Defining device-internal state
in the most general way instead of exposing device implementation details
allows for flexibility in the future. For example, device implementations often
maintain a ring index, which is not available through the hardware interface,
to keep track of which ring elements have already been consumed. The ring index
must be included in the device state so that the destination can resume
processing from the correct point in the ring. Representing this as an index
into the ring in the hardware interface is more general than adding device
implementation-specific request tracking data structures into the device state.

The *device state representation* defines the binary data layout of the device
state. The device state representation is specific to each device and is beyond
the scope of this document, but aspects pertaining to migration compatibility
are discussed here.

Each change to the device state representation that affects migration
compatibility requires a migration parameter. When a new field is added to the
device state representation then a new migration parameter must be added to
reflect this change. Often a single migration parameter expresses both a change
to the hardware interface and the device state representation. It is also
possible to change the device state representation without changing the
hardware interface, for example when some state was forgotten while designing
the previous device state representation.

The device state representation may support extra data that can be safely
ignored by old device implementations. In this case migration compatibility is
unaffected and a migration parameter is not required to indicate such extra
data has been added.

Device Models
The combination of the hardware interface, device state representation, and
migration parameter definitions is called a *device model*. Device models are
identified by a unique UTF-8 string starting with a domain name and followed by
path components separated with forward slashes ('/'). Examples include
vendor-a.com/my-nic, gitlab.com/user/my-device, virtio-spec.org/pci/virtio-net,
and qemu.org/pci/10ec/8139.
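
As a rough sketch, a device model string can be split into its domain and path
components as follows. split_device_model() is a hypothetical helper, and
rejecting empty path components or whitespace is an assumption rather than a
rule stated in this document.

```python
def split_device_model(model):
    """Return (domain, path_components) for a '<domain>/<path>' string."""
    domain, sep, path = model.partition("/")
    if not sep or not domain or not path:
        raise ValueError(f"expected <domain>/<path>, got {model!r}")
    if any(c.isspace() for c in model):
        # Assumption: whitespace is not meaningful in a model string.
        raise ValueError(f"whitespace in device model {model!r}")
    components = path.split("/")
    if any(not component for component in components):
        # Assumption: 'a.com//x' and 'a.com/x/' are treated as malformed.
        raise ValueError(f"empty path component in {model!r}")
    return domain, components
```

Two device implementations support the same device model only if the full
strings are equal; the split form is merely convenient for grouping models by
domain.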

The unique device model string is not changed as the device evolves. Instead,
migration parameters are added to express variations in a device.

The device model is not tied to a specific device implementation. The same
device model could be implemented as a VFIO/mdev driver or as a vfio-user device
emulation program.

Multiple device implementations can support the same device model. Doing so
means that the device implementations can offer migration compatibility because
they support the same hardware interface, device state representation, and
migration parameters.

Multiple device models can exist for the same hardware interface, each with a
different device state representation and migration parameters. This makes it
possible to fork and independently develop device models.

Device models can evolve over time as the hardware interface and device state
representation change. The corresponding migration parameters ensure that
migration compatibility can be established between device implementations.

Orchestrating Migrations
The following steps must be followed to migrate devices:

1. Check that the source and destination support the same device model.

2. Check that the destination supports the migration parameter list from the
   source.

3. Configure the destination so it is prepared to load the device state. This
   may involve instantiating a new device instance or resetting an existing
   device instance to a configuration that is compatible with the source.

   The migration parameter list may be used as part of this configuration, but
   note that not all of the configuration is captured in the migration
   parameter list. For example, the physical network port for a network card or
   the host file system path for a disk image file is typically not captured in
   the migration parameters and must be provided through other means.

4. Save the device state on the source and load it on the destination.

5. If migration succeeds then the destination resumes operation and the source
   must not resume operation. If the migration fails then the source resumes
   operation and the destination must not resume operation.
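
The steps above can be sketched with an in-memory stand-in for a device
implementation. FakeDevice and migrate() are hypothetical, as is the choice to
stop the source before saving its state; a real orchestrator would talk to
VFIO/mdev or vfio-user devices instead.

```python
class FakeDevice:
    """In-memory stand-in for a device implementation (hypothetical)."""

    def __init__(self, model, supported, params=None, state=b""):
        self.model = model            # device model string
        self.supported = supported    # parameter name -> set of accepted values
        self.params = dict(params or {})
        self.state = state            # opaque device state bytes
        self.running = False

    def supports(self, params):
        return all(value in self.supported.get(name, ())
                   for name, value in params.items())

    def save(self):
        return self.state

    def load(self, data):
        self.state = data


def migrate(source, destination):
    """Return True if the destination took over, False if the source did not move."""
    if source.model != destination.model:        # step 1: same device model
        return False
    if not destination.supports(source.params):  # step 2: parameter list
        return False
    destination.params = dict(source.params)     # step 3: configure destination
    source.running = False                       # assumption: stop before saving
    try:
        destination.load(source.save())          # step 4: transfer device state
    except Exception:
        source.running = True                    # step 5: failure, source resumes
        return False
    destination.running = True                   # step 5: success, source stays stopped
    return True
```

Exactly one of the two devices is running when migrate() returns, which
mirrors the requirement in step 5.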

Note that these steps impose a conservative bound on device states that can be
migrated successfully. Not all configuration parameters may be strictly
required to match on the source and destination devices. For example, if the
device's hardware interface has not yet been initialized then changes to the
advertised features may not yet affect the device driver. However, accurately
representing runtime constraints is complex and risks introducing migration
bugs, so no attempt is made to support them.

VFIO/mdev Devices
TODO this is a first draft, more thought needed around enumerating supported
parameters, representing default values, etc

The following mdev type sysfs attrs are available for managing device
instances:

      create - writing a UUID to this file instantiates a device
      migration/ - migration related files
          model - unique device model string, e.g. vendor-a.com/my-nic

Device models supported by an mdev driver can be enumerated by reading the
migration/model attr for each <type-id>.
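
A minimal sketch of that enumeration, assuming the draft layout above.
list_device_models() is a hypothetical helper; it takes the directory holding
the <type-id> entries as an argument (for example a parent device's
mdev_supported_types directory) rather than assuming a fixed sysfs path.

```python
from pathlib import Path


def list_device_models(supported_types_dir):
    """Map each <type-id> to its device model string.

    Types without a migration/model attr are skipped.
    """
    models = {}
    for type_dir in sorted(Path(supported_types_dir).iterdir()):
        model_attr = type_dir / "migration" / "model"
        if model_attr.is_file():
            # Assumption: the attr may end with a newline, which is stripped.
            models[type_dir.name] = model_attr.read_text().strip()
    return models
```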

The following mdev device sysfs attrs relate to a specific device instance:

      mdev_type/ - symlink to mdev type sysfs attrs, e.g. to fetch 
      migration/ - migration related files
          applied - Write "1" to apply current migration parameter values or
                    "0" to reset migration parameter values to their defaults.
                    Parameters can only be applied or reset while the mdev is
                    not opened.
          params/ - migration parameters
              <my-param> - read/write migration parameter "my-param"

When the device is created the migration/applied attr is "0". Migration
parameters are accessible in migration/params/ and read 0 bytes because they
are at their default values. At this point opening the mdev device will fail
because migration parameters must be applied first. Migration parameters can be
set to the desired values or left at their defaults. "1" must be written to
migration/applied before opening the mdev device.
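
The sequence just described could look roughly like this from a management
tool. apply_migration_params() is a hypothetical helper written against the
draft attr layout above, and mdev_dir stands for the sysfs directory of one
mdev device instance.

```python
from pathlib import Path


def apply_migration_params(mdev_dir, params):
    """Write each parameter, then write "1" to migration/applied.

    Raises RuntimeError if the device rejects a parameter or the final apply.
    Parameters missing from `params` are left at their defaults.
    """
    migration = Path(mdev_dir) / "migration"
    for name, value in params.items():
        try:
            (migration / "params" / name).write_text(value)
        except OSError as err:
            raise RuntimeError(f"cannot set {name}={value}") from err
    try:
        (migration / "applied").write_text("1")
    except OSError as err:
        raise RuntimeError("device rejected the migration parameters") from err
```

Only after this function returns successfully should the caller open the mdev
device.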

If writing to a migration/params/<param> attr or setting migration/applied to
"1" fails, then the device implementation does not support the migration

An open mdev device typically does not allow migration parameters to be changed
at runtime. However, certain migration/params attrs may allow writes at
runtime. Usually these migration parameters only affect the device state
representation and not the hardware interface. This makes it possible to
upgrade or downgrade the device state representation at runtime so that
migration is possible to newer or older device implementations.

An existing mdev device instance can be reused by closing the mdev device and
writing "0" to migration/applied. This resets parameters to their defaults so
that a new list of migration parameters can be applied.

The migration parameter list for an mdev device that is in operation can be
read from migration/params/. Parameters that read 0 bytes are at their default
values.
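
Reading the list back might be sketched as follows. read_migration_params() is
a hypothetical helper based on the draft layout above; stripping a trailing
newline from non-empty values is an assumption about how the attrs would
behave.

```python
from pathlib import Path


def read_migration_params(mdev_dir):
    """Return the non-default parameters as sorted '<name>=<value>' strings."""
    params = []
    for attr in sorted((Path(mdev_dir) / "migration" / "params").iterdir()):
        raw = attr.read_text()
        if raw == "":
            continue  # 0 bytes: the parameter is at its default
        params.append(f"{attr.name}={raw.strip()}")
    return params
```

The resulting list is what would be checked against the destination before
migrating.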

vfio-user Devices
TODO use FUSE to mimic VFIO/mdev sysfs (probably can't due to security
concerns, use UNIX domain socket RPC instead)?
