Re: VFIO Migration

From: Stefan Hajnoczi
Subject: Re: VFIO Migration
Date: Tue, 3 Nov 2020 15:05:08 +0000

On Tue, Nov 03, 2020 at 11:39:29AM +0000, Daniel P. Berrangé wrote:
> On Mon, Nov 02, 2020 at 11:11:53AM +0000, Stefan Hajnoczi wrote:
> > Overview
> > --------
> > The purpose of device states is to save the device at a point in time
> > and then restore the device back to the saved state later. This is more
> > challenging than it first appears.
> > 
> > The process of saving a device state and loading it later is called
> > *migration*. The state may be loaded by the same device that saved it or
> > by a new instance of the device, possibly running on a different computer.
> > 
> > It must be possible to migrate to a newer implementation of the device
> > as well as to an older implementation of the device. This allows users
> > to upgrade and roll back their systems.
> > 
> > Migration can fail if loading the device state is not possible. It should
> > fail early with a clear error message. It must not appear to complete but
> > leave the device inoperable due to a migration problem.
> I think there needs to be an additional requirement.
>
>  It must be possible for a management application to query the supported
>  versions, independently of execution of a migration operation.
>
> This is important to large scale data center / cloud management applications
> because before initiating a migration they need to *automatically* select
> a target host with a high level of confidence that it will be compatible with
> the source host.
>
> Today QEMU migration compatibility is largely determined by the machine
> type version. Apps can query the supported machine types for a host to
> check whether it is compatible. Similarly they will query CPU model
> features to check compatibility.
>
> Validation and error checking at time of migration is of course still
> required, but the goal should be that an mgmt application will *NEVER*
> hit these errors because they will have pre-selected a host that is
> known to be compatible based on reported versions that are supported.

Okay. What do you think of the following?

  [
    {
      "model": "https://qemu.org/devices/e1000e",
      "params": [
        ...more configuration parameters...
      ],
      "versions": [
        {
          "name": "1",
          "params": [],
        },
        {
          "name": "2",
          "params": ["rss=on"],
        },
        ...more versions...
      ]
    },
    ...more device models...
  ]

The management tool can generate the configuration parameter list by
expanding a version into its params.
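
A minimal sketch of that expansion, assuming the JSON above has been parsed
into Python dicts. The function name expand_version and the sample entry are
hypothetical and only illustrate the idea; nothing here is a defined
interface.

```python
def expand_version(device, version_name):
    """Return the configuration parameter list aliased by a version name.

    `device` is one entry of the (assumed) JSON list of device models.
    """
    for version in device["versions"]:
        if version["name"] == version_name:
            # Copy so callers can append overrides without touching the data.
            return list(version["params"])
    raise ValueError(
        f"version {version_name!r} not supported by {device['model']}"
    )


# Sample entry mirroring the two versions shown in the JSON above.
e1000e = {
    "model": "https://qemu.org/devices/e1000e",
    "versions": [
        {"name": "1", "params": []},
        {"name": "2", "params": ["rss=on"]},
    ],
}

print(expand_version(e1000e, "2"))
```

An unknown version raises an error instead of silently falling back, in
line with the requirement that failures happen early and clearly.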

Configuration parameter types and input ranges need more thought. For
example, version 1 of the device might not have rx-table-size (it's
effectively 0). Version 2 introduces rx-table-size and sets it to 32.
Version 3 raises the value to 64. In addition, the user can set a custom
value like rx-table-size=48. I haven't defined the rules for this yet,
but it's clear there needs to be a way to extend configuration

To check migration compatibility:
1. Verify that the device model URL matches the JSON data[n].model
2. For every configuration parameter name from the source device,
   check that it is contained within the JSON data[n].params list.
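
The two steps could look roughly like this in a management tool. This is a
sketch under assumptions: is_compatible and param_name are hypothetical
helpers, source parameters are assumed to be "name=value" strings, and the
destination "params" list is assumed to hold bare parameter names (the "rss"
entry in the sample data is illustrative). Only names are compared; value
ranges are not checked, since those rules are not defined yet.

```python
def param_name(param):
    """Strip the value from a 'name=value' string, leaving just the name."""
    return param.split("=", 1)[0]


def is_compatible(source_model, source_params, data):
    """Check a source device against the destination's parsed JSON list."""
    for entry in data:
        # Step 1: the device model URL must match data[n].model.
        if entry["model"] != source_model:
            continue
        # Step 2: every source parameter name must be in data[n].params.
        supported = set(entry["params"])
        return all(param_name(p) in supported for p in source_params)
    # No entry for this device model on the destination.
    return False


# Hypothetical destination data with a single supported parameter name.
destination = [
    {"model": "https://qemu.org/devices/e1000e", "params": ["rss"]},
]

print(is_compatible("https://qemu.org/devices/e1000e", ["rss=on"], destination))
```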

> > VFIO Implementation
> > -------------------
> > The following applies both to kernel VFIO/mdev drivers and vfio-user device
> > backends.
> > 
> > Devices are instantiated based on a version and/or configuration parameters:
> > * ``version=1`` - use the device configuration aliased by version 1
> > * ``version=2,rx-filter-size=64`` - use version 2 and override
> >   ``rx-filter-size``
> > * ``rx-filter-size=0`` - directly set configuration parameters without
> >   using a version
> > 
> > Device creation fails if the version and/or configuration parameters are not
> > supported.
> > 
> > There must be a mechanism to query the "latest" configuration for a device
> > model. It may simply report the ``version=5`` where 5 is the latest
> > version but it could also report all configuration parameters instead of
> > using a version alias.
> The mechanism needs to be able to report all supported version strings,
> not simply the latest version string. I think we need to specify the
> actual mechanism to do this query too, because we can't end up in a place
> where there's a different approach to queries for each device type.

Makes sense.

