qemu-devel

Re: VFIO Migration


From: Stefan Hajnoczi
Subject: Re: VFIO Migration
Date: Tue, 3 Nov 2020 18:16:10 +0000

On Tue, Nov 03, 2020 at 03:23:03PM +0000, Daniel P. Berrangé wrote:
> On Tue, Nov 03, 2020 at 03:05:08PM +0000, Stefan Hajnoczi wrote:
> > On Tue, Nov 03, 2020 at 11:39:29AM +0000, Daniel P. Berrangé wrote:
> > > On Mon, Nov 02, 2020 at 11:11:53AM +0000, Stefan Hajnoczi wrote:
> > > > Overview
> > > > --------
> > > > The purpose of device states is to save the device at a point in
> > > > time and then restore the device back to the saved state later.
> > > > This is more challenging than it first appears.
> > > > 
> > > > The process of saving a device state and loading it later is called
> > > > *migration*. The state may be loaded by the same device that saved
> > > > it or by a new instance of the device, possibly running on a
> > > > different computer.
> > > > 
> > > > It must be possible to migrate to a newer implementation of the device
> > > > as well as to an older implementation of the device. This allows users
> > > > to upgrade and roll back their systems.
> > > > 
> > > > Migration can fail if loading the device state is not possible. It
> > > > should fail early with a clear error message. It must not appear to
> > > > complete but leave the device inoperable due to a migration problem.
> > > 
> > > I think there needs to be an additional requirement.
> > > 
> > >  It must be possible for a management application to query the supported
> > >  versions, independently of execution of a migration operation.
> > > 
> > > This is important to large scale data center / cloud management
> > > applications because before initiating a migration they need to
> > > *automatically* select a target host with a high level of confidence
> > > that it will be compatible with the source host.
> > > 
> > > Today QEMU migration compatibility is largely determined by the machine
> > > type version. Apps can query the supported machine types for a host to
> > > check whether it is compatible. Similarly they will query CPU model
> > > features to check compatibility.
> > > 
> > > Validation and error checking at time of migration is of course still
> > > required, but the goal should be that a management application will
> > > *NEVER* hit these errors because it will have pre-selected a host that
> > > is known to be compatible based on the reported supported versions.
> > 
> > Okay. What do you think of the following?
> > 
> >   [
> >     {
> >       "model": "https://qemu.org/devices/e1000e",
> >       "params": [
> >         "rss",
> >         ...more configuration parameters...
> >       ],
> >       "versions": [
> >         {
> >           "name": "1",
> >           "params": []
> >         },
> >         {
> >           "name": "2",
> >           "params": ["rss=on"]
> >         },
> >         ...more versions...
> >       ]
> >     },
> >     ...more device models...
> >   ]
> > 
> > The management tool can generate the configuration parameter list by
> > expanding a version into its params.
> > 
> > Configuration parameter types and input ranges need more thought. For
> > example, version 1 of the device might not have rx-table-size (it's
> > effectively 0). Version 2 introduces rx-table-size and sets it to 32.
> > Version 3 raises the value to 64. In addition, the user can set a custom
> > value like rx-table-size=48. I haven't defined the rules for this yet,
> > but it's clear there needs to be a way to extend configuration
> > parameters.
> > 
> > To check migration compatibility:
> > 1. Verify that the device model URL matches the JSON data[n].model
> >    field.
> > 2. For every configuration parameter name from the source device,
> >    check that it is contained within the JSON data[n].params list.
> 
> I'm not convinced that this makes sense. A matching set of parameter
> names + values does not imply that the migration data stream is
> actually compatible.
> 
> i.e. implementations may need to change the internal migration data
> stream to fix bugs, without adding/removing a config parameter.
> The migration version string alone expresses data stream compatibility.

This is not the approach described in this document. The point of this
approach is precisely that migration is known to be safe when the device
model URI and configuration parameters match on source and destination.

Changes to the guest-visible hardware interface and/or device state
representation always require a new configuration parameter under this
approach.
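
As an illustration only, the first two steps of the quoted compatibility
check could be sketched in Python along the following lines. The function
names (find_model, check_compat) and the shape of source_params (a dict
mapping parameter name to value) are assumptions made for this sketch and
are not part of the proposal. data stands for the parsed JSON list shown
earlier in the thread. Only the two steps visible in the quoted text are
covered.

```python
# Hypothetical sketch: function names and argument shapes are assumptions.

def find_model(data, model_uri):
    """Return the entry whose "model" field equals model_uri, or None."""
    for entry in data:
        if entry["model"] == model_uri:
            return entry
    return None


def check_compat(data, model_uri, source_params):
    """Apply the two quoted checks.

    Returns a list of error strings; an empty list means both checks passed.
    """
    # Step 1: the device model URL must match a data[n].model field.
    entry = find_model(data, model_uri)
    if entry is None:
        return [f"device model {model_uri} not supported on destination"]

    # Step 2: every source parameter name must appear in data[n].params.
    supported = set(entry["params"])
    return [
        f"unknown configuration parameter: {name}"
        for name in source_params
        if name not in supported
    ]
```

Returning the full list of errors rather than stopping at the first one
fits the requirement that a failed check should produce a clear error
message.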

> This is similar to how 2 QEMU command lines can have an identical set
> of configuration parameters, aside from the machine type version,
> and thus be migration *incompatible*.

That is not possible under this approach.

> Basically the version string should be considered an opaque blob
> that expresses compatibility on its own.

The version string is not directly part of the migration compatibility
check under this approach. It is simply an alias for a list of
configuration parameters.
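
A rough Python sketch of treating a version as an alias, assuming the JSON
layout from earlier in the thread and the "name=value" form seen in
"rss=on". The helper name expand_version is hypothetical, and how a
parameter without "=" should be interpreted is not defined in the
proposal; here it simply maps to an empty string.

```python
# Hypothetical sketch: expand_version is an illustrative name, not an API.

def expand_version(entry, version_name):
    """Expand a version name into a dict of configuration parameters.

    entry is one element of the JSON list, with a "versions" array whose
    items carry "name" and "params" (strings such as "rss=on").
    """
    for version in entry["versions"]:
        if version["name"] == version_name:
            params = {}
            for item in version["params"]:
                # Split "rss=on" into ("rss", "on"); no "=" gives value "".
                name, _, value = item.partition("=")
                params[name] = value
            return params
    raise KeyError(f"unknown version: {version_name}")
```

Once both sides are expanded this way, the version name itself drops out
and only the resulting parameters are compared.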

Stefan
