Re: [RFC v3] VFIO Migration


From: Daniel P . Berrangé
Subject: Re: [RFC v3] VFIO Migration
Date: Mon, 16 Nov 2020 11:41:25 +0000
User-agent: Mutt/1.14.6 (2020-07-11)

On Mon, Nov 16, 2020 at 11:15:24AM +0000, Stefan Hajnoczi wrote:
> On Wed, Nov 11, 2020 at 03:48:50PM +0000, Daniel P. Berrangé wrote:
> > On Wed, Nov 11, 2020 at 02:36:15PM +0000, Stefan Hajnoczi wrote:
> > > On Tue, Nov 10, 2020 at 12:12:31PM +0100, Paolo Bonzini wrote:
> > > > On 10/11/20 10:53, Stefan Hajnoczi wrote:
> > > Yes, the current syntax supports sparse ranges and multiple ranges.
> > > 
> > > The trade-off is that a tool cannot validate inputs beforehand. You need
> > > to instantiate the device to see if it accepts your inputs. This is not
> > > great for management tools because they cannot select a destination
> > > device if they don't know which exact values are supported.
> > > 
> > > Daniel Berrange raised this requirement in a previous revision, so I
> > > wonder what his thoughts are?
> > 
> > In terms of validation I can't help but feel the whole proposal is
> > really very complicated.
> > 
> > In validating QEMU migration compatibility we merely compare the
> > versioned machine type.
> 
> Thinking more about this, maybe the big picture is:
> 
> Today the management tool controls the variables in the migration (the
> device configuration). It has knowledge of the VMM, can set a machine
> type, apply a device configuration on top, and then migrate safely.
> 
> VFIO changes this model because VMMs and management tools do not have
> knowledge of specific device implementations. The device implementation
> is a new source of variables in the migration and the management tool no
> longer has the full picture.

This is not all that different from what we have today, e.g. QEMU exposes
several hundred device impls, each with countless properties. Mgmt tools
like libvirt, or OpenStack/oVirt above, don't support all these device
impls, nor do they support all the properties.

IOW, in many cases no configuration is exposed for many of the device
tunables, mgmt tools just rely on the machine type defaults for the
majority of them, and only do tuning for a relatively small subset.

So the machine type acts as a simplifying layer for the mgmt app,
enabling them to safely ignore the majority of tunables, and only focus
on the small number of tunables they actually care about changing
or setting.

> I'm trying to define a standard interface for exposing migration
> compatibility information from device implementations to management
> tools, and a general algorithm that management tools can use without
> knowledge of specific device implementations.

For a given type of device I expect there would be some core set of
config parameters that would have to be common to any impl, plus
some set of config params that are specific to just one impl.

If the mgmt app only cares about the core set of config params, then
we should ensure that they can do migration compatibility checks without
needing to care about all the extra irrelevant config params.

If apps want to use some parameters that are custom to specific dev
impls, then they'll have to have logic to expose those params, and
also logic to validate them on migration - if they are frontend ABI
sensitive config parameters, as opposed to backend only.
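
To make the split concrete, here is a rough sketch of how an app might
decide which of the params it sets need validating on migration. The
Param fields and every parameter name below are hypothetical, invented
only for illustration; nothing here is an existing QEMU or VFIO interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Param:
    name: str
    core: bool           # common to every impl of this device type
    guest_visible: bool  # frontend ABI sensitive, as opposed to backend only


def params_to_validate(schema, used):
    """Return the names in `used` that are frontend ABI sensitive.

    Backend-only params are skipped because the guest never sees them.
    """
    by_name = {p.name: p for p in schema}
    return sorted(n for n in used if by_name[n].guest_visible)


# Hypothetical schema for one device impl.
SCHEMA = [
    Param("num-queues", core=True, guest_visible=True),
    Param("vendor-offload", core=False, guest_visible=True),
    Param("backend-threads", core=False, guest_visible=False),
]
```

An app that only sets "num-queues" and "backend-threads" would end up
validating just "num-queues"; it only needs extra logic once it starts
exposing an impl-specific, guest-visible param such as "vendor-offload".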

> It is possible to simplify the problem, but we'll lose freedom. For
> example, hard coding knowledge of the device implementation into the
> management tool eliminates the need for a general migration checking
> algorithm. Or we might be able to simplify it by explicitly not
> supporting cross-device implementation migration (although that would
> place stricter rules on what a new version of an existing device can
> change in order to preserve migration compatibility).

Is migrating between 2 different vendors' impls of the same core
device spec really a thing that's needed?

> I have doubts that these trade-offs can be made without losing support
> for use cases that are necessary.

From my POV, the key goal is that it should be possible to migrate
between two hosts without needing to check every single possible
config parameter that the device supports. It should only be necessary
to check the parameters that are actually changed from their default
values. Then there just needs to be some simple string parameter that
encodes a particular set of devices, akin to the versioned machine
type.

Applications that want to migrate between cross-vendor device impls
could opt-in to checking every single little parameter, but most can
just stick with a much simplified view where they only have to check
the parameters that they've actually overridden/exposed.
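
As a minimal sketch of the check I have in mind, assuming each side of
the migration can be described by a versioned model string, the set of
params overridden from their defaults, and the fully resolved param set.
The dict layout, the can_migrate() helper and all the names and values
are made up for illustration, not a proposed interface.

```python
def can_migrate(src, dst, check_all=False):
    """Decide whether src and dst device descriptions are compatible.

    Default mode: the versioned model string must match, and only the
    params overridden from their defaults are compared.
    check_all mode: the opt-in cross-impl path, which ignores the model
    string and compares every resolved parameter instead.
    """
    if check_all:
        return src["resolved"] == dst["resolved"]
    return (src["model"] == dst["model"]
            and src["overrides"] == dst["overrides"])


# Hypothetical device descriptions on three hosts.
host_a = {
    "model": "acme-nic-v2",
    "overrides": {"num-queues": 4},
    "resolved": {"num-queues": 4, "mtu": 1500},
}
host_b = dict(host_a)  # same impl and version on another host
host_c = {
    "model": "other-nic-v1",  # different vendor, same resolved config
    "overrides": {"num-queues": 4},
    "resolved": {"num-queues": 4, "mtu": 1500},
}
```

With these descriptions, can_migrate(host_a, host_b) passes on the model
string plus one override, while host_a to host_c is only accepted when
the app opts in with check_all=True and compares everything.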

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



