Re: [RFC v3] VFIO Migration


From: Stefan Hajnoczi
Subject: Re: [RFC v3] VFIO Migration
Date: Wed, 11 Nov 2020 15:34:38 +0000

On Wed, Nov 11, 2020 at 12:56:26PM +0000, Dr. David Alan Gilbert wrote:
> * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > Orchestrating Migrations
> > ------------------------
> > In order to migrate a device a *migration parameter list* must first be
> > built on the source. Each migration parameter is added to the list if it is
> > in effect. For example, the migration parameter list for a device with
> > new-feature=off,num-queues=4 would be num-queues=4 if the new-feature
> > migration parameter was introduced with the off value disabling its effect.
> 
> What component builds that list (i.e. what component needs to know the
> history that new-feature=off was the default - ah I think you answer
> that below).

Yep. Thanks for noting this. I'll need to reorder things so it is clear.

> > The following conditions must be met to establish migration compatibility:
> > 
> > 1. The source and destination device model strings match.
> > 
> > 2. Each migration parameter name from the migration parameter list is
> >    supported by the destination. For example, the destination supports the
> >    num-queues migration parameter.
> > 
> > 3. Each migration parameter value from the migration parameter list is
> >    supported by the destination. For example, the destination supports
> >    num-queues=4.
> 
> Hmm, are combinations of parameter checks needed - i.e. is it possible
> that a destination supports num-queues=4 and new-feature=on/off -
> but only supports new-feature=on when num-queues>2?

Yes, it's possible but cannot be expressed in the migration info JSON.

We need to choose a level of expressiveness that will be useful enough
without being complex. In the extreme the migration info would contain
Turing-complete validation expressions (e.g. JavaScript) so that any
relationship can be expressed, but I doubt that complexity is needed.
The other extreme is just booleans and (opaque) strings for maximum
simplicity.

If the syntax is not expressive enough then it's impossible to check
migration compatibility without actually creating a new device instance
on the destination. Daniel Berrange raised the requirement of checking
migration compatibility without creating the device since this helps
with selecting a migration destination.
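
To make the three conditions concrete, here is a minimal sketch of a check
that needs only the destination's migration info and never instantiates a
device. The dictionary layout (a "model" string and a "params" mapping) and
the function name are assumptions for illustration, not the proposed JSON
schema, and "<min>-<max>" range strings are not handled here:

```python
def is_compatible(src_model, src_params, dst_info):
    """Check the three compatibility conditions without creating a device.

    src_params is the migration parameter list from the source.
    dst_info uses a hypothetical layout:
      {"model": str, "params": {name: {"allowed_values": [...]}}}
    Returns (ok, reason).
    """
    # 1. Device model strings must match.
    if src_model != dst_info["model"]:
        return False, "device model mismatch"

    for name, value in src_params.items():
        # 2. Every parameter name must be known to the destination.
        spec = dst_info["params"].get(name)
        if spec is None:
            return False, f"parameter {name!r} not supported"

        # 3. Every parameter value must be accepted. A missing
        #    "allowed_values" means any value of the right type may be given.
        allowed = spec.get("allowed_values")
        if allowed is not None and value not in allowed:
            return False, f"value {value!r} not allowed for {name!r}"

    return True, "ok"
```

Each parameter is checked in isolation, which is exactly why a constraint
like "new-feature=on only when num-queues>2" cannot be caught at this level.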

> > The migration compatibility check can be performed without initiating a
> > migration. Therefore, this process can be used to select the migration
> > destination.
> > 
> > The following steps perform the migration:
> > 
> > 1. Configure the destination so it is prepared to load the device state,
> >    including applying the migration parameter list. This may involve
> >    instantiating a new device instance or resetting an existing device
> >    instance to a configuration that is compatible with the source.
> > 
> >    The details of how to do this for VFIO/mdev drivers and vfio-user device
> >    backend programs are described below.
> > 
> > 2. Save the device state on the source and load it on the destination.
> 
> Which is true for almost everything, unless it turned out to have
> significant amounts of RAM on board; do we have a way to deal with that
> for vfio/vhost-user - where it needs to be iterative? (Let's just ignore
> this for now)

Step 2 includes iterative migration. I should have mentioned that in the
document.

> > "allowed_values"
> >   The list of all values that the device implementation accepts for this
> >   migration parameter. Integer ranges can be described using "<min>-<max>"
> >   strings.
> > 
> >   Examples: ['a', 'b', 'c'], [1, 5, 7], ['0-255', 512, '1024-2048'], [true]
> > 
> >   This member is optional. When absent, any value suitable for the type may
> >   be given but the device implementation may refuse certain values.
> 
> JSON isn't a great choice for specifying ranges of integers

Agreed :)
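
For reference, this is roughly what a consumer has to do to match a value
against "allowed_values" as currently described. It is only a sketch: the
function name is made up, and it assumes ranges are non-negative decimal
integers written exactly as "<min>-<max>":

```python
import re

# Matches "<min>-<max>" with non-negative decimal integers (an assumption).
_RANGE = re.compile(r"^(\d+)-(\d+)$")


def value_allowed(value, allowed_values):
    """Return True if value matches any entry of an allowed_values list."""
    is_int = isinstance(value, int) and not isinstance(value, bool)
    for entry in allowed_values:
        # A string entry may describe an integer range.
        if is_int and isinstance(entry, str):
            m = _RANGE.match(entry)
            if m and int(m.group(1)) <= value <= int(m.group(2)):
                return True
        # Otherwise require an exact match of the same type, so that the
        # JSON value true is not confused with the integer 1.
        if type(entry) is type(value) and entry == value:
            return True
    return False
```

Note the ambiguity this exposes: a string entry such as '0-255' has to be
tried both as a range and as a literal string value.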

> > The device is instantiated by launching the destination process with the
> > migration parameter list from the source:
> > 
> > .. code:: bash
> > 
> >   $ my-device --m-<param1>=<value1> --m-<param2> <value2> [...]
> > 
> > This example shows how to instantiate the device with migration parameters
> > ``param1`` and ``param2``. Both ``--m-<param>=<value>`` and ``--m-<param>
> > <value>`` option formats are accepted.
> > 
> > The ``--m-`` prefix is used to allow the device emulation program to
> > implement device implementation-specific command-line options without
> > conflicting with the migration parameter namespace.
> 
> That feels like an odd syntax to me.

Unfortunately we cannot use --<param> because it would share a namespace
with device implementation-specific options. I also considered using a JSON
input file but that makes it harder to invoke the device emulation
program manually for testing/development. I bet I'd have to look up the
JSON syntax every time whereas it's easy to remember how to format a
command-line parameter.

The other one I considered was using '--' or another marker to separate
device implementation-specific command-line arguments from migration
parameters. However, doing so places requirements on the device
emulation program's command-line parsing library and I think people will
be unhappy if their favorite Go, Rust, Python, etc. library cannot handle
the command-line options due to our weird syntax.
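
By contrast, the ``--m-`` prefix needs nothing special from the parsing
library. As a small sketch, Python's argparse accepts both option formats
out of the box. The parameter names, their defaults, and the ``--verbose``
option below are made up for the example:

```python
import argparse


def build_parser():
    # allow_abbrev=False keeps a shortened option such as "--m-num" from
    # being silently accepted as "--m-num-queues".
    parser = argparse.ArgumentParser(prog="my-device", allow_abbrev=False)

    # Hypothetical device implementation-specific option.
    parser.add_argument("--verbose", action="store_true")

    # Hypothetical migration parameters, namespaced with the --m- prefix.
    parser.add_argument("--m-num-queues", type=int, default=1)
    parser.add_argument("--m-new-feature", choices=["on", "off"],
                        default="off")
    return parser


if __name__ == "__main__":
    # Both "--m-<param>=<value>" and "--m-<param> <value>" are parsed.
    args = build_parser().parse_args(
        ["--m-num-queues=4", "--m-new-feature", "on"])
    print(args.m_num_queues, args.m_new_feature, args.verbose)
```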

Any ideas for a better syntax?

> > When preparing for migration on the source, each migration parameter from
> > the migration info JSON is added to the migration parameter list if its
> > value differs from "off_value". If a migration parameter in the list is not
> > available on the destination, then migration is not possible. If a migration
> > parameter value is not in the "allowed_values" of the destination
> > migration_info.json then migration is not possible.
> > 
> > On the destination, a command-line is generated from the migration parameter
> > list. For each destination migration parameter missing from the migration
> > parameter list a command-line option is added with the destination
> > "off_value". The device emulation program prints an error message to standard
> > error and terminates with exit status 1 if the device could not be
> > instantiated.
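
The two quoted paragraphs above can be sketched as follows. The dictionaries
stand in for the migration info JSON and their layout is an assumption, as
are the function names. The "allowed_values" check is left out to keep the
example short:

```python
def build_param_list(src_info, src_config):
    """Source side: keep parameters whose value differs from off_value."""
    return {name: src_config[name]
            for name, spec in src_info.items()
            if src_config[name] != spec["off_value"]}


def build_dest_argv(program, dst_info, param_list):
    """Destination side: generate the command-line for the device."""
    # A parameter the destination does not know makes migration impossible.
    missing = [name for name in param_list if name not in dst_info]
    if missing:
        raise ValueError(f"not available on destination: {missing}")

    argv = [program]
    for name, spec in dst_info.items():
        # Parameters absent from the list get the destination's off_value.
        value = param_list.get(name, spec["off_value"])
        argv.append(f"--m-{name}={value}")
    return argv
```

With a source configured as new-feature=off,num-queues=4 and "off" as the
new-feature off_value, ``build_param_list`` yields only num-queues=4, as in
the earlier example.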
> 
> I still don't think this revision answers the question of how a VM
> management program picks a sane set of parameter values for a new VM
> it's creating, especially if it wants it to be migratable.  That's
> something your version stuff in V1 seemed nice for.

Good point. If we're creating a VM and expect to migrate between two
device implementations, how do we choose the migration parameters?

I can see a solution for that: grab the set of "init_values" from both
device implementations and use the one that both accept. This is O(N^2)
so it's not great when there are many device implementations involved.
It's O(N) with version numbers because you can keep an intersection set
of supported version numbers.
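
To illustrate the version-number case, a single pass over the
implementations is enough. The function name and the idea that each
implementation reports a set of integer versions are assumptions made for
the sketch:

```python
from functools import reduce


def common_versions(supported):
    """Intersect the supported version sets of N device implementations.

    supported is an iterable with one set of version numbers per
    implementation. Each set is visited once, so the number of set
    intersections grows linearly with N.
    """
    sets = [set(s) for s in supported]
    if not sets:
        return set()
    return reduce(set.intersection, sets)
```

A management tool could then pick, say, the highest version in the result
when creating the VM.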

This point definitely needs to be included in the document. Is my answer
acceptable or do you think versions are really needed?

It's also hard to answer "which of these two migration parameter lists
is better/more modern?" without versions when non-bool migration
parameters are involved.

Stefan
