Re: [RFC v3] VFIO Migration

From: Stefan Hajnoczi
Subject: Re: [RFC v3] VFIO Migration
Date: Wed, 11 Nov 2020 15:10:14 +0000

On Tue, Nov 10, 2020 at 01:14:04PM -0700, Alex Williamson wrote:
> On Tue, 10 Nov 2020 09:53:49 +0000
> Stefan Hajnoczi <stefanha@redhat.com> wrote:
> Documentation/filesystems/sysfs.rst:
> ---
> Attributes
> ~~~~~~~~~~
> Attributes can be exported for kobjects in the form of regular files in
> the filesystem. Sysfs forwards file I/O operations to methods defined
> for the attributes, providing a means to read and write kernel
> attributes.
> Attributes should be ASCII text files, preferably with only one value
> per file. It is noted that it may not be efficient to contain only one
> value per file, so it is socially acceptable to express an array of
> values of the same type.
> Mixing types, expressing multiple lines of data, and doing fancy
> formatting of data is heavily frowned upon. Doing these things may get
> you publicly humiliated and your code rewritten without notice.
> ---
> We'd either need to address your TODO and create a hierarchical
> representation or find another means to exchange this format.

Okay, thanks for pointing this out. If the limitations on sysfs
directory structure are really what I think they are, then we can work
around the lack of sub-directories by flattening the hierarchical
information in an attribute name prefix, but it's ugly:


It makes enumerating migration parameters more awkward for userspace
because they need to skip many of the files when scanning for parameter

Or we could create a kobject for each migration parameter, but that
seems wrong too.

Or we could investigate other file systems like configfs. Maybe this is
why tracefs and other specific file systems exist - sysfs is too

> > Device models supported by an mdev driver and their details can be read from
> > the migration_info.json attr. Each mdev type supports one device model. If a
> > parent device supports multiple device models then each device model has an
> > mdev type. There may be multiple mdev types for a single device model when
> > they offer different migration parameters such as resource capacity or
> > feature availability.
> > 
> > For example, a graphics card that supports 4 GB and 8 GB device instances
> > would provide gfx-4GB and gfx-8GB mdev types with memory=4096 and
> > memory=8192 migration parameters, respectively.
> I think this example could be expanded for clarity.  I think this is
> suggesting we have mdev_types of gfx-4GB and gfx-8GB, which each
> implement some common device model, ie. com.gfx/GPU, where the
> migration parameter 'memory' for each defaults to a value matching the
> type name.  But it seems like this can also lead to some combinatorial
> challenges for management tools if these parameters are writable.  For
> example, should a management tool create a gfx-4GB device and change the
> memory parameter to 8192 or a gfx-8GB device with the default parameter?

Right, that can happen if gfx-4GB and gfx-8GB both offer a variable
"memory" migration parameter. Userspace will eliminate mdev types whose
device model string or allowed parameter values are incompatible, and
then it will choose one of the remaining mdev types. If creating the
device fails then it can try another remaining mdev type.
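
As a rough illustration of that selection step, here is a minimal Python
sketch. The dictionary layout ("model", "params", "allowed_values") and the
create callback are assumptions made for the example, not something the RFC
defines; allowed values are modelled as plain lists.

```python
def compatible_types(mdev_types, model, wanted):
    """Return names of mdev types that could host a device with `wanted` params.

    mdev_types: {type_name: {"model": str,
                             "params": {name: {"allowed_values": [...]}}}}
    (hypothetical layout, loosely modelled on migration_info.json)
    """
    result = []
    for name, info in mdev_types.items():
        if info["model"] != model:
            continue  # different device model string
        params = info["params"]
        if all(p in params and v in params[p]["allowed_values"]
               for p, v in wanted.items()):
            result.append(name)
    return result


def create_device(mdev_types, model, wanted, create):
    """Try each compatible mdev type until creation succeeds."""
    for name in compatible_types(mdev_types, model, wanted):
        try:
            return create(name, wanted)
        except OSError:
            continue  # e.g. out of resources; try the next candidate
    return None


# Example data (hypothetical): both types allow memory=8192.
TYPES = {
    "gfx-4GB": {"model": "com.gfx/GPU",
                "params": {"memory": {"allowed_values": [4096, 8192]}}},
    "gfx-8GB": {"model": "com.gfx/GPU",
                "params": {"memory": {"allowed_values": [8192]}}},
}
```

With this data, a request for memory=8192 leaves both types as candidates,
so the tool simply tries them in order and falls back if creation fails.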

> > The following mdev device sysfs attrs relate to a specific device instance::
> > 
> >   /sys/.../<parent-device>/<uuid>/
> >     mdev_type/ - symlink to mdev type sysfs attrs, e.g. to fetch
> >                  migration/model
> We need a mechanism that translates to non-mdev vfio devices as well,
> the device "model" creates a clean separation from an mdev-type, we
> shouldn't reintroduce that dependency here.

Okay. The user will need the device model string and the migration
parameter info.

Is there an example of a non-mdev VFIO device that has software
functionality (e.g. device-specific sysfs attrs)?

> >     migration/ - migration related files
> >       <param> - read/write migration parameter "param"
> >       ...
> > 
> > When the device is created all migration/<param> attrs take their
> > migration_info.json "init_value".
> > 
> > When preparing for migration on the source, each migration parameter from
> > migration/<param> is read and added to the migration parameter list if its
> > value differs from "off_value" in migration_info.json. If a migration
> > parameter in the list is not available on the destination, then migration
> > is not possible. If a migration parameter value is not in the destination
> > "allowed_values" migration_info.json then migration is not possible.
> > 
> > In order to prepare an mdev device instance for an incoming migration on the
> > destination, the "off_value" from migration_info.json is written to each
> > migration parameter in migration/<param>. Then the migration parameter list
> > from the source is written to migration/<param> one migration parameter at a
> > time. If an error occurs while writing a migration parameter on the
> > destination then migration is not possible. Once the migration parameter
> > list has been written the mdev can be opened and migration can proceed.
> What's the logic behind setting the value twice?  If we have a
> preconfigured pool of devices where the off_value might use less
> resources, we risk that resources might be consumed elsewhere if we
> release them and try to get them back.  It also seems rather
> inefficient.

The description above was sub-optimal. Each parameter only needs to be
written once:

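  for param in dest_params:
      if param in source_params:
          # The source set this parameter: carry its value over.
          val = source_params[param]
      else:
          # Not in the source's list: switch it off on the destination.
          # (param_json is presumably this parameter's migration_info.json
          # entry.)
          val = param_json['off_value']

      sysfs_write(f'migration/{param}', val)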

We either write the value from the source or the off_value from the
destination.
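
The source-side half of the algorithm in the quoted RFC text (collect every
parameter whose value differs from off_value, then check that list against
the destination) could look roughly like this. It is only a sketch: the
function names are made up, and the per-parameter dictionaries with
"off_value" and "allowed_values" keys are an assumed in-memory form of
migration_info.json, with allowed values modelled as plain lists.

```python
def build_param_list(source_values, source_info):
    """Collect parameters whose current value differs from off_value."""
    return {name: val for name, val in source_values.items()
            if val != source_info[name]["off_value"]}


def can_migrate(param_list, dest_info):
    """Check the source parameter list against the destination's info."""
    for name, val in param_list.items():
        if name not in dest_info:
            return False  # parameter not available on the destination
        if val not in dest_info[name]["allowed_values"]:
            return False  # value not allowed on the destination
    return True
```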

> > An open mdev device typically does not allow migration parameters to be
> > changed at runtime. However, certain migration/params attrs may allow
> > writes at runtime. Usually these migration parameters only affect the
> > device state representation and not the hardware interface. This makes it
> > possible to upgrade or downgrade the device state representation at
> > runtime so that migration is possible to newer or older device
> > implementations.
> Which begs the question of how we'd determine which can be modified
> runtime...  Thanks,

Deciding to modify a parameter at runtime requires knowledge of what
that parameter does. (Unlike the migration compatibility algorithm,
which blindly processes all migration parameters.)

Therefore, I'm not sure it's necessary to add metadata for this. The
user must know what they are doing when modifying parameters at runtime.
If the device implementation doesn't support modifying the parameter at
runtime then -EBUSY can be returned from write(2).
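
From userspace, that failure mode could be handled along these lines. This
is a sketch, not a defined interface: write_param and its opener argument
are hypothetical, and the attribute path is whatever location the
migration/<param> file ends up at.

```python
import errno


def write_param(path, value, opener=open):
    """Write a migration parameter; return False if the device reports EBUSY.

    `opener` exists only so the sketch can be exercised without real
    sysfs files; it defaults to the built-in open().
    """
    try:
        # Errors from write(2) may only surface when the file is flushed
        # on close, so the whole `with` block sits inside the try.
        with opener(path, "w") as f:
            f.write(str(value))
    except OSError as e:
        if e.errno == errno.EBUSY:
            return False  # parameter cannot be changed at runtime
        raise  # any other error is unexpected
    return True
```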

