qemu-devel

Re: VFIO Migration


From: Dr. David Alan Gilbert
Subject: Re: VFIO Migration
Date: Wed, 4 Nov 2020 17:32:02 +0000
User-agent: Mutt/1.14.6 (2020-07-11)

* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> On Wed, Nov 04, 2020 at 10:14:23AM +0000, Dr. David Alan Gilbert wrote:
> > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > On Tue, Nov 03, 2020 at 06:49:51PM +0000, Dr. David Alan Gilbert wrote:
> > > > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > > > > On Tue, Nov 03, 2020 at 12:17:09PM +0000, Dr. David Alan Gilbert wrote:
> > > > > > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > > > > > Device Models
> > > > > > > -------------
> > > > > > > > Devices have a *hardware interface* consisting of hardware registers,
> > > > > > > > interrupts, and so on.
> > > > > > > 
> > > > > > > > The hardware interface together with the device state representation is called
> > > > > > > > a *device model*. Device models can be assigned URIs such as
> > > > > > > > https://qemu.org/devices/e1000e to uniquely identify them.
> > > > > > 
> > > > > > > I think this is a unique identifier, not actually a URI; the https://
> > > > > > > isn't needed since no one expects to ever connect to this.
> > > > > 
> > > > > Yes, it could be any unique string. If the URI idea is not popular we
> > > > > can use any similar scheme.
> > > > 
> > > > I'm OK with it being a URI; just drop the https.
> > > 
> > > Okay.
> > > 
> > > > > > > > However, secondary aspects related to the physical port may affect the device's
> > > > > > > > hardware interface and need to be reflected in the device configuration. The
> > > > > > > > link speed may depend on the physical port and be reported through the device's
> > > > > > > > hardware interface. In that case a ``link-speed`` configuration parameter is
> > > > > > > > required to prevent unexpected changes to the link speed after migration.
> > > > > > 
> > > > > > > That's an interesting example; because depending on the device, it might
> > > > > > > be:
> > > > > > >     a) Completely virtualised so that the guest *shouldn't* know what
> > > > > > > the physical link speed is, precisely to allow the physical network on
> > > > > > > the destination to be different.
> > > > > > 
> > > > > >     b) Part of the migrated state
> > > > > > 
> > > > > >     c) Something that's allowed to be reloaded after migration
> > > > > > 
> > > > > >     d) Configurable
> > > > > > 
> > > > > > so I'm not sure whether it's a good example in this case or not.
> > > > > 
> > > > > Can you think of an example that has only one option?
> > > > > 
> > > > > > I tried but couldn't. For example take a sound card. The guest is aware
> > > > > > the device supports stereo playback (2 output channels), but which exact
> > > > > > stereo host device is used doesn't matter, they are all suitable.
> > > > > 
> > > > > Now imagine migrating to a 7.1 surround-sound device. Similar options
> > > > > come into play:
> > > > > 
> > > > > a) Emulate stereo and mix it to 7.1 surround-sound on the physical
> > > > >    device. The guest still sees the stereo device.
> > > > > 
> > > > > b) Refuse migration.
> > > > > 
> > > > > c) Indicate that the output has switched and let the guest reconfigure
> > > > > >    itself (e.g. a sound card with multiple outputs, where one of them is
> > > > > >    stereo and another is 7.1 surround sound).
> > > > > 
> > > > > Which option is desirable depends on the use case.
> > > > 
> > > > Yes, but I think it might be worth calling out these differences;  there
> > > > are explicitly cases where you don't want external changes to be visible
> > > > and other cases where you do; both are valid, but both need thinking
> > > > > about. (Another one: a GPU, and whether you have a monitor plugged in!)
> > > 
> > > Okay.
> > > 
> > > > > > Maybe what's needed is a stronger instruction to abstract external
> > > > > > > device state so that it's not part of the configuration in most cases.
> > > > > 
> > > > > Do you want to propose something?
> > > > 
> > > > > I think something like 'Some part of a device's state may be irrelevant
> > > > to a migration; for example on some NICs it might be preferable to hide
> > > > the physical characteristics of the link from the guest.'
> > > 
> > > Got it.
> > > 
> > > > > > > > For example, if address filtering support was added to a network card then
> > > > > > > > device versions and the corresponding configurations may look like this:
> > > > > > > * ``version=1`` - Behaves as if ``rx-filter-size=0``
> > > > > > > * ``version=2`` - ``rx-filter-size=32``
> > > > > > 
> > > > > > > Note configuration parameters might have been added during the life of
> > > > > > > the device; e.g. if the original card had no support for rx-filters, it
> > > > > > > might not have an rx-filter-size parameter.
> > > > > 
> > > > > > version=1 does not explicitly set rx-filter-size=0. When a new parameter
> > > > > > is introduced it must have a default value that disables its effect on
> > > > > the hardware interface and/or device state representation. This is
> > > > > described in a bit more detail in the next section, maybe it should be
> > > > > reordered.
> > > > 
> > > > We've generally found the definition of devices tends in practice to be
> > > > done newer->older; i.e. you define the current machine, and then define
> > > > the next older machine setting the defaults that used to be true; then
> > > > define the older version behind that....
> > > 
> > > That is not possible here because an older device implementation is
> > > unaware of new configuration parameters.
> > > 
> > > Looking at the example above, imagine a version=1 device is instantiated
> > > on a device implementation that supports both version=1 and version=2.
> > > Should the configuration parameter list for version=1 be empty or
> > > rx-filter-size=0?
> > > 
> > > > It must be empty, otherwise an older device implementation that only
> > > supports version=1 cannot instantiate the device. The older device
> > > implementation does not recognize the rx-filter-size configuration
> > > parameter (it was introduced in version=2) so we cannot set it to 0.
> > 
> > I think this question might come down to who expands the device version
> > definition.
> > If it's the device itself that expands that, then a version 2 device
> > knows about what it needs to do for version 1 compatibility.
> > But if you're saying someone outside the device needs to be able to
> > expand that list then I'm not sure how you'd keep that expansion in line
> > with the implementation of a device.
> 
> The current approach is that the version is expanded into configuration
> parameters when the device is instantiated. Those parameters are then
> used to check migration compatibility of the destination (versions don't
> play a role once the device has been created).
> 
> Michael replied in another sub-thread wondering if versions are really
> necessary since tools do the migration checks. Let's try dropping
> versions to simplify things. We can bring them back if needed later.

What does a user-facing tool do? If I say I want one of these NICs
and I'm on the latest QEMU machine type, who sets all these parameters?
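
To make the question concrete, here is a minimal sketch of the expansion
being discussed, where a version is turned into explicit configuration
parameters at instantiation time and a device implementation refuses
parameters it does not recognise. The table, the function names and the
"known parameters" sets are all hypothetical and only for illustration;
they are not an existing QEMU or VFIO interface.

```python
# Hypothetical sketch: names and the parameter table are illustrative only,
# not part of QEMU or any VFIO interface.

# Each version expands to the parameters it sets explicitly. version=1 is
# empty because rx-filter-size did not exist when it was defined.
VERSION_PARAMS = {
    1: {},
    2: {"rx-filter-size": 32},
}


def expand_version(version):
    """Expand a device version into explicit configuration parameters."""
    return dict(VERSION_PARAMS[version])


def can_instantiate(config, known_params):
    """A device implementation can only accept parameters it recognises."""
    return all(name in known_params for name in config)


old_impl_params = set()                # implementation supporting version=1 only
new_impl_params = {"rx-filter-size"}   # implementation supporting version=1 and 2

v1 = expand_version(1)
v2 = expand_version(2)

print(can_instantiate(v1, old_impl_params))
print(can_instantiate(v2, old_impl_params))
```

In this sketch something still has to own VERSION_PARAMS, which is
really the question: whether that table lives in the device
implementation or in the tool sitting above it.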

Dave

> > > > > > > Device States
> > > > > > > -------------
> > > > > > > > The details of the device state representation are not covered in this document
> > > > > > > > but the general requirements are discussed here.
> > > > > > > 
> > > > > > > > The device state consists of data accessible through the device's hardware
> > > > > > > > interface and internal state that is needed to restore device operation.
> > > > > > > > State in the hardware interface includes the values of hardware registers.
> > > > > > > > An example of internal state is an index value needed to avoid processing
> > > > > > > > queued requests more than once.
> > > > > > 
> > > > > > > I try and emphasise that 'internal state' should be represented in a way
> > > > > > > that reflects the problem rather than the particular implementation;
> > > > > > this gives it a better chance of migrating to future versions.
> > > > > 
> > > > > Sounds like a good idea.
> > > > > 
> > > > > > > > Changes can be made to the device state representation as follows. Each change
> > > > > > > > to device state must have a corresponding device configuration parameter that
> > > > > > > > allows the change to be toggled:
> > > > > > > 
> > > > > > > > * When the parameter is disabled the hardware interface and device state
> > > > > > > >   representation are unchanged. This allows old device states to be loaded.
> > > > > > > 
> > > > > > > * When the parameter is enabled the change comes into effect.
> > > > > > > 
> > > > > > > > * The parameter's default value disables the change. Therefore old versions do
> > > > > > > >   not have to explicitly specify the parameter.
> > > > > > > 
> > > > > > > The following example illustrates migration from an old device
> > > > > > > > implementation to a new one. A version=1 network card is migrated to a
> > > > > > > > new device implementation that is also capable of version=2 and adds the
> > > > > > > > rx-filter-size=32 parameter. The new device is instantiated with
> > > > > > > > version=1, which disables rx-filter-size and is capable of loading the
> > > > > > > > version=1 device state. The migration completes successfully but note
> > > > > > > > the device is still operating at version=1 level in the new device.
> > > > > > > 
> > > > > > > The following example illustrates migration from a new device
> > > > > > > implementation back to an older one. The new device implementation
> > > > > > > > supports version=1 and version=2. The old device implementation supports
> > > > > > > > version=1 only. Therefore the device can only be migrated when
> > > > > > > instantiated with version=1 or the equivalent full configuration
> > > > > > > parameters.
> > > > > > 
> > > > > > > I'm sometimes asked for 'ways out' of buggy migration cases; e.g. what
> > > > > > > happens if version=1 forgot to migrate the X register; or what happens
> > > > > > > if version=1 forgot to handle the special, rare case when X=5 and we
> > > > > > > now need to migrate some extra state.
> > > > > 
> > > > > > Can these cases be handled by adding additional configuration parameters?
> > > > > 
> > > > > > If version=1 lacks essential state then version=2 can add it. The
> > > > > > user must configure the device to use version=2 before they can save the
> > > > > > full state.
> > > > > 
> > > > > If version=1 didn't handle the X=5 case then the same solution is
> > > > > needed. A new configuration parameter is introduced and the user needs
> > > > > to configure the device to be the new version before migrating.
> > > > > 
> > > > > > Unfortunately this requires poweroff or hotplugging a new device
> > > > > > instance. But some disruption is probably necessary anyway so the
> > > > > > migration code on the host side can be patched to use the updated device
> > > > > > state representation.
> > > > 
> > > > There are some corner cases that people sometimes prefer; for example
> > > > > let's say the X=5 case is actually really rare - but when it happens the
> > > > device is hopelessly broken, some device authors prefer to fix it and
> > > > send the extra data and let the migration fail if the destination
> > > > doesn't understand it (it would break anyway).
> > > 
> > > The device implementation needs to be updated to send the extra data. At
> > > that point a new device configuration parameter should be introduced and
> > > if the user wishes to run the new version of the device then the extra
> > > data will be sent.
> > > 
> > > If the destination doesn't support the new parameter then migration will
> > > be refused. That matches what you've described, so I think the approach
> > > in this document handles this case.
> > 
> > Well that's the ideal; but the case I'm describing is where you're
> > recovering from a screwup in which the migration is going to fail in a
> > rare (runtime defined) corner case, and only sending the extra data
> > in that rare case before you get a chance to define a new version.
> 
> You need to upgrade the migration code in order to produce that extra
> data. Why not define a configuration parameter alongside this code
> change?
> 
> > > > I've also been asked
> > > > by mst for a 'unexpected data' mechanism to send data that the
> > > > destination might not expect if it didn't know about it, for similar
> > > > cases.
> > > 
> > > Do you mean optional data that can be more or less safely dropped? A new
> > > device configuration parameter is not needed because the hardware
> > > interface and device state representation remain compatible. That
> > > feature can be defined in the device state representation spec and is
> > > not visible at the layer discussed in this document. But I think it's
> > > worth adding an explanation into this document explaining what to do.
> > 
> > I mean a way to send optional data that the destination can drop; but
> > that the destination doesn't know what it means and at the time the
> > destination was written, wasn't yet defined. It is part of the device
> > state;  it's similar to the X=5 case above - but in this case it allows
> > the migration not to fail even when you start sending the extra data.
> 
> The device state representation may have a way of sending optional data.
> Since it just gets dropped if the destination doesn't recognize it there
> is no need to introduce a configuration parameter and it doesn't play a
> part in migration compatibility checks.
> 
> Stefan
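
As a rough illustration of optional data that an older destination can
drop without failing, here is a minimal sketch. The section framing
(flags, type, length, payload) and the FLAG_OPTIONAL bit are assumptions
made up for this example; they are not the QEMU migration stream format
or any defined device state representation.

```python
import struct

# Hypothetical framing, not a real device state format: each section is
# <1-byte flags><2-byte type><4-byte length><payload>, big-endian.
HEADER = struct.Struct(">BHI")
FLAG_OPTIONAL = 0x01


def pack_section(sec_type, payload, optional=False):
    """Serialise one section, marking it optional if requested."""
    flags = FLAG_OPTIONAL if optional else 0
    return HEADER.pack(flags, sec_type, len(payload)) + payload


def load_state(stream, known_types):
    """Return {type: payload}; drop unknown optional sections, reject others."""
    loaded = {}
    offset = 0
    while offset < len(stream):
        flags, sec_type, length = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        payload = stream[offset:offset + length]
        offset += length
        if sec_type in known_types:
            loaded[sec_type] = payload
        elif not (flags & FLAG_OPTIONAL):
            # A mandatory section the destination does not understand:
            # the migration has to fail.
            raise ValueError(f"unknown mandatory section {sec_type}")
        # Unknown optional sections are silently dropped.
    return loaded


# An old destination that only knows section type 1 still loads the state.
stream = pack_section(1, b"regs") + pack_section(99, b"extra", optional=True)
print(load_state(stream, known_types={1}))
```

The same stream with section 99 sent as mandatory would be rejected by
that destination, which corresponds to the X=5 case where the author
prefers the migration to fail.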


-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



