qemu-devel

Re: [RFC v3] VFIO Migration


From: Alex Williamson
Subject: Re: [RFC v3] VFIO Migration
Date: Mon, 16 Nov 2020 10:30:15 -0700

On Mon, 16 Nov 2020 14:52:26 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Mon, 16 Nov 2020 11:02:51 +0000
> Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> > On Wed, Nov 11, 2020 at 04:35:43PM +0100, Cornelia Huck wrote:  
> > > On Wed, 11 Nov 2020 15:14:49 +0000
> > > Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > >     
> > > > On Wed, Nov 11, 2020 at 12:48:53PM +0100, Cornelia Huck wrote:    
> > > > > On Tue, 10 Nov 2020 13:14:04 -0700
> > > > > Alex Williamson <alex.williamson@redhat.com> wrote:      
> > > > > > On Tue, 10 Nov 2020 09:53:49 +0000
> > > > > > Stefan Hajnoczi <stefanha@redhat.com> wrote:      
> > > > >       
> > > > > > > Device models supported by an mdev driver and their details can be read from
> > > > > > > the migration_info.json attr. Each mdev type supports one device model. If a
> > > > > > > parent device supports multiple device models then each device model has an
> > > > > > > mdev type. There may be multiple mdev types for a single device model when they
> > > > > > > offer different migration parameters such as resource capacity or feature
> > > > > > > availability.
> > > > > > > 
> > > > > > > For example, a graphics card that supports 4 GB and 8 GB device instances would
> > > > > > > provide gfx-4GB and gfx-8GB mdev types with memory=4096 and memory=8192
> > > > > > > migration parameters, respectively.
> > > > > > 
> > > > > > 
> > > > > > I think this example could be expanded for clarity.  I think this is
> > > > > > suggesting we have mdev_types of gfx-4GB and gfx-8GB, which each
> > > > > > implement some common device model, ie. com.gfx/GPU, where the
> > > > > > migration parameter 'memory' for each defaults to a value matching the
> > > > > > type name.  But it seems like this can also lead to some combinatorial
> > > > > > challenges for management tools if these parameters are writable.  For
> > > > > > example, should a management tool create a gfx-4GB device and change the
> > > > > > memory parameter to 8192 or a gfx-8GB device with the default parameter?
> > > > > 
> > > > > I would expect that the mdev types need to match in the first place.
> > > > > What role would the memory= parameter play, then? Allowing gfx-4GB to
> > > > > have memory=8192 feels wrong to me.      
> > > > 
> > > > Yes, I expected these mdev types to only accept a fixed "memory" value,
> > > > but there's nothing stopping a driver author from making "memory" accept
> > > > any value.    
> > > 
> > > I'm wondering how useful the memory parameter is, then. The layer
> > > checking for compatibility can filter out inconsistent settings, but
> > > why would we need to express something that is already implied in the
> > > mdev type separately?    
> > 
> > To avoid tying device instances to specific mdev types. An mdev type is
> > a device implementation, but the goal is to enable migration between
> > device implementations (new/old or completely different
> > implementations).
> > 
> > Imagine a new physical device that now offers variable memory because
> > users found the static mdev types too constraining.  How do you migrate
> > back and forth between new and old physical devices if the migration
> > parameters don't describe the memory size? Migration parameters make it
> > possible. Without them the management tool needs to hard-code knowledge
> > of specific mdev types that support migration.  
> 
> But doesn't the management tool *still* need to keep hardcoded
> information about what the value of that memory parameter was for an
> existing mdev type? If we have gfx-variable with a memory parameter,
> fine; but if the target is supposed to accept a gfx-4GB device, it
> should simply instantiate a gfx-4GB device.
> 
> I'm getting a bit worried about the complexity of the checking that
> management software is supposed to perform. Is it really that bad to
> restrict the models to a few, well-defined ones? Especially in the mdev
> case, where we have control about what is getting instantiated?

This is exactly what I was noting with the combinatorial challenges for
the management tool.  If a vendor chooses to use a generic base device
model which they modify with parameters to match an assortment of mdev
types, then management tools will need to match every mdev type
implementing that device model to determine if compatible parameters
exist.  OTOH, the vendor could choose to create a device model that
specifically describes a single configuration of known parameters.
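
To make the matching burden concrete, here is a minimal sketch of the
check a management tool would have to run in the generic-base-model
case.  Everything in it is an assumption for illustration: the MdevType
structure, the compatible_types() helper, and the memory values in MB
are hypothetical and are not part of the RFC or of any existing sysfs
interface.  Only the type and device model names come from this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MdevType:
    """Hypothetical description of one mdev type (not a real API)."""
    name: str                  # mdev type name, e.g. "gfx-4GB"
    device_model: str          # device model it implements
    memory_options: frozenset  # values the 'memory' parameter accepts, in MB


def compatible_types(mdev_types, device_model, memory):
    """Return the name of every mdev type able to host the given device.

    The tool has to visit each type implementing the device model and
    test whether the requested parameter value is accepted.
    """
    return [
        t.name
        for t in mdev_types
        if t.device_model == device_model and memory in t.memory_options
    ]


# Assumed inventory on the destination host, using names from this thread.
TYPES = [
    MdevType("gfx-4GB", "com.gfx/GPU", frozenset({4096})),
    MdevType("gfx-8GB", "com.gfx/GPU", frozenset({8192})),
]

matches = compatible_types(TYPES, "com.gfx/GPU", 4096)
print(matches)
```

With one parameter the scan is trivial, but each additional writable
parameter multiplies the combinations that have to be tested per type.
If the vendor instead exposes a device model that already encodes the
configuration, the check reduces to comparing device model names.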

For example, mdev type gfx-4GB might be a device model com.gfx/GPU with
a fixed memory parameter of 4GB or it could be a device model
com.gfx/GPU-4G with no additional parameter.  The hard part is when the
vendor offers an mdev type gfx-varGB with device model com.gfx/GPU and
available memory options of 1GB, 2GB, 4GB, 8GB.  At that point a
management tool might decide to create a gfx-varGB device instance and
tune the memory parameter, or create a gfx-4GB instance; either would be
correct, and we've expressed no preference for one or the other.  Thanks,

Alex



