Re: [RFC v3] VFIO Migration
From: Dr. David Alan Gilbert
Subject: Re: [RFC v3] VFIO Migration
Date: Tue, 24 Nov 2020 17:24:58 +0000
User-agent: Mutt/1.14.6 (2020-07-11)
* Alex Williamson (alex.williamson@redhat.com) wrote:
> On Mon, 16 Nov 2020 14:52:26 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
>
> > On Mon, 16 Nov 2020 11:02:51 +0000
> > Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >
> > > On Wed, Nov 11, 2020 at 04:35:43PM +0100, Cornelia Huck wrote:
> > > > On Wed, 11 Nov 2020 15:14:49 +0000
> > > > Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > > >
> > > > > On Wed, Nov 11, 2020 at 12:48:53PM +0100, Cornelia Huck wrote:
> > > > > > On Tue, 10 Nov 2020 13:14:04 -0700
> > > > > > Alex Williamson <alex.williamson@redhat.com> wrote:
> > > > > > > On Tue, 10 Nov 2020 09:53:49 +0000
> > > > > > > Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > > > > >
> > > > > > > > > Device models supported by an mdev driver and their details can
> > > > > > > > > be read from the migration_info.json attr. Each mdev type
> > > > > > > > > supports one device model. If a parent device supports multiple
> > > > > > > > > device models then each device model has an mdev type. There may
> > > > > > > > > be multiple mdev types for a single device model when they offer
> > > > > > > > > different migration parameters such as resource capacity or
> > > > > > > > > feature availability.
> > > > > > > > >
> > > > > > > > > For example, a graphics card that supports 4 GB and 8 GB device
> > > > > > > > > instances would provide gfx-4GB and gfx-8GB mdev types with
> > > > > > > > > memory=4096 and memory=8192 migration parameters, respectively.
> > > > > > >
> > > > > > >
> > > > > > > > I think this example could be expanded for clarity. I think this
> > > > > > > > is suggesting we have mdev_types of gfx-4GB and gfx-8GB, which
> > > > > > > > each implement some common device model, ie. com.gfx/GPU, where
> > > > > > > > the migration parameter 'memory' for each defaults to a value
> > > > > > > > matching the type name. But it seems like this can also lead to
> > > > > > > > some combinatorial challenges for management tools if these
> > > > > > > > parameters are writable. For example, should a management tool
> > > > > > > > create a gfx-4GB device and change the memory parameter to 8192
> > > > > > > > or a gfx-8GB device with the default parameter?
> > > > > >
> > > > > > I would expect that the mdev types need to match in the first place.
> > > > > > > What role would the memory= parameter play, then? Allowing gfx-4GB
> > > > > > > to have memory=8192 feels wrong to me.
> > > > >
> > > > > > Yes, I expected these mdev types to only accept a fixed "memory"
> > > > > > value, but there's nothing stopping a driver author from making
> > > > > > "memory" accept any value.
> > > >
> > > > I'm wondering how useful the memory parameter is, then. The layer
> > > > checking for compatibility can filter out inconsistent settings, but
> > > > why would we need to express something that is already implied in the
> > > > mdev type separately?
> > >
> > > To avoid tying device instances to specific mdev types. An mdev type is
> > > a device implementation, but the goal is to enable migration between
> > > device implementations (new/old or completely different
> > > implementations).
> > >
> > > Imagine a new physical device that now offers variable memory because
> > > users found the static mdev types too constraining. How do you migrate
> > > back and forth between new and old physical devices if the migration
> > > parameters don't describe the memory size? Migration parameters make it
> > > possible. Without them the management tool needs to hard-code knowledge
> > > of specific mdev types that support migration.
> >
> > But doesn't the management tool *still* need to keep hardcoded
> > information about what the value of that memory parameter was for an
> > existing mdev type? If we have gfx-variable with a memory parameter,
> > fine; but if the target is supposed to accept a gfx-4GB device, it
> > should simply instantiate a gfx-4GB device.
> >
> > I'm getting a bit worried about the complexity of the checking that
> > management software is supposed to perform. Is it really that bad to
> > restrict the models to a few, well-defined ones? Especially in the mdev
> > case, where we have control over what is getting instantiated?
>
> This is exactly what I was noting with the combinatorial challenges of
> the management tool. If a vendor chooses to use a generic base device
> model which they modify with parameters to match an assortment of mdev
> types, then management tools will need to match every mdev type
> implementing that device model to determine if compatible parameters
> exist. OTOH, the vendor could choose to create a device model that
> specifically describes a single configuration of known parameters.
>
> For example, mdev type gfx-4GB might be a device model com.gfx/GPU with
> a fixed memory parameter of 4GB or it could be a device model
> com.gfx/GPU-4G with no additional parameter. The hard part is when the
> vendor offers an mdev type gfx-varGB with device model com.gfx/GPU and
> available memory options of 1GB, 2GB, 4GB, 8GB. At that point a
> management tool might decide to create a gfx-varGB device instance and
> tune the memory parameter or create a gfx-4GB instance; either would be
> correct and we've expressed no preference for one or the other. Thanks,
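To make the ambiguity concrete, here is a minimal sketch in Python. Everything
in it is hypothetical: the MdevType class, the candidates() helper and the way
the memory options are stored are illustrations only, not the format of
migration_info.json or any real VFIO/mdev interface. It only shows that a
request for com.gfx/GPU with 4 GB is satisfied by more than one mdev type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MdevType:
    """Hypothetical description of an mdev type (not a real kernel/QEMU structure)."""
    name: str
    device_model: str
    memory_options: frozenset  # allowed values of the 'memory' parameter, in MB


# Hypothetical mdev types, named after the examples in this thread.
TYPES = [
    MdevType("gfx-4GB", "com.gfx/GPU", frozenset({4096})),
    MdevType("gfx-8GB", "com.gfx/GPU", frozenset({8192})),
    MdevType("gfx-varGB", "com.gfx/GPU", frozenset({1024, 2048, 4096, 8192})),
]


def candidates(types, device_model, memory):
    """Return the names of every mdev type that could provide the requested device."""
    return [
        t.name
        for t in types
        if t.device_model == device_model and memory in t.memory_options
    ]


matches = candidates(TYPES, "com.gfx/GPU", 4096)
print(matches)  # ['gfx-4GB', 'gfx-varGB']
```

Both results are valid, and nothing in the data tells the tool which one to
prefer.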
What you've described here is exactly what happens with QEMU/libvirt's
confusion over CPU models. Both QEMU and libvirt have their own idea of what
a named CPU model means and then add/subtract flags to get what they
want.
When libvirt wants a CPU model that doesn't quite match what it has
(e.g. a host-compatibility thing where the host is a CPU it didn't know),
it uses heuristics to either start from above and remove things or start
from below and add them.
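
As a rough illustration of those two directions (a toy sketch only, not
libvirt's or QEMU's actual code; the model names, the MODELS table and both
helper functions are made up for the example), the same set of wanted flags
can end up described in two different but equivalent ways:

```python
# Hypothetical named models and the flags each one implies.
MODELS = {
    "model-a": {"sse2", "avx"},
    "model-b": {"sse2", "avx", "avx2"},
    "model-c": {"sse2", "avx", "avx2", "avx512f"},
}


def from_below(models, wanted):
    """Pick the largest named model contained in `wanted`, plus the flags to add."""
    subsets = {name: flags for name, flags in models.items() if flags <= wanted}
    if not subsets:
        return None
    name = max(subsets, key=lambda n: len(subsets[n]))
    return name, sorted(wanted - subsets[name])


def from_above(models, wanted):
    """Pick the smallest named model containing `wanted`, plus the flags to remove."""
    supersets = {name: flags for name, flags in models.items() if flags >= wanted}
    if not supersets:
        return None
    name = min(supersets, key=lambda n: len(supersets[n]))
    return name, sorted(supersets[name] - wanted)


wanted = {"sse2", "avx", "avx512f"}
below = from_below(MODELS, wanted)
above = from_above(MODELS, wanted)
print(below)  # ('model-a', ['avx512f'])
print(above)  # ('model-c', ['avx2'])
```

A tool comparing the two results by name alone would not see that they
describe the same thing, which is the same matching problem as with the mdev
types.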
Dave
> Alex
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK