[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [libvirt] [RFC v2] libvirt vGPU QEMU integration
From: |
Neo Jia |
Subject: |
Re: [Qemu-devel] [libvirt] [RFC v2] libvirt vGPU QEMU integration |
Date: |
Wed, 28 Sep 2016 13:06:31 -0700 |
User-agent: |
Mutt/1.6.2 (2016-07-01) |
On Wed, Sep 28, 2016 at 01:55:47PM -0600, Alex Williamson wrote:
> On Wed, 28 Sep 2016 12:22:35 -0700
> Neo Jia <address@hidden> wrote:
>
> > On Thu, Sep 22, 2016 at 03:26:38PM +0100, Daniel P. Berrange wrote:
> > > On Thu, Sep 22, 2016 at 08:19:21AM -0600, Alex Williamson wrote:
> > > > On Thu, 22 Sep 2016 09:41:20 +0530
> > > > Kirti Wankhede <address@hidden> wrote:
> > > >
> > > > > >>>>> My concern is that a type id seems arbitrary but we're
> > > > > >>>>> specifying that
> > > > > >>>>> it be unique. We already have something unique, the name. So
> > > > > >>>>> why try
> > > > > >>>>> to make the type id unique as well? A vendor can accidentally
> > > > > >>>>> create
> > > > > >>>>> their vendor driver so that a given name means something very
> > > > > >>>>> specific. On the other hand they need to be extremely
> > > > > >>>>> deliberate to
> > > > > >>>>> coordinate that a type id means a unique thing across all their
> > > > > >>>>> product
> > > > > >>>>> lines.
> > > > > >>>>>
> > > > > >>>>
> > > > > >>>> Let me clarify, type id should be unique in the list of
> > > > > >>>> mdev_supported_types. You can't have 2 directories in with same
> > > > > >>>> name.
> > > > > >>>
> > > > > >>> Of course, but does that mean it's only unique to the machine I'm
> > > > > >>> currently running on? Let's say I have a Tesla P100 on my system
> > > > > >>> and
> > > > > >>> type-id 11 is named "GRID-M60-0B". At some point in the future I
> > > > > >>> replace the Tesla P100 with a Q1000 (made up). Is type-id 11 on
> > > > > >>> that
> > > > > >>> new card still going to be a "GRID-M60-0B"? If not then we've
> > > > > >>> based
> > > > > >>> our XML on the wrong attribute. If the new device does not
> > > > > >>> support
> > > > > >>> "GRID-M60-0B" then we should generate an error, not simply
> > > > > >>> initialize
> > > > > >>> whatever type-id 11 happens to be on this new card.
> > > > > >>>
> > > > > >>
> > > > > >> If there are 2 M60 in the system then you would find '11' type
> > > > > >> directory
> > > > > >> in mdev_supported_types of both M60. If you have P100, '11' type
> > > > > >> would
> > > > > >> not be there in its mdev_supported_types, it will have different
> > > > > >> types.
> > > > > >>
> > > > > >> For example, if you replace M60 with P100, but XML is not updated.
> > > > > >> XML
> > > > > >> have type '11'. When libvirt would try to create mdev device,
> > > > > >> libvirt
> > > > > >> would have to find 'create' file in sysfs in following directory
> > > > > >> format:
> > > > > >>
> > > > > >> --- mdev_supported_types
> > > > > >> |-- 11
> > > > > >> | |-- create
> > > > > >>
> > > > > >> but now for P100, '11' directory is not there, so libvirt should
> > > > > >> throw
> > > > > >> error on not able to find '11' directory.
> > > > > >
> > > > > > This really seems like an accident waiting to happen. What happens
> > > > > > when the user replaces their M60 with an Intel XYZ device that
> > > > > > happens
> > > > > > to expose a type 11 mdev class gpu device? How is libvirt supposed
> > > > > > to
> > > > > > know that the XML used to refer to a GRID-M60-0B and now it's an
> > > > > > INTEL-IGD-XYZ? Doesn't basing the XML entry on the name and
> > > > > > removing
> > > > > > yet another arbitrary requirement that we have some sort of globally
> > > > > > unique type-id database make a lot of sense? The same issue applies
> > > > > > for simple debug-ability, if I'm reviewing the XML for a domain and
> > > > > > the
> > > > > > name is the primary index for the mdev device, I know what it is.
> > > > > > Seeing type-id='11' is meaningless.
> > > > > >
> > > > >
> > > > > Let me clarify again, type '11' is a string that vendor driver would
> > > > > define (see my previous reply below) it could be "11" or
> > > > > "GRID-M60-0B".
> > > > > If 2 vendors used same string we can't control that. right?
> > > > >
> > > > >
> > > > > >>>> Lets remove 'id' from type id in XML if that is the concern.
> > > > > >>>> Supported
> > > > > >>>> types is going to be defined by vendor driver, so let vendor
> > > > > >>>> driver
> > > > > >>>> decide what to use for directory name and same should be used in
> > > > > >>>> device
> > > > > >>>> xml file, it could be '11' or "GRID M60-0B":
> > > > > >>>>
> > > > > >>>> <device>
> > > > > >>>> <name>my-vgpu</name>
> > > > > >>>> <parent>pci_0000_86_00_0</parent>
> > > > > >>>> <capability type='mdev'>
> > > > > >>>> <type='11'/>
> > > > > >>>> ...
> > > > > >>>> </capability>
> > > > > >>>> </device>
> > > >
> > > > Then let's get rid of the 'name' attribute and let the sysfs directory
> > > > simply be the name. Then we can get rid of 'type' altogether so we
> > > > don't have this '11' vs 'GRID-M60-0B' issue. Thanks,
> > >
> > > That sounds nice to me - we don't need two unique identifiers if
> > > one will do.
> >
> > Hi Alex and Daniel,
> >
> > I just had some internal discussions here within NVIDIA and found out that
> > actually the name/label potentially might not be unique and the "id" will
> > be.
> > So I think we still would like to keep both so the id is the programmatic id
> > and the name/label is a human readable string for it, which might get
> > changed to
> > be non-unique by outside of engineering.
>
> I think your discovery only means that for your vendor driver, the name
> will be "11" (as a string). Perhaps you'd like some sort of vendor
> provided description within each type, but I am not in favor of having
> an arbitrary integer value imply something specific within the sysfs
> interface. IOW, the NVIDIA vendor driver should be able to create:
>
> 11
> ├── create
> ├── description
> ├── etc
> └── resolution
>
> While Intel might create:
>
> Skylake-vGPU
> ├── create
> ├── description
> ├── etc
> └── resolution
>
> Maybe "description" is optional for vendors that use useful names?
> Thanks,
I think we should be able to have a unique vendor type string instead of an
arbitrary integer value there as long as we are allowed to have a description
field that can be used to show to the end user as "name / label".
Thanks,
Neo
>
> Alex
- Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration, (continued)
- Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration, Kirti Wankhede, 2016/09/22
- Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration, Alex Williamson, 2016/09/22
- Re: [Qemu-devel] [libvirt] [RFC v2] libvirt vGPU QEMU integration, Daniel P. Berrange, 2016/09/22
- Re: [Qemu-devel] [libvirt] [RFC v2] libvirt vGPU QEMU integration, Neo Jia, 2016/09/28
- Re: [Qemu-devel] [libvirt] [RFC v2] libvirt vGPU QEMU integration, Tian, Kevin, 2016/09/28
- Re: [Qemu-devel] [libvirt] [RFC v2] libvirt vGPU QEMU integration, Neo Jia, 2016/09/28
- Re: [Qemu-devel] [libvirt] [RFC v2] libvirt vGPU QEMU integration, Laine Stump, 2016/09/28
- Re: [Qemu-devel] [libvirt] [RFC v2] libvirt vGPU QEMU integration, Neo Jia, 2016/09/28
- Re: [Qemu-devel] [libvirt] [RFC v2] libvirt vGPU QEMU integration, Alex Williamson, 2016/09/28
- Re: [Qemu-devel] [libvirt] [RFC v2] libvirt vGPU QEMU integration, Alex Williamson, 2016/09/28
- Re: [Qemu-devel] [libvirt] [RFC v2] libvirt vGPU QEMU integration,
Neo Jia <=
- Re: [Qemu-devel] [libvirt] [RFC v2] libvirt vGPU QEMU integration, Alex Williamson, 2016/09/28
- Re: [Qemu-devel] [libvirt] [RFC v2] libvirt vGPU QEMU integration, Daniel P. Berrange, 2016/09/29
- Re: [Qemu-devel] [libvirt] [RFC v2] libvirt vGPU QEMU integration, Neo Jia, 2016/09/29
- Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration, Kirti Wankhede, 2016/09/29
- Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration, Tian, Kevin, 2016/09/21
- Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration, Alex Williamson, 2016/09/21
- Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration, Tian, Kevin, 2016/09/21
- Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration, Tian, Kevin, 2016/09/22
- Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration, Kirti Wankhede, 2016/09/23
- Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration, Tian, Kevin, 2016/09/20