qemu-devel

Re: [Qemu-devel] [libvirt] [RFC v2] libvirt vGPU QEMU integration


From: Alex Williamson
Subject: Re: [Qemu-devel] [libvirt] [RFC v2] libvirt vGPU QEMU integration
Date: Wed, 28 Sep 2016 16:49:52 -0600

On Wed, 28 Sep 2016 12:59:59 -0700
Neo Jia <address@hidden> wrote:

> On Wed, Sep 28, 2016 at 07:45:38PM +0000, Tian, Kevin wrote:
> > > From: Neo Jia [mailto:address@hidden
> > > Sent: Thursday, September 29, 2016 3:23 AM
> > > 
> > > On Thu, Sep 22, 2016 at 03:26:38PM +0100, Daniel P. Berrange wrote:  
> > > > On Thu, Sep 22, 2016 at 08:19:21AM -0600, Alex Williamson wrote:  
> > > > > On Thu, 22 Sep 2016 09:41:20 +0530
> > > > > Kirti Wankhede <address@hidden> wrote:
> > > > >  
> > > > > > >>>>> My concern is that a type id seems arbitrary but we're 
> > > > > > >>>>> specifying that
> > > > > > >>>>> it be unique.  We already have something unique, the name.  
> > > > > > >>>>> So why try
> > > > > > >>>>> to make the type id unique as well?  A vendor can 
> > > > > > >>>>> accidentally create
> > > > > > >>>>> their vendor driver so that a given name means something very
> > > > > > >>>>> specific.  On the other hand they need to be extremely 
> > > > > > >>>>> deliberate to
> > > > > > >>>>> coordinate that a type id means a unique thing across all 
> > > > > > >>>>> their product
> > > > > > >>>>> lines.
> > > > > > >>>>>  
> > > > > > >>>>
> > > > > > > >>>> Let me clarify, type id should be unique in the list of
> > > > > > > >>>> mdev_supported_types. You can't have 2 directories with the
> > > > > > > >>>> same name.
> > > > > > >>>
> > > > > > >>> Of course, but does that mean it's only unique to the machine 
> > > > > > >>> I'm
> > > > > > >>> currently running on?  Let's say I have a Tesla P100 on my 
> > > > > > >>> system and
> > > > > > >>> type-id 11 is named "GRID-M60-0B".  At some point in the future 
> > > > > > >>> I
> > > > > > >>> replace the Tesla P100 with a Q1000 (made up).  Is type-id 11 
> > > > > > >>> on that
> > > > > > >>> new card still going to be a "GRID-M60-0B"?  If not then we've 
> > > > > > >>> based
> > > > > > >>> our XML on the wrong attribute.  If the new device does not 
> > > > > > >>> support
> > > > > > >>> "GRID-M60-0B" then we should generate an error, not simply 
> > > > > > >>> initialize
> > > > > > >>> whatever type-id 11 happens to be on this new card.
> > > > > > >>>  
> > > > > > >>
> > > > > > >> If there are 2 M60 in the system then you would find '11' type 
> > > > > > >> directory
> > > > > > >> in mdev_supported_types of both M60. If you have P100, '11' type 
> > > > > > >> would
> > > > > > >> not be there in its mdev_supported_types, it will have different 
> > > > > > >> types.
> > > > > > >>
> > > > > > >> For example, if you replace M60 with P100, but XML is not 
> > > > > > >> updated. XML
> > > > > > >> have type '11'. When libvirt would try to create mdev device, 
> > > > > > >> libvirt
> > > > > > >> would have to find 'create' file in sysfs in following directory 
> > > > > > >> format:
> > > > > > >>
> > > > > > >>  --- mdev_supported_types
> > > > > > >>      |-- 11
> > > > > > >>      |   |-- create
> > > > > > >>
> > > > > > >> but now for P100, '11' directory is not there, so libvirt should 
> > > > > > >> throw
> > > > > > >> error on not able to find '11' directory.  
> > > > > > >
> > > > > > > This really seems like an accident waiting to happen.  What 
> > > > > > > happens
> > > > > > > when the user replaces their M60 with an Intel XYZ device that 
> > > > > > > happens
> > > > > > > to expose a type 11 mdev class gpu device?  How is libvirt 
> > > > > > > supposed to
> > > > > > > know that the XML used to refer to a GRID-M60-0B and now it's an
> > > > > > > INTEL-IGD-XYZ?  Doesn't basing the XML entry on the name and 
> > > > > > > removing
> > > > > > > yet another arbitrary requirement that we have some sort of 
> > > > > > > globally
> > > > > > > unique type-id database make a lot of sense?  The same issue 
> > > > > > > applies
> > > > > > > for simple debug-ability, if I'm reviewing the XML for a domain 
> > > > > > > and the
> > > > > > > name is the primary index for the mdev device, I know what it is.
> > > > > > > Seeing type-id='11' is meaningless.
> > > > > > >  
> > > > > >
> > > > > > Let me clarify again, type '11' is a string that vendor driver would
> > > > > > define (see my previous reply below) it could be "11" or 
> > > > > > "GRID-M60-0B".
> > > > > > If 2 vendors used same string we can't control that. right?
> > > > > >
> > > > > >  
> > > > > > >>>> Lets remove 'id' from type id in XML if that is the concern. 
> > > > > > >>>> Supported
> > > > > > >>>> types is going to be defined by vendor driver, so let vendor 
> > > > > > >>>> driver
> > > > > > >>>> decide what to use for directory name and same should be used 
> > > > > > >>>> in device
> > > > > > >>>> xml file, it could be '11' or "GRID M60-0B":
> > > > > > >>>>
> > > > > > >>>>     <device>
> > > > > > >>>>       <name>my-vgpu</name>
> > > > > > >>>>       <parent>pci_0000_86_00_0</parent>
> > > > > > >>>>       <capability type='mdev'>
> > > > > > >>>>         <type='11'/>
> > > > > > >>>>         ...
> > > > > > >>>>       </capability>
> > > > > > >>>>     </device>  
> > > > >
> > > > > Then let's get rid of the 'name' attribute and let the sysfs directory
> > > > > simply be the name.  Then we can get rid of 'type' altogether so we
> > > > > don't have this '11' vs 'GRID-M60-0B' issue.  Thanks,  
> > > >
> > > > That sounds nice to me - we don't need two unique identifiers if
> > > > one will do.  
> > > 
> > > Hi Alex and Daniel,
> > > 
> > > I just had some internal discussions here within NVIDIA and found out that
> > > actually the name/label potentially might not be unique and the "id" will 
> > > be.
> > > So I think we still would like to keep both so the id is the programmatic 
> > > id
> > > > and the name/label is a human readable string for it, which might get
> > > > changed to be non-unique by people outside of engineering.
> > > 
> > > Sorry for the change.
> > > 
> > > Thanks,
> > > Neo
> > >   
> > 
> > A curious question. How do we expect such a descriptive name/label used
> > by upper-level stack (e.g. openstack)? Should openstack define a vGPU
> > flavor just using ID (GRID-type11) or using both ID/name (GRID-type11-
> > M60-0B) for end customer to choose? If it's only for human information,
> > does it make sense e.g. providing only unique ID in sysfs while relying on 
> > vendor specific documentation to describe what the ID actually means?  
> 
> Hi Kevin,
> 
> The id is not visible to the upper-level stack, only the name / label will be
> shown to the end customer to choose, such as "GRID-M60-0B", as we might expose
> the same virtual device (name/label) with some internal difference which will 
> be tracked by the different unique id.
> 
> I think having the ability to allow libvirt or upper-level stack to display a
> human readable string for a given type of vgpu will make the user's life easier.

Again, we need a stable and unique string to go into the XML.  That
should represent a consistent device regardless of host driver versions
or specific hardware.  I would consider that string to be user visible.
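
To make the expected behavior concrete, here is a minimal sketch, assuming
the string in the XML is used directly as the directory name under
mdev_supported_types, as in the layout quoted above.  The helper names
(find_create_node, make_fake_parent) and the temporary directory standing
in for sysfs are hypothetical and only for illustration; this is not
libvirt code.

```python
import os
import tempfile


class MdevTypeNotFound(Exception):
    """Raised when the parent device does not expose the requested type."""


def find_create_node(parent_dir, type_name):
    """Return the path of the 'create' file for type_name, or raise.

    Assumes the layout <parent>/mdev_supported_types/<type_name>/create
    discussed in this thread.
    """
    create_path = os.path.join(
        parent_dir, "mdev_supported_types", type_name, "create")
    if not os.path.isfile(create_path):
        # Fail loudly rather than falling back to some other type.
        raise MdevTypeNotFound(
            "parent %s does not support mdev type %r" % (parent_dir, type_name))
    return create_path


def make_fake_parent(root, type_names):
    """Build a stand-in for a parent device directory (illustration only)."""
    for name in type_names:
        type_dir = os.path.join(root, "mdev_supported_types", name)
        os.makedirs(type_dir)
        open(os.path.join(type_dir, "create"), "w").close()
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        parent = make_fake_parent(root, ["GRID-M60-0B"])
        print(find_create_node(parent, "GRID-M60-0B"))
        try:
            find_create_node(parent, "INTEL-IGD-XYZ")
        except MdevTypeNotFound as err:
            print("error:", err)
```

With a descriptive name as the key, swapping the card for one that lacks
that type produces an error, and the XML stays readable on its own.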

Also, can you define exactly what you mean by "unique"?  What's the
purpose of the label "GRID-M60-0B" if it's not unique?  Does type 11
mean "GRID-M60-0B" as implemented on a specific card and type 12 might
mean "GRID-M60-0B" as implemented on a different card?  Do you want
your users to be able to instantiate their VM on any "GRID-M60-0B"
mdev, or does it need to be a type 11 GRID-M60-0B mdev?  Thanks,

Alex


