From: Tian, Kevin
Subject: Re: [Qemu-devel] VFIO based vGPU (was Re: [Announcement] 2015-Q3 release of XenGT - a Mediated ...)
Date: Tue, 26 Jan 2016 21:38:01 +0000

> From: Alex Williamson [mailto:address@hidden
> Sent: Wednesday, January 27, 2016 4:06 AM
> 
> On Tue, 2016-01-26 at 02:20 -0800, Neo Jia wrote:
> > On Mon, Jan 25, 2016 at 09:45:14PM +0000, Tian, Kevin wrote:
> > > > From: Alex Williamson [mailto:address@hidden
> >
> > Hi Alex, Kevin and Jike,
> >
> > (It seems I shouldn't use attachments; resending to the list with the patches
> > inline at the end.)
> >
> > Thanks for adding me to this technical discussion; it's a great opportunity
> > for us to design together and bring both the Intel and NVIDIA vGPU solutions
> > to the KVM platform.
> >
> > Instead of directly jumping to the proposal we have been working on recently
> > for NVIDIA vGPU on KVM, I think it is better for me to put out a couple of
> > quick comments / thoughts regarding the existing discussions on this thread,
> > as fundamentally I think we are solving the same problems: DMA, interrupts
> > and MMIO.
> >
> > Then we can look at what we have; hopefully we can reach some consensus soon.
> >
> > > Yes, and since you're creating and destroying the vgpu here, this is
> > > where I'd expect a struct device to be created and added to an IOMMU
> > > group.  The lifecycle management should really include links between
> > > the vGPU and physical GPU, which would be much, much easier to do with
> > > struct devices created here rather than at the point where we start
> > > doing vfio "stuff".
> >
> > In fact, to keep vfio-vgpu more generic, vgpu device creation and management
> > can be centralized and done in vfio-vgpu. That also includes adding the vgpu
> > to the IOMMU group and the VFIO group.
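For illustration, here is a minimal sketch of what creating such a vgpu struct
device and giving it its own IOMMU group could look like on the kernel side.
The vgpu_device structure, vgpu_class and vgpu_register() are hypothetical
names for this discussion, not an existing API; only the driver-core and IOMMU
calls are real kernel interfaces.

#include <linux/device.h>
#include <linux/err.h>
#include <linux/iommu.h>

/* Hypothetical vgpu representation; not an existing kernel structure. */
struct vgpu_device {
        struct device dev;
        int instance;                   /* vGPU index on the physical GPU */
};

static struct class *vgpu_class;        /* assumed created elsewhere with class_create() */

static int vgpu_register(struct vgpu_device *vgpu, struct device *parent_gpu)
{
        struct iommu_group *group;
        int ret;

        device_initialize(&vgpu->dev);
        vgpu->dev.parent = parent_gpu;  /* link the vGPU to the physical GPU */
        vgpu->dev.class = vgpu_class;
        dev_set_name(&vgpu->dev, "vgpu%d", vgpu->instance);

        ret = device_add(&vgpu->dev);
        if (ret)
                return ret;

        /* Each vGPU is isolated by the vendor driver, so it gets its own group. */
        group = iommu_group_alloc();
        if (IS_ERR(group)) {
                ret = PTR_ERR(group);
                goto err_del;
        }

        ret = iommu_group_add_device(group, &vgpu->dev);
        iommu_group_put(group);         /* the group now holds its own reference */
        if (ret)
                goto err_del;
        return 0;

err_del:
        device_del(&vgpu->dev);
        return ret;
}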
> 
> Is this really a good idea?  The concept of a vgpu is not unique to
> vfio; we want vfio to be a driver for a vgpu, not an integral part of
> the lifecycle of a vgpu.  That certainly doesn't exclude adding
> infrastructure to make lifecycle management of a vgpu more consistent
> between drivers, but it should be done independently of vfio.  I'll go
> back to the SR-IOV model: vfio is often used with SR-IOV VFs, but vfio
> does not create the VF; that's done in coordination with the PF, making
> use of some PCI infrastructure for consistency between drivers.
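As a point of reference for the SR-IOV comparison above, VF creation is driven
through the PCI core rather than through vfio: userspace writes to the PF's
sriov_numvfs attribute and the PCI core calls back into the PF driver. A rough
sketch, with a purely hypothetical PF driver name:

#include <linux/module.h>
#include <linux/pci.h>

/* Invoked by the PCI core when "echo N > .../sriov_numvfs" is done on the PF. */
static int my_pf_sriov_configure(struct pci_dev *pdev, int num_vfs)
{
        if (num_vfs == 0) {
                pci_disable_sriov(pdev);        /* tear down all VFs */
                return 0;
        }
        /* pci_enable_sriov() returns 0 on success; report the VF count back. */
        return pci_enable_sriov(pdev, num_vfs) ? : num_vfs;
}

static struct pci_driver my_pf_driver = {
        .name            = "my_pf",             /* hypothetical PF driver */
        /* .id_table, .probe and .remove are omitted from this sketch */
        .sriov_configure = my_pf_sriov_configure,
};
module_pci_driver(my_pf_driver);
MODULE_LICENSE("GPL");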
> 
> It seems like we need to take more advantage of the class and driver
> core support to perhaps set up a vgpu bus and class with vfio-vgpu just
> being a driver for those devices.

Agree with Alex here. Even if we want more abstraction of overall vgpu
management, let's stick here to the necessary changes within the VFIO scope.
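To make the suggestion concrete, here is a rough, hypothetical sketch of a
"vgpu" bus with vfio-vgpu as just one driver on it. None of these names exist
today; in practice the bus/class registration would live in common vgpu core
code, independent of vfio, and is only folded into one init function here to
keep the sketch self-contained.

#include <linux/device.h>
#include <linux/module.h>

static int vgpu_bus_match(struct device *dev, struct device_driver *drv)
{
        return 1;       /* in this sketch, any vgpu driver can bind to any vgpu device */
}

static struct bus_type vgpu_bus_type = {
        .name  = "vgpu",
        .match = vgpu_bus_match,
};

/*
 * The vendor driver (i915, nvidia.ko) would register its vgpu devices on
 * vgpu_bus_type when they are created; vfio-vgpu then binds like any other
 * driver and only implements the VFIO device interface.
 */
static int vfio_vgpu_probe(struct device *dev)
{
        /* set up VFIO regions, interrupts and the iommu/vfio group linkage here */
        return 0;
}

static int vfio_vgpu_remove(struct device *dev)
{
        return 0;
}

static struct device_driver vfio_vgpu_driver = {
        .name   = "vfio-vgpu",
        .bus    = &vgpu_bus_type,
        .probe  = vfio_vgpu_probe,
        .remove = vfio_vgpu_remove,
};

static int __init vfio_vgpu_init(void)
{
        int ret = bus_register(&vgpu_bus_type);

        if (ret)
                return ret;
        ret = driver_register(&vfio_vgpu_driver);
        if (ret)
                bus_unregister(&vgpu_bus_type);
        return ret;
}
module_init(vfio_vgpu_init);
MODULE_LICENSE("GPL");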


> >
> > 6. Examples
> >
> > ==================================================================================
> >
> > On this server, we have two NVIDIA M60 GPUs.
> >
> > address@hidden ~]# lspci -d 10de:13f2
> > 86:00.0 VGA compatible controller: NVIDIA Corporation Device 13f2 (rev a1)
> > 87:00.0 VGA compatible controller: NVIDIA Corporation Device 13f2 (rev a1)
> >
> > After nvidia.ko gets initialized, we can query the supported vGPU types by
> > reading "vgpu_supported_types", as follows:
> >
> > address@hidden ~]# cat /sys/bus/pci/devices/0000\:86\:00.0/vgpu_supported_types
> > 11:GRID M60-0B
> > 12:GRID M60-0Q
> > 13:GRID M60-1B
> > 14:GRID M60-1Q
> > 15:GRID M60-2B
> > 16:GRID M60-2Q
> > 17:GRID M60-4Q
> > 18:GRID M60-8Q
> >
> > For example, if the VM_UUID is c0b26072-dd1b-4340-84fe-bf338c510818 and we
> > would like to create a "GRID M60-4Q" vGPU instance for it:
> >
> > echo "c0b26072-dd1b-4340-84fe-bf338c510818:0:17" > /sys/bus/pci/devices/0000\:86\:00.0/vgpu_create
> >
> > Note: the number 0 here is the vGPU device index. The change has not been
> > tested with multiple vGPU devices yet, but we will support that.
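For illustration only, a sketch of how the GPU vendor driver could parse a
write of "VM_UUID:index:type" to a per-device vgpu_create attribute. The
attribute name follows the example above; my_vendor_create_vgpu() is a
hypothetical vendor hook, not part of the actual patches at the end of the
mail.

#include <linux/device.h>
#include <linux/kernel.h>
#include <linux/pci.h>
#include <linux/slab.h>
#include <linux/string.h>

/* Hypothetical vendor hook that actually allocates the virtual GPU. */
extern int my_vendor_create_vgpu(struct pci_dev *pdev, const char *vm_uuid,
                                 unsigned int index, unsigned int type);

static ssize_t vgpu_create_store(struct device *dev,
                                 struct device_attribute *attr,
                                 const char *buf, size_t count)
{
        char *str, *cur, *uuid_str;
        unsigned int index, type;
        int ret = -EINVAL;

        str = kstrndup(buf, count, GFP_KERNEL);
        if (!str)
                return -ENOMEM;

        cur = str;
        uuid_str = strsep(&cur, ":");           /* "c0b26072-..." */
        if (!cur)
                goto out;
        if (kstrtouint(strsep(&cur, ":"), 10, &index) || !cur)  /* vGPU index, e.g. 0 */
                goto out;
        if (kstrtouint(cur, 10, &type))         /* type id, e.g. 17 for M60-4Q */
                goto out;

        ret = my_vendor_create_vgpu(to_pci_dev(dev), uuid_str, index, type);
out:
        kfree(str);
        return ret ? ret : count;
}
/* Registered on the physical GPU's PCI device with device_create_file(). */
static DEVICE_ATTR_WO(vgpu_create);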
> >
> > At this point, querying "vgpu_supported_types" will still show all supported
> > virtual GPU types, as no virtual GPU resources have been committed yet.
> >
> > Starting VM:
> >
> > echo "c0b26072-dd1b-4340-84fe-bf338c510818" > /sys/class/vgpu/vgpu_start
> >
> > then, the supported vGPU type query will return:
> >
> > address@hidden /home/cjia]$ cat /sys/bus/pci/devices/0000\:86\:00.0/vgpu_supported_types
> > 17:GRID M60-4Q
> >
> > So vgpu_supported_config needs to be called whenever a new virtual device
> > gets created, as the underlying HW might limit the supported types if there
> > are any existing VMs running.
> >
> > Then, when the VM gets shut down, a write to /sys/class/vgpu/vgpu_shutdown
> > will inform the GPU vendor driver to clean up its resources.
> >
> > Eventually, those virtual GPUs can be removed by writing to vgpu_destroy
> > under the device's sysfs directory.
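Pulling the lifecycle above together, the callbacks a GPU vendor driver would
need to provide look roughly like the following. This struct is only a
strawman for the discussion (the actual patches are at the end of the original
mail and may differ); using uuid_le is just one way to carry the VM UUID.

#include <linux/pci.h>
#include <linux/types.h>
#include <linux/uuid.h>

/* Strawman set of vendor callbacks matching the sysfs flow described above. */
struct gpu_device_ops {
        /* backs <pci-dev>/vgpu_supported_types; may shrink once vGPUs exist */
        ssize_t (*vgpu_supported_config)(struct pci_dev *pdev, char *buf);
        /* backs <pci-dev>/vgpu_create, input "VM_UUID:index:type" */
        int (*vgpu_create)(struct pci_dev *pdev, uuid_le vm_uuid,
                           u32 instance, u32 type);
        /* backs /sys/class/vgpu/vgpu_start, commits HW resources for the VM */
        int (*vgpu_start)(uuid_le vm_uuid);
        /* backs /sys/class/vgpu/vgpu_shutdown, releases run-time state */
        int (*vgpu_shutdown)(uuid_le vm_uuid);
        /* backs <pci-dev>/vgpu_destroy, frees the virtual device itself */
        int (*vgpu_destroy)(struct pci_dev *pdev, uuid_le vm_uuid,
                            u32 instance);
};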
> 
> 
> I'd like to hear Intel's thoughts on this interface.  Are there
> different vgpu capacities or priority classes that would necessitate
> different types of vgpus on Intel?

We'll evaluate this proposal against our requirements. A quick comment is
that we don't require such a type concept; we just expose the same type of
vgpu as the underlying platform. On the other hand, our implementation gives
the user flexibility to control resource allocation (e.g. video memory) for
different VMs, instead of a fixed partitioning scheme, so we have an
interface to query the remaining free resources.
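As a sketch of the kind of query interface described here (the attribute name
and the my_gpu_free_vram() helper are purely illustrative, not Intel's actual
implementation), a read-only attribute on the physical GPU could report the
remaining video memory:

#include <linux/device.h>
#include <linux/kernel.h>

/* Assumed vendor helper returning unallocated video memory in bytes. */
extern u64 my_gpu_free_vram(struct device *dev);

static ssize_t vgpu_available_resources_show(struct device *dev,
                                             struct device_attribute *attr,
                                             char *buf)
{
        return sprintf(buf, "vram_free_mb: %llu\n",
                       (unsigned long long)(my_gpu_free_vram(dev) >> 20));
}
static DEVICE_ATTR_RO(vgpu_available_resources);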

> 
> Does Intel have a need for start and shutdown interfaces?

No for now. But we can extend it to support such an interface, which would
provide more flexibility by separating resource allocation from run-time control.

Given that NVIDIA/Intel do have specific requirements on vgpu management,
I'd suggest that we focus on the VFIO changes first. After that we can
evaluate how much commonality there is in vgpu management, and based on that
decide whether to have a common vgpu framework or to stay with vendor-specific
implementations for that part.

Thanks,
Kevin
