
Re: [Qemu-devel] [PATCH 13/13] qdev-properties: Add pci-devaddr property


From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [PATCH 13/13] qdev-properties: Add pci-devaddr property
Date: Sun, 10 Jun 2012 19:22:55 +0300

On Sun, Jun 10, 2012 at 09:58:17AM -0600, Alex Williamson wrote:
> On Sun, 2012-06-10 at 18:37 +0300, Michael S. Tsirkin wrote:
> > On Sun, Jun 10, 2012 at 09:15:10AM -0600, Alex Williamson wrote:
> > > On Sun, 2012-06-10 at 17:54 +0300, Michael S. Tsirkin wrote:
> > > > On Sun, Jun 10, 2012 at 08:41:03AM -0600, Alex Williamson wrote:
> > > > > On Sun, 2012-06-10 at 17:03 +0300, Michael S. Tsirkin wrote:
> > > > > > On Sun, Jun 10, 2012 at 07:41:51AM -0600, Alex Williamson wrote:
> > > > > > > > > > >>>> vfio_pci.c contains a nice function called "parse_hostaddr".
> > > > > > > > > > >>>> You may guess what it does. ;)
> > > > > > > > > >>>
> > > > > > > > > >>> Interesting. Why? This looks strange to me:
> > > > > > > > > >>> I would expect the admin to bind a device to vfio
> > > > > > > > > >>> the way it's now bound to a stub.
> > > > > > > > > > >>> Then pass /dev/vfioXXX to qemu.
> > > > > > > > > >>
> > > > > > > > > > >> That's the "libvirt way". We surely also want the "qemu command line
> > > > > > > > > > >> way" for which this kind of service is needed.
> > > > > > > > > >>
> > > > > > > > > >> Jan
> > > > > > > > > >>
> > > > > > > > > > 
> > > > > > > > > > > Yes, I imagine the qemu command line passing in /dev/vfioXXX,
> > > > > > > > > > > the libvirt way will pass in an fd for above. No?
> > > > > > > > > 
> > > > > > > > > > As far as I understand the API, there is no device file per assigned
> > > > > > > > > > device.
> > > > > > > > 
> > > > > > > > Does it do pci_get_domain_bus_and_slot like kvm then?
> > > > > > > > With all the warts like you have to remember to bind pci stub
> > > > > > > > or you get two drivers for one device?
> > > > > > > > If true that's unfortunate IMHO.
> > > > > > 
> > > > > > I hope the answer to the above is no?
> > > > > 
> > > > > No, it does a probe for devices.  We need the devaddr to compare against
> > > > > dev_name of the device to figure out which device the user is attempting
> > > > > to identify.
> > > > > 
> > > > > > > > > Also, this [domain:]bus:dev.fn format is more handy for the
> > > > > > > > > command line.
> > > > > > > > > 
> > > > > > > > > Jan
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > Then users could add udev rules that will name vfio devices
> > > > > > > > like this.  Another interesting option: /dev/vfio/eth0/vf1.
> > > > > > > > That's better I think: no one really likes running lspci
> > > > > > > > and guessing the address from there.
> > > > > > > 
> > > > > > > That's not at all how VFIO works.  /dev/vfio/# represents a group, which
> > > > > > > may contain one or more devices.  Even if libvirt passes a file
> > > > > > > descriptor for the group, qemu needs to know which device in the group
> > > > > > > to add to the guest, so parsing a device address is still necessary.
> > > > > > > Thanks,
> > > > > > > 
> > > > > > > Alex
> > > > > > 
> > > > > > That's very unusual, and unfortunate.  For example this means that I
> > > > > > must update applications just because I move a card to another slot.
> > > > > > UIO does not have this problem.
> > > > > > The fact that it's broken in kvm ATM seems to have made people
> > > > > > think it's okay, but it really is a bug. We didn't fix it
> > > > > > because vfio was supposed to be the solution.
> > > > > 
> > > > > I don't know what you're talking about here.  Are you suggesting that
> > > > > needing to specify -device pci-assign,host=3.0 changing to host=4.0 when
> > > > > you move a card is broken?
> > > > 
> > > > Yes. Absolutely. Admin should be able to abstract it away without users
> > > > knowing anything about it.
> > > 
> > > We don't have UUIDs on PCI devices, so who's to say that the device that
> > > was in slot 3 is the same device that's now in slot 4 and the user
> > > should still have access to it?  That sounds even more problematic.
> > 
> > PF has a driver loaded so you can identify that, and
> > identify the VF through it. Again this is really policy,
> > it should be up to the admin how to name the device.
> 
> Do PFs have a UUID?  Some devices support a serial number, but that's
> not related to being a PF vs VF.  We need to support both PFs and VFs
> regardless of whether they have any kind of UUID.

This is a solved problem. udev has class-specific ways to
get the device id and use it to find the name.

> I think we're inventing a problem though.

You think persistent names in udev were a solution in search of a
problem?

> > > > >  How does UIO avoid such a problem?
> > > > 
> > > > Normally you use a misc device that you can name with udev.
> > > > 
> > > > >  UIO-pci
> > > > > requires the user to use pci-sysfs for resource access, so it surely
> > > > > cares about the device address.
> > > > 
> > > > Only uio_pci_generic. Other uio devices let you drive the
> > > > device.
> > > 
> > > If this is actually a problem, this is the first ever complaint I've
> > > heard about it.  As above, I don't think we can assume the same access
> > > when a device is moved.
> > 
> > I thought need for sane naming and for sysfs interface was discussed
> > multiple times. But maybe I'm misremembering.
> 
> There is sane naming and a sysfs interface...

It's not sane if the admin can't rename the device without breaking
applications.

> > > > > > I do realize you want to represent a group of devices somehow but can't
> > > > > > this be solved without breaking naming devices with udev? For example, the
> > > > > > device could be a file as well. You would then use the fd to identify the
> > > > > > device within the group. And in a somewhat common case of a single device
> > > > > > within the group, you can even make opening the group optional.
> > > > > > Don't know if this fix I suggest makes sense at all but it's a real
> > > > > > problem all the same.
> > > > > 
> > > > > Unfortunately, exposing individual devices just confuses the ownership
> > > > > model we require for groups.  It would provide the illusion of being
> > > > > able to assign an individual device, without the reality of the
> > > > > grouping.  Groups are owned either by _a_ user or by the kernel, they
> > > > > can't be split across multiple users (at least not with any guarantees
> > > > > of isolation).  The current interface makes this clear.  Thanks,
> > > > > 
> > > > > Alex
> > > > 
> > > > So do users pass in group=/dev/vfio/1,host=0:3.0 then?
> > > 
> > > No, vfio syntax is -device vfio-pci,host=0:3.0, just like pci-assign.
> > > Qemu will figure out which group that device belongs to and "do the
> > > right thing".  If we add support for libvirt passing a groupfd, it will
> > > be mostly the same, just using scm_rights to get the groupfd instead of
> > > opening it directly.  Thanks,
> > > 
> > > Alex
> > 
> > Then how do you know which /dev/vfio/# to open?
> 
> This is all in the documentation patch... groups are exposed in sysfs
> in /sys/kernel/iommu_groups.  Each group has a unique number which is
> exposed as a directory.  Each group directory has a subdirectory called
> devices which links to all devices in the group.  Each device within a
> group has an iommu_group link back to the group directory.
> The /dev/vfio/# entry matches the group number in sysfs.  So it's all
> pretty easy.  Thanks,
> 
> Alex
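
[Editor's sketch, not part of the original message. The sysfs layout
described above can be illustrated roughly as follows. This is a minimal
sketch in Python rather than QEMU's C, and it is not the vfio_pci.c code.
The function names, the sysfs_root parameter (there so the lookup can be
pointed at a test tree), and the accepted [[domain:]bus:]dev.fn spelling
with missing fields defaulting to 0 are all assumptions made for
illustration.]

```python
import os
import re

# Assumed input syntax: [[domain:]bus:]dev.fn, all fields in hex.
_ADDR_RE = re.compile(
    r"(?:(?:(?P<domain>[0-9a-fA-F]{1,4}):)?(?P<bus>[0-9a-fA-F]{1,2}):)?"
    r"(?P<dev>[0-9a-fA-F]{1,2})\.(?P<fn>[0-7])"
)


def normalize_pci_addr(text):
    """Return the sysfs-style name (dddd:bb:dd.f) for a user-supplied address."""
    m = _ADDR_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError("not a PCI address: %r" % text)
    domain = int(m.group("domain") or "0", 16)
    bus = int(m.group("bus") or "0", 16)
    dev = int(m.group("dev"), 16)
    fn = int(m.group("fn"), 16)
    if dev > 0x1F:
        raise ValueError("device number out of range: %r" % text)
    return "%04x:%02x:%02x.%x" % (domain, bus, dev, fn)


def iommu_group_of(addr, sysfs_root="/sys"):
    """Follow the device's iommu_group link and return the group number."""
    name = normalize_pci_addr(addr)
    link = os.path.join(sysfs_root, "bus", "pci", "devices", name, "iommu_group")
    target = os.path.realpath(link)  # resolves to .../kernel/iommu_groups/<N>
    if not os.path.isdir(target):
        raise FileNotFoundError("no iommu_group link for %s" % name)
    return int(os.path.basename(target))


def vfio_group_node(addr, sysfs_root="/sys"):
    """The /dev/vfio/# entry matches the group number found in sysfs."""
    return "/dev/vfio/%d" % iommu_group_of(addr, sysfs_root)


def group_members(group, sysfs_root="/sys"):
    """List every device linked from the group's devices subdirectory."""
    devices = os.path.join(sysfs_root, "kernel", "iommu_groups", str(group), "devices")
    return sorted(os.listdir(devices))
```

[With these helpers, vfio_group_node("0:3.0") would name the group node to
open for host=0:3.0, and group_members() would show which other devices
share ownership of that group.]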

So what's the problem with having devices in sysfs linked e.g. from
/sys/class/vfio/ ?  udev could create the nodes e.g. in
/dev/vfio/devices/.  User can then pass the device name and qemu can
figure out the group from sysfs.

-- 
MST
