Re: [Qemu-devel] [PATCH v2 1/1] KVM s390 pci infrastructure modelling


From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [PATCH v2 1/1] KVM s390 pci infrastructure modelling
Date: Wed, 1 Jul 2015 13:57:44 +0200

On Wed, Jul 01, 2015 at 07:46:01PM +0800, Hong Bo Li wrote:
> 
> 
> On 7/1/2015 19:23, Michael S. Tsirkin wrote:
> >On Wed, Jul 01, 2015 at 07:11:38PM +0800, Hong Bo Li wrote:
> >>
> >>On 7/1/2015 18:36, Michael S. Tsirkin wrote:
> >>>On Wed, Jul 01, 2015 at 06:04:24PM +0800, Hong Bo Li wrote:
> >>>>On 7/1/2015 17:22, Michael S. Tsirkin wrote:
> >>>>>On Wed, Jul 01, 2015 at 05:13:11PM +0800, Hong Bo Li wrote:
> >>>>>>On 7/1/2015 16:05, Michael S. Tsirkin wrote:
> >>>>>>>On Wed, Jul 01, 2015 at 03:56:25PM +0800, Hong Bo Li wrote:
> >>>>>>>>On 7/1/2015 14:22, Michael S. Tsirkin wrote:
> >>>>>>>>>On Tue, Jun 30, 2015 at 02:16:59PM +0800, Hong Bo Li wrote:
> >>>>>>>>>>On 6/29/2015 18:01, Michael S. Tsirkin wrote:
> >>>>>>>>>>>On Mon, Jun 29, 2015 at 05:24:53PM +0800, Hong Bo Li wrote:
> >>>>>>>>>>>>This patch introduces a new facility (and bus)
> >>>>>>>>>>>>to hold devices representing information actually
> >>>>>>>>>>>>provided by s390 firmware and I/O configuration.
> >>>>>>>>>>>>usage example:
> >>>>>>>>>>>>-device s390-pcihost
> >>>>>>>>>>>>-device vfio-pci,host=0000:00:00.0,id=vpci1
> >>>>>>>>>>>>-device zpci,fid=2,uid=5,pci_id=vpci1,id=zpci1
> >>>>>>>>>>>>
> >>>>>>>>>>>>The first line will create a s390 pci host bridge
> >>>>>>>>>>>>and init the root bus. The second line will create
> >>>>>>>>>>>>a standard vfio pci device, and attach it to the
> >>>>>>>>>>>>root bus. These are similar to the standard process
> >>>>>>>>>>>>to define a pci device on other platforms.
> >>>>>>>>>>>>
> >>>>>>>>>>>>The third line will create a s390 pci device to
> >>>>>>>>>>>>store s390 specific information, and references
> >>>>>>>>>>>>the corresponding vfio pci device via device id.
> >>>>>>>>>>>>We create a s390 pci facility bus to hold all the
> >>>>>>>>>>>>zpci devices.
> >>>>>>>>>>>>
> >>>>>>>>>>>>Signed-off-by: Hong Bo Li <address@hidden>
> >>>>>>>>>>>It's mostly up to s390 maintainers, but I'd like to note
> >>>>>>>>>>>one thing below
> >>>>>>>>>>>
> >>>>>>>>>>>>---
> >>>>>>>>>>>>  hw/s390x/s390-pci-bus.c    | 314 +++++++++++++++++++++++++++++++++------------
> >>>>>>>>>>>>  hw/s390x/s390-pci-bus.h    |  48 ++++++-
> >>>>>>>>>>>>  hw/s390x/s390-pci-inst.c   |   4 +-
> >>>>>>>>>>>>  hw/s390x/s390-virtio-ccw.c |   5 +-
> >>>>>>>>>>>>  4 files changed, 283 insertions(+), 88 deletions(-)
> >>>>>>>>>>>>
> >>>>>>>>>>>>diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
> >>>>>>>>>>>>index 560b66a..d5e7b2e 100644
> >>>>>>>>>>>>--- a/hw/s390x/s390-pci-bus.c
> >>>>>>>>>>>>+++ b/hw/s390x/s390-pci-bus.c
> >>>>>>>>>>>>@@ -32,8 +32,8 @@ int chsc_sei_nt2_get_event(void *res)
> >>>>>>>>>>>>      PciCcdfErr *eccdf;
> >>>>>>>>>>>>      int rc = 1;
> >>>>>>>>>>>>      SeiContainer *sei_cont;
> >>>>>>>>>>>>-    S390pciState *s = S390_PCI_HOST_BRIDGE(
> >>>>>>>>>>>>-        object_resolve_path(TYPE_S390_PCI_HOST_BRIDGE, NULL));
> >>>>>>>>>>>>+    S390PCIFacility *s = S390_PCI_FACILITY(
> >>>>>>>>>>>>+        object_resolve_path(TYPE_S390_PCI_FACILITY, NULL));
> >>>>>>>>>>>>      if (!s) {
> >>>>>>>>>>>>          return rc;
> >>>>>>>>>>>>@@ -72,8 +72,8 @@ int chsc_sei_nt2_get_event(void *res)
> >>>>>>>>>>>>  int chsc_sei_nt2_have_event(void)
> >>>>>>>>>>>>  {
> >>>>>>>>>>>>-    S390pciState *s = S390_PCI_HOST_BRIDGE(
> >>>>>>>>>>>>-        object_resolve_path(TYPE_S390_PCI_HOST_BRIDGE, NULL));
> >>>>>>>>>>>>+    S390PCIFacility *s = S390_PCI_FACILITY(
> >>>>>>>>>>>>+        object_resolve_path(TYPE_S390_PCI_FACILITY, NULL));
> >>>>>>>>>>>>      if (!s) {
> >>>>>>>>>>>>          return 0;
> >>>>>>>>>>>>@@ -82,20 +82,32 @@ int chsc_sei_nt2_have_event(void)
> >>>>>>>>>>>>      return !QTAILQ_EMPTY(&s->pending_sei);
> >>>>>>>>>>>>  }
> >>>>>>>>>>>>+void s390_pci_device_enable(S390PCIBusDevice *zpci)
> >>>>>>>>>>>>+{
> >>>>>>>>>>>>+    zpci->fh = zpci->fh | 1 << ENABLE_BIT_OFFSET;
> >>>>>>>>>>>>+}
> >>>>>>>>>>>>+
> >>>>>>>>>>>>+void s390_pci_device_disable(S390PCIBusDevice *zpci)
> >>>>>>>>>>>>+{
> >>>>>>>>>>>>+    zpci->fh = zpci->fh & ~(1 << ENABLE_BIT_OFFSET);
> >>>>>>>>>>>>+    if (zpci->is_unplugged)
> >>>>>>>>>>>>+        object_unparent(OBJECT(zpci));
> >>>>>>>>>>>>+}
> >>>>>>>>>>>>+
> >>>>>>>>>>>>  S390PCIBusDevice *s390_pci_find_dev_by_fid(uint32_t fid)
> >>>>>>>>>>>>  {
> >>>>>>>>>>>>      S390PCIBusDevice *pbdev;
> >>>>>>>>>>>>-    int i;
> >>>>>>>>>>>>-    S390pciState *s = S390_PCI_HOST_BRIDGE(
> >>>>>>>>>>>>-        object_resolve_path(TYPE_S390_PCI_HOST_BRIDGE, NULL));
> >>>>>>>>>>>>+    BusChild *kid;
> >>>>>>>>>>>>+    S390PCIFacility *s = S390_PCI_FACILITY(
> >>>>>>>>>>>>+        object_resolve_path(TYPE_S390_PCI_FACILITY, NULL));
> >>>>>>>>>>>>      if (!s) {
> >>>>>>>>>>>>          return NULL;
> >>>>>>>>>>>>      }
> >>>>>>>>>>>>-    for (i = 0; i < PCI_SLOT_MAX; i++) {
> >>>>>>>>>>>>-        pbdev = &s->pbdev[i];
> >>>>>>>>>>>>-        if ((pbdev->fh != 0) && (pbdev->fid == fid)) {
> >>>>>>>>>>>>+    QTAILQ_FOREACH(kid, &s->fbus->qbus.children, sibling) {
> >>>>>>>>>>>>+        pbdev = (S390PCIBusDevice *)kid->child;
> >>>>>>>>>>>>+        if (pbdev->fid == fid) {
> >>>>>>>>>>>>              return pbdev;
> >>>>>>>>>>>>          }
> >>>>>>>>>>>>      }
> >>>>>>>>>>>>@@ -126,39 +138,24 @@ void s390_pci_sclp_configure(int configure, SCCB *sccb)
> >>>>>>>>>>>>      return;
> >>>>>>>>>>>>  }
> >>>>>>>>>>>>-static uint32_t s390_pci_get_pfid(PCIDevice *pdev)
> >>>>>>>>>>>>-{
> >>>>>>>>>>>>-    return PCI_SLOT(pdev->devfn);
> >>>>>>>>>>>>-}
> >>>>>>>>>>>>-
> >>>>>>>>>>>>-static uint32_t s390_pci_get_pfh(PCIDevice *pdev)
> >>>>>>>>>>>>-{
> >>>>>>>>>>>>-    return PCI_SLOT(pdev->devfn) | FH_VIRT;
> >>>>>>>>>>>>-}
> >>>>>>>>>>>>-
> >>>>>>>>>>>>  S390PCIBusDevice *s390_pci_find_dev_by_idx(uint32_t idx)
> >>>>>>>>>>>>  {
> >>>>>>>>>>>>      S390PCIBusDevice *pbdev;
> >>>>>>>>>>>>-    int i;
> >>>>>>>>>>>>-    int j = 0;
> >>>>>>>>>>>>-    S390pciState *s = S390_PCI_HOST_BRIDGE(
> >>>>>>>>>>>>-        object_resolve_path(TYPE_S390_PCI_HOST_BRIDGE, NULL));
> >>>>>>>>>>>>+    BusChild *kid;
> >>>>>>>>>>>>+    int i = 0;
> >>>>>>>>>>>>+    S390PCIFacility *s = S390_PCI_FACILITY(
> >>>>>>>>>>>>+        object_resolve_path(TYPE_S390_PCI_FACILITY, NULL));
> >>>>>>>>>>>>      if (!s) {
> >>>>>>>>>>>>          return NULL;
> >>>>>>>>>>>>      }
> >>>>>>>>>>>>-    for (i = 0; i < PCI_SLOT_MAX; i++) {
> >>>>>>>>>>>>-        pbdev = &s->pbdev[i];
> >>>>>>>>>>>>-
> >>>>>>>>>>>>-        if (pbdev->fh == 0) {
> >>>>>>>>>>>>-            continue;
> >>>>>>>>>>>>-        }
> >>>>>>>>>>>>-
> >>>>>>>>>>>>-        if (j == idx) {
> >>>>>>>>>>>>+    QTAILQ_FOREACH(kid, &s->fbus->qbus.children, sibling) {
> >>>>>>>>>>>>+        pbdev = (S390PCIBusDevice *)kid->child;
> >>>>>>>>>>>>+        if (i == idx) {
> >>>>>>>>>>>>              return pbdev;
> >>>>>>>>>>>>          }
> >>>>>>>>>>>>-        j++;
> >>>>>>>>>>>>+        i++;
> >>>>>>>>>>>>      }
> >>>>>>>>>>>>      return NULL;
> >>>>>>>>>>>This relies on the order of children on the qbus; that's wrong, I
> >>>>>>>>>>>think.
> >>>>>>>>>>>Generally I'm not sure why you convert all slot lookups to child
> >>>>>>>>>>>lookups: more code to achieve the same effect?
> >>>>>>>>>>Thank you Michael.
> >>>>>>>>>>I made the change for two reasons:
> >>>>>>>>>>1. The old implementation only supports one s390 pci root bus, and
> >>>>>>>>>>32 (PCI_SLOT_MAX) slots at most. So when it comes to multiple s390
> >>>>>>>>>>pci root buses, the old code does not work.
> >>>>>>>>>>2. Now the zpci device "S390PCIBusDevice" is only a structure to
> >>>>>>>>>>store s390-specific information, so we can attach all the zpci
> >>>>>>>>>>devices to an s390 pci facility bus. Since these zpci devices have
> >>>>>>>>>>no relation to the "slot", their order does not matter.
> >>>>>>>>>But you make this order guest-visible which seems wrong.
> >>>>>>>>>
> >>>>>>>>The guest uses an s390-specific "list pci" instruction to get all
> >>>>>>>>the zpci devices, and will create a root s390 pci bus for each
> >>>>>>>>device. So the order has no relation to the pci topology on the
> >>>>>>>>guest.
> >>>>>>>>
> >>>>>>>>If we assign too many zpci devices to one guest, the "list pci"
> >>>>>>>>instruction will use a resume token to get all the zpci devices.
> >>>>>>>>For example, the first time we return 32 zpci devices to the guest;
> >>>>>>>>the next time we return another 32. The resume token stores the
> >>>>>>>>beginning of the zpci devices that will be returned to the guest
> >>>>>>>>next time.
> >>>>>>>>
> >>>>>>>>So, if we change the order of the zpci devices on the s390 facility
> >>>>>>>>bus, it may change the "batch" in which a device is returned to the
> >>>>>>>>guest, but this will not change the pci topology on the guest.
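The resume-token batching described above can be sketched in plain C. This is a sketch with hypothetical names (`ZpciEntry`, `clp_list_pci_batch`) and a batch size of 4 for illustration, not the actual QEMU implementation:

```c
#include <stddef.h>

#define CLP_BATCH 4  /* illustrative batch size; the discussion mentions 32 */

/* Hypothetical minimal zpci record. */
typedef struct {
    unsigned int fid;  /* function ID */
} ZpciEntry;

/*
 * Copy up to CLP_BATCH devices starting at *resume_token into out[],
 * then advance the token. A token of 0 on return means the
 * enumeration is complete.
 */
static size_t clp_list_pci_batch(const ZpciEntry *devs, size_t ndevs,
                                 size_t *resume_token, ZpciEntry *out)
{
    size_t start = *resume_token;
    size_t n = 0;

    while (start + n < ndevs && n < CLP_BATCH) {
        out[n] = devs[start + n];
        n++;
    }
    /* Next batch starts here, or 0 when everything has been returned. */
    *resume_token = (start + n < ndevs) ? start + n : 0;
    return n;
}
```

With 10 devices this returns batches of 4, 4 and 2, the token stepping through 4, 8, 0 — which is why reordering the underlying device list between calls can hand the guest duplicates or gaps.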
> >>>>>>>Yes but that's still guest visible, and will break
> >>>>>>>for example if guest is migrated between qemu instances
> >>>>>>>where list order is different precisely when
> >>>>>>>it's enumerating the bus.
> >>>>>>>
> >>>>>>Yes, and the list order is not the only s390-specific information
> >>>>>>that is exposed to the guest. Besides that, we would need to migrate
> >>>>>>all other zpci information as well. For now, we have no plan to
> >>>>>>support zpci migration yet.
> >>>>>BTW how will hotplug work? If it happens while the guest
> >>>>>enumerates the bus, then naturally all index values
> >>>>>become invalid.
> >>>>The list zpci only happens while the guest is doing pci_base_init()
> >>>>for s390. At that moment, hotplug does not work yet.
> >>>You can't prevent this: user can request hotplug at this time.
> >>>
> >>>>And assuming we hit that case, we still have the index issue even
> >>>>when scanning a standard pci bus. Please see my comments below.
> >>>>
> >>>>>Just don't expose internal qdev data structures to guest.
> >>>>>It's not by chance that we don't have a look up by index
> >>>>>capability, it's an attempt to enforce sane usage.
> >>>>>You are misusing the API with your hack.
> >>>>The resume token of list zpci is indeed an index of iteration:(
> >>>>
> >>>>>PCI has standard ways to enumerate the bus, maybe you
> >>>>>should emulate it.  Or find some other way that works.
> >>>>>The idea to poke at s->fbus->qbus and count things there
> >>>>>is a bad one.
> >>>>>
> >>>>I could define multiple zpci buses and attach each zpci device to a
> >>>>slot of a root bus. Then I would need to add an API to the common pci
> >>>>code to scan all the pci host bridges. And in that way, it still has
> >>>>the index issue: I need to scan from the first bus to count the index.
> >>>>So any suggestion from you?
> >>>OK, I looked at arch/s390/pci/pci.c.
> >>>First of all, it seems to run the regular PCI thing on bridges.
> >>>
> >>>         zdev->bus = pci_scan_root_bus(NULL, ZPCI_BUS_NR, &pci_root_ops,
> >>>                                       zdev, &resources);
> >>At this moment, the guest has already got all the zpci devices through
> >>the clp list zpci instruction. For each device, pci_scan_root_bus() will
> >>create a root bus. So for s390, we get the pci devices first, then
> >>create a new root bus for each of them.
> >I don't see this in guest code.
> >
> >I looked at pci_scan_root_bus and it's completely generic.
> >It sets up the bus:
> >         b = pci_create_root_bus(parent, bus, ops, sysdata, resources);
> >
> >then it scans it:
> >         max = pci_scan_child_bus(b);
> >
> >
> >that one does
> >         /* Go find them, Rover! */
> >         for (devfn = 0; devfn < 0x100; devfn += 8)
> >                 pci_scan_slot(bus, devfn);
> >
> >next
> >         dev = pci_scan_single_device(bus, devfn);
> >
> >and so on. Eventually you get
> >         if (!pci_bus_read_dev_vendor_id(bus, devfn, &l, 60*1000))
> >                 return NULL;
> >
> >and that one does the clp thing using zpci_cfg_load.
> >
> pci_base_init()-> clp_scan_pci_devices():
>     rc = clp_list_pci(rrb, __clp_add);
> In this function, there is a while loop that gets all the zpci devices by
> means of the resume token (an index). For each device:
>     __clp_add()-> clp_add_pci_device();
> In clp_add_pci_device(), we use the zpci information to create a struct
> zpci_dev zdev.
> Then zpci_create_device()->zpci_scan_bus()->pci_scan_root_bus():
>     zdev->bus = pci_scan_root_bus(NULL, ZPCI_BUS_NR, &pci_root_ops,
>                       zdev, &resources);
> So, you see, each zdev has its own root bus, and there is no child bus
> under that root bus.

Right - zdev *is* the root. But there are pci devices hanging off it.

So why not model it like this?

vfio should attach to zdev, zdev is the pci host.

Also, you can stick a PCI-to-PCI bridge under the root, and
everything will just work.
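The suggested modelling can be sketched with plain structs (hypothetical names, not the patch's actual QOM types): each zdev is itself the PCI host, owning a root bus that vfio-pci functions (or a bridge) attach to, instead of a separate facility device referencing a vfio device by id.

```c
#include <stddef.h>

/* A PCI function hanging off a zPCI root, e.g. a vfio-pci device. */
typedef struct PciFunction {
    unsigned int devfn;
    struct PciFunction *next;  /* sibling on the same root bus */
} PciFunction;

/* Each zdev is its own PCI host bridge: it carries the s390-specific
 * identifiers and owns the root bus its devices hang off. */
typedef struct {
    unsigned int fid;          /* s390 function ID */
    unsigned int uid;          /* user-defined ID */
    PciFunction *root_bus;     /* head of the device list under this root */
} ZpciHost;

/* Attach a function (or a bridge) under this zdev's root bus. */
static void zpci_attach(ZpciHost *zdev, PciFunction *fn)
{
    fn->next = zdev->root_bus;
    zdev->root_bus = fn;
}
```

Lookup by fid then walks one zdev's own bus rather than counting children on a shared facility bus, so no guest-visible state depends on qbus child order.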

> >
> >
> >>>so to me, it looks like there's no need to expose
> >>>non-root buses through special means.
> >>>
> >>>What to do for root buses is a different question but again,
> >>>you definitely do not want to rely on the order of things
> >>>on that linked list.
> >>>The simplest thing is to ask user to give them unique
> >>>numbers, or find some stable way to sort them that
> >>>does not rely on order of initialization (e.g. device IDs?).
> >>>
> >>>But again, this only works ok for root buses.
> >>>
> >>Basically, it does not expose the buses to the guest; it exposes an
> >>index to the guest.
> >>Here is the process to get all the zpci devices for a guest.
> >>For example: we have 10 zpci devices, and the batch size for the list
> >>zpci instruction is 4.
> >>First, qemu will return devices 0-3; the index of list zpci is 0.
> >>Second, qemu will return devices 4-7; the index of list zpci is 4.
> >>Third, qemu will return devices 8-9; the index of list zpci is 8.
> >>We have device ids, but list zpci does not use them as a cursor to get
> >>the next batch; it uses an index instead.
> >>This process is defined by the s390 architecture; we can't change it.
> >>So no matter how we organize zpci devices in qemu, slot or linked list,
> >>we cannot get rid of the index issue.
> >>
> >>How about I add a flag to identify whether the linked list
> >>is valid or not? When a hotplug/unplug event occurs, I will
> >>reset the index and make the guest refetch the zpci devices
> >>from the beginning.
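That suggestion could take the form of a generation counter (a sketch under assumed names, not the actual patch): every hotplug/unplug bumps the generation, and a resume token issued under an older generation is rejected, forcing the guest to restart enumeration from index 0.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    size_t index;            /* where the next batch would start */
    unsigned int generation; /* generation the token was issued under */
} ResumeToken;

typedef struct {
    unsigned int generation; /* bumped on every hotplug/unplug event */
} ZpciFacility;

static void zpci_topology_changed(ZpciFacility *f)
{
    f->generation++;         /* invalidates all outstanding tokens */
}

/* Returns true if the token may be used as-is; otherwise resets it so
 * the guest refetches the device list from the beginning. */
static bool zpci_token_valid(const ZpciFacility *f, ResumeToken *tok)
{
    if (tok->generation != f->generation) {
        tok->index = 0;
        tok->generation = f->generation;
        return false;
    }
    return true;
}
```

This keeps the index semantics the architecture mandates while making a topology change observable to the host, so a stale index is never silently reused.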
> >>
> >You should just use something stable for IDs.
> >And avoid doing it for anything that isn't a root or maybe a bridge
> >since it'll just cause everyone maintenance problems down the road.
> >
> The list zpci instruction is defined by the architecture, not by
> software; I cannot change it to use an ID instead...


