qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v4 1/2] pcie: Add a simple PCIe ACS (Access Cont


From: Alex Williamson
Subject: Re: [Qemu-devel] [PATCH v4 1/2] pcie: Add a simple PCIe ACS (Access Control Services) helper function
Date: Thu, 14 Feb 2019 08:51:31 -0700

On Thu, 14 Feb 2019 08:07:33 +0100
Knut Omang <address@hidden> wrote:

> On Wed, 2019-02-13 at 12:13 -0700, Alex Williamson wrote:
> > On Wed, 13 Feb 2019 10:29:58 +0100
> > Knut Omang <address@hidden> wrote:
> >   
> > > Add a helper function to add PCIe capability for Access Control Services
> > > (ACS)  
> > 
> > This is redundant to the commit title.
> >   
> > > ACS support in the associated root port is needed to pass
> > > through individual functions of a device to different VMs with VFIO
> > > without Alex Williamson's pcie_acs_override kernel patch or similar
> > > in the guest.  
> > 
> > This is overly subtle, to work backwards that individual functions
> > (plural!) of a device (singular!) must imply a multifunction endpoint
> > in the same hierarchy split to different L2 VMs.  Perhaps I only
> > finally realized this subtly on v4.
> >   
> > > Single function devices, or multifunction devices
> > > that all goes to the same VM works fine even without ACS, as VFIO
> > > will avoid putting the root port itself into the IOMMU group
> > > even without ACS support in the port.  
> > 
> > Also confusing and incorrectly states that a) VFIO is responsible for
> > IOMMU grouping, it's not, and b) that the root port would not be
> > included in such a group, it would.  The latter was I thought the
> > impetus for this series.  
> 
> that wasn't the intention but I can see that it looks that way now
> 
> > > Multifunction endpoints may also implement an ACS capability,
> > > only on function 0, and with more limited features.  
> > 
> > "only on function 0" is incorrect, each function of a multifunction
> > device should (not must) implement an ACS capability if any of them do.
> > 
> > Can't we just say something like:
> > 
> > "Implementing an ACS capability on downstream ports and multifuction
> > endpoints indicates isolation and IOMMU visibility to a finer
> > granularity thereby creating smaller IOMMU groups in the guest and thus
> > more flexibility in assigning endpoints to guest userspace or an L2
> > guest."  
> 
> sure - will use this - and remove my confusing attempt to 
> credit to your override patch and VFIO :)
> 
> > (I avoided including SR-IOV with multifunction since that's not
> > implemented here)  
> 
> I agree
> 
> > > Signed-off-by: Knut Omang <address@hidden>
> > > ---
> > >  hw/pci/pcie.c              | 39 +++++++++++++++++++++++++++++++++++++++-
> > >  include/hw/pci/pcie.h      |  6 ++++++-
> > >  include/hw/pci/pcie_regs.h |  4 ++++-
> > >  3 files changed, 49 insertions(+)
> > > 
> > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > > index 230478f..6e87994 100644
> > > --- a/hw/pci/pcie.c
> > > +++ b/hw/pci/pcie.c
> > > @@ -906,3 +906,42 @@ void pcie_ats_init(PCIDevice *dev, uint16_t offset)
> > >  
> > >      pci_set_word(dev->wmask + dev->exp.ats_cap + PCI_ATS_CTRL, 0x800f);
> > >  }
> > > +
> > > +/* ACS (Access Control Services) */
> > > +void pcie_acs_init(PCIDevice *dev, uint16_t offset)
> > > +{
> > > +    bool is_pcie_slot = !!object_dynamic_cast(OBJECT(dev), 
> > > TYPE_PCIE_SLOT);  
> > 
> > Perhaps we should be using pci_is_express_downstream_port().  
> 
> oh - yes - I forgot that we need to look in pci.h for those kind of 
> helpers..
> 
> > > +    uint16_t cap_bits = 0;
> > > +
> > > +    /*
> > > +     * For endpoints, only multifunction devices may have an
> > > +     * ACS capability, and only on function 0:  
> > 
> > Incorrect
> >   
> > > +     */
> > > +    assert(is_pcie_slot ||
> > > +           ((dev->cap_present & QEMU_PCI_CAP_MULTIFUNCTION) &&
> > > +            PCI_FUNC(dev->devfn)));  
> > 
> > The second test should be:
> > 
> > ((dev->cap_present & QEMU_PCI_CAP_MULTIFUNCTION) ||
> >  PCI_FUNC(dev->devfn))
> > 
> > If the function number is non-zero, then it's clearly a multifunction
> > device, the multifunction capability is only required on function
> > zero.  Just as in my previous example, an ACS capability can only
> > describe/control the DMA flow of the function implementing it, nothing
> > in the spec that I can see imposes function zero's DMA flow on the
> > other functions.  
> 
> Ah - of course - that makes sense - 
> was thinking too complicated here, and also my comment didn't match
> the code at all..
> 
> > There's also a gap here that function zero can set the multifunction
> > capability, but there may be no secondary devices defined.  Not that
> > we necessarily need to resolve this, but it's a nuance of allowing
> > arbitrary multifunction configurations as QEMU does.  
> 
> Yes, in the SR/IOV case, at least as I have implemented it in QEMU, 
> with one PF that would be the default - as no VFs are defined at reset, 
> there's only one function, but it still need to be multifunction 
> for QEMU to accept more functions appearing later.
> 
> > > +
> > > +    pcie_add_capability(dev, PCI_EXT_CAP_ID_ACS, PCI_ACS_VER, offset,
> > > +                        PCI_ACS_SIZEOF);
> > > +    dev->exp.acs_cap = offset;
> > > +
> > > +    if (is_pcie_slot) {
> > > +        /*
> > > +         * Endpoints may also implement ACS, and optionally RR and CR,
> > > +         * if they want to support p2p, but only slots may
> > > +         * implement SV, TB or UF:  
> > 
> > Careful using "may" with spec references.  
> 
> :-D
> 
> > "Downstream ports must implement SV, TB, RR, CR, and UF (with caveats
> > on the latter three that we ignore for simplicity).  Endpoints may also
> > implement a subset of ACS capabilities, but these are optional if the
> > endpoint does not support peer-to-peer between functions and thus
> > omitted here."  
> 
> Thanks - I'll put that in instead
> 
> > > +         */
> > > +        cap_bits =
> > > +            PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR | 
> > > PCI_ACS_UF;  
> > 
> > PCI_ACS_DT has similar "must be implemented" requirements to RR, RC,
> > and UF, why is it not included?  For all of these "caveat" ones there
> > are conditions which can make it optional for root ports, but required
> > for switch downstream ports, so it seems reasonable that we include
> > both since that's what our is_pci_slot() test covers.  Thanks,  
> 
> That was because my "reference" root ports don't not implement it, taking the
> conservative approach:
> 
> 00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root 
> Port 7 (rev 22) (prog-if 00 [Normal decode])
>        ...
>        Capabilities: [150 v1] Access Control Services
>                 ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ 
> UpstreamFwd+ EgressCtrl- DirectTrans-
>                 ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ 
> UpstreamFwd+ EgressCtrl- DirectTrans-
> 
> In fact, all gens of servers I have access to supports just the same cap bits 
> in
> their root ports, in order of release date. The latest gen even have some root
> ports without an ACS capability.
> 
> 00:02.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express Root 
> Port 2a (rev 07)
> 00:01.0 PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D 
> PCI Express Root Port 1 (rev 01) (prog-if 00 [Normal decode])
> 00:03.0 PCI bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI 
> Express Root Port 3 (rev 02) (prog-if 00 [Normal decode])
> 00:02.0 PCI bridge: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 PCI 
> Express Root Port 2a (rev 04) (prog-if 00 [Normal decode])
> 17:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A (rev 
> 04) (prog-if 00 [Normal decode])
> 
> All of these have DirectTrans- in their ACSCap.
> 
> [For reference, the one without ACS cap, in the same server as 17:00.0 above:
> 
> 00:1c.0 PCI bridge: Intel Corporation Lewisburg PCI Express Root Port #1 (rev 
> f9) (prog-if 00 [Normal decode])
> ]
> 
> Do you prefer that we set DT as default anyway, or should we stay within 
> "known
> territory" for the OSes, at least for now?

Per the spec:

  ACS Direct Translated P2P: must be implemented by Root Ports that
  support Address Translation Services (ATS) and also support
  peer-to-peer traffic with other Root Ports; must be implemented by
  Switch Downstream Ports.

The caveats for root ports here are why your hardware is potentially
spec compliant without supporting DT.  There are no caveats for switch
downstream ports, therefore it would not be spec compliant to exclude
it.  I think your options are either to exclude switch downstream ports
from the function or to set DT.  Thanks,

Alex



reply via email to

[Prev in Thread] Current Thread [Next in Thread]