[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v3 1/2] pcie: Add a simple PCIe ACS (Access Cont
From: |
Knut Omang |
Subject: |
Re: [Qemu-devel] [PATCH v3 1/2] pcie: Add a simple PCIe ACS (Access Control Services) helper function |
Date: |
Tue, 12 Feb 2019 20:52:06 +0100 |
User-agent: |
Evolution 3.30.4 (3.30.4-1.fc29) |
On Tue, 2019-02-12 at 10:14 -0700, Alex Williamson wrote:
> On Tue, 12 Feb 2019 17:25:46 +0100
> Knut Omang <address@hidden> wrote:
>
> > On Tue, 2019-02-12 at 08:59 -0700, Alex Williamson wrote:
> > > On Tue, 12 Feb 2019 09:07:43 +0100
> > > Knut Omang <address@hidden> wrote:
> > >
> > > > On Mon, 2019-02-11 at 16:09 -0700, Alex Williamson wrote:
> > > > > On Sun, 10 Feb 2019 07:52:59 +0100
> > > > > Knut Omang <address@hidden> wrote:
> > > > >
> > > > > > Add a helper function to add PCIe capability for Access Control
> > > > > > Services
> > > > > > (ACS)
> > > > > > ACS support in the associated root port is a prerequisite to be able
> > > > > > to
> > > > > > do
> > > > > > passthrough of individual functions of a device with VFIO
> > > > > > without Alex Williamson's pcie_acs_override kernel patch or similar
> > > > > > in the guest.
> > > > >
> > > > > This is still incorrect, the ACS override patch is only required for
> > > > > separating multifunction endpoints or multifunction root
> > > > > ports. Single
> > > > > function endpoints are assignable without ACS simply by placing them
> > > > > downstream of a single function root port or directly on the root
> > > > > complex.
> > > >
> > > > Hmm - that was the intended meaning of the comment, but I'll see if I
> > > > can
> > > > make
> > > > it more clear by saying it explicitly.
> > >
> > > "ACS support... is a prerequisite". Prerequisite: a thing that is
> > > required as a prior condition for something else to happen or exist.
> > >
> > > Assignment of individual functions exists today, as is, by using QEMU
> > > to define a PCIe topology that allows the desired grouping. The code
> > > here enables specific topologies, it is clearly not a prerequisite.
> > >
> > > > > > Endpoints may also implement an ACS capability, but with
> > > > > > limited features.
> > > > > >
> > > > > > Signed-off-by: Knut Omang <address@hidden>
> > > > > > ---
> > > > > > hw/pci/pcie.c | 29 +++++++++++++++++++++++++++++
> > > > > > include/hw/pci/pcie.h | 6 ++++++
> > > > > > include/hw/pci/pcie_regs.h | 4 ++++
> > > > > > 3 files changed, 39 insertions(+)
> > > > > >
> > > > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > > > > > index 230478f..509632f 100644
> > > > > > --- a/hw/pci/pcie.c
> > > > > > +++ b/hw/pci/pcie.c
> > > > > > @@ -742,6 +742,14 @@ bool pcie_cap_is_arifwd_enabled(const PCIDevice
> > > > > > *dev)
> > > > > > PCI_EXP_DEVCTL2_ARI;
> > > > > > }
> > > > > >
> > > > > > +/* Access Control Services (ACS) */
> > > > > > +void pcie_cap_acs_reset(PCIDevice *dev)
> > > > > > +{
> > > > > > + if (dev->exp.acs_cap) {
> > > > > > + pci_set_word(dev->config + dev->exp.acs_cap + PCI_ACS_CTRL,
> > > > > > 0);
> > > > > > + }
> > > > > > +}
> > > > > > +
> > > > > > /******************************************************************
> > > > > > ****
> > > > > > ****
> > > > > > * pci express extended capability list management functions
> > > > > > * uint16_t ext_cap_id (16 bit)
> > > > > > @@ -906,3 +914,24 @@ void pcie_ats_init(PCIDevice *dev, uint16_t
> > > > > > offset)
> > > > > >
> > > > > > pci_set_word(dev->wmask + dev->exp.ats_cap + PCI_ATS_CTRL,
> > > > > > 0x800f);
> > > > > > }
> > > > > > +
> > > > > > +/* ACS (Access Control Services) */
> > > > > > +void pcie_acs_init(PCIDevice *dev, uint16_t offset)
> > > > > > +{
> > > > > > + bool is_pcie_slot = !!object_dynamic_cast(OBJECT(dev),
> > > > > > TYPE_PCIE_SLOT);
> > > > > > + uint16_t cap_bits = 0;
> > > > > > +
> > > > > > + pcie_add_capability(dev, PCI_EXT_CAP_ID_ACS, PCI_ACS_VER,
> > > > > > offset,
> > > > > > + PCI_ACS_SIZEOF);
> > > > > > + dev->exp.acs_cap = offset;
> > > > > > +
> > > > > > + if (is_pcie_slot) {
> > > > > > + /* Endpoints may also implement ACS, but these capabilities
> > > > > > are
> > > > > > */
> > > > > > + /* only valid for slots: */
> > > > >
> > > > > Not quite, SV, TB, and UF must not be implemented by endpoints, but RR
> > > > > and CR must be implemented by multifunction endpoints that support p2p
> > > > > if they provide an ACS capability.
> > > >
> > > > Hmm - are you ok with setting 0 here as I have done, just amending your
> > > > description to the comment? Then any future emulation that do support
> > > > p2p
> > > > would have to set the needed bits after calling the init function.
> > >
> > > The comment definitely needs work, but I don't know what to do about
> > > single function, non-SR-IOV capable devices calling into this.
> >
> > I agree - I have only been thinking about multifunction devices, I should
> > probably assert on not multifunction (or SR/IOV with my SR/IOV patch set)
> > to avoid misuse.
>
> Be sure to check both for (dev->cap_present &
> QEMU_PCI_CAP_MULTIFUNCTION) and PCI_FUNC(dev->devfn) since only
> function 0 is required to have the header bit set.
done thanks,
> > And what about someone passing a PF for SR/IOV through to a VM -
> > Has that use case showed up (yet) ?
> > I have so far only tested with my emulated SR/IOV.
>
> The SR-IOV capability is masked to the L1 guest. The issue is that an
> assigned SR-IOV PF could create VFs *on the host* which has security
> implications unless those VFs can be quarantined from general use by
> the host kernel. There have been patches and proposals, but none yet
> that sufficiently address this issue. Thanks,
Ah, yes, that makes sense.
It's getting late here, so will send v4 tomorrow..
Thanks!
Knut
> Alex
>
> > > Per the
> > > spec, these devices cannot implement an ACS capability at all, but will
> > > anyone care if they do? Maybe endpoints need a different init
> > > function. Maybe the capability ID needs to be obscured until
> > > additional functions or an SR-IOV capability are added. I'm not really
> > > concerned that this helper can only indicate lack of p2p between
> > > functions for endpoints, other than the comments being wrong.
> >
> > Ok, I think I am in the clear then with my v4 comment :-)
> >
> > > > After your previous comments on this, I had a look at Mellanox CX4 and
> > > > CX5
> > > > which
> > > > are the only devices I could find in the lab that are endpoints and
> > > > implement an
> > > > ACS capability, and neither seems to implement any extra capabilities:
> > > >
> > > > Capabilities: [230 v1] Access Control Services
> > > > ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir-
> > > > UpstreamFwd- EgressCtrl- DirectTrans-
> > > > ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir-
> > > > UpstreamFwd- EgressCtrl- DirectTrans-
> > > >
> > > > that was the reason for my choice of value here - after skimming through
> > > > the
> > > > spec (with my still very limited understanding of the details)
> > >
> > > If the above device is a multifunction device or SR-IOV capable device,
> > > this is the correct way to indicate there is no p2p between functions.
> > > This takes advantage of the wording in the spec that certain
> > > capabilities must be implemented by functions that support p2p
> > > traffic. Therefore if the capability is not implemented then p2p is
> > > not supported and requires no control.
> >
> > I see, that makes sense, and corresponds with the way the device I know
> > best
> > worked.
> >
> > > > > Linux therefore infers that if ACS
> > > > > is supported by an endpoint and RR and CR are not implemented, the
> > > > > device does not support p2p.
> > > >
> > > > Interesting - I thought the CX5 supported p2p, but I have not kept up
> > > > with
> > > > what
> > > > happens on the RDMA list on that front.
> > >
> > > ACS can only indicate p2p capabilities within the device reporting it.
> > > The above example indicates that the device does not do p2p internally
> > > between functions. It does not preclude p2p between devices or even
> > > functions, but it does require that the p2p traffic occurs on the link
> > > and therefore allows routing to be managed through the ACS capabilities
> > > of the interconnect components.
> > >
> > > For instance if we have a multifunction endpoint with the above
> > > capability, we know that a DMA from function 1 targeting an address
> > > range on function 0 will not complete the DMA internally, it will go
> > > through the egress port to the downstream port above the endpoint. The
> > > ACS capability on that downstream port then controls whether that DMA
> > > request is redirected back to the ingress port of the endpoint or
> > > forwarded upstream. If a redirect occurs here then the IOMMU is not
> > > fully in control of the IOVA space of the device and the IOMMU group is
> > > extended to indicate this.
> >
> > Thanks, this definitely clarifies!
> >
> > > > > We could just say that nothing supports
> > > > > p2p yet, but single function endpoints (except those implementing
> > > > > SR-IOV) must not implement an ACS capability per the spec, which could
> > > > > be difficult to exclude since the multifunction bit is handled
> > > > > separately from the device model.
> > > >
> > > > Hmm - the older SR/IOV devices I know of, some Intel Ethernet devices
> > > > and
> > > > any of
> > > > the older Mellanox devices, and our own cancelled Infiniband device, all
> > > > seem
> > > > not to implement ACS?
> > >
> > > Linux effectively assumes that there is no p2p between VFs or between
> > > the PF and VF, so the result is that a multifunction endpoint without
> > > ACS supporting SR-IOV would have the PFs grouped together, but the VFs
> > > would be grouped individually, assuming the upstream topology provides
> > > isolation. Intel has since provided verification that nearly none of
> > > their multi-port NICs implement internal p2p and they now use quirks to
> > > provide the ACS equivalent information for grouping for those older
> > > devices. Newer Intel NICs implement ACS capabilities as in your
> > > example above to indicate this without the need for quirks. Thanks,
> >
> > Yes, so for devices like the CX4 and CX5 which exposes more than one PF,
> > they need the ACS capability to indicate that individual PFs are isolated
> > from
> > each other, while the single PF SR/IOV devices can do without.
> >
> > Thanks for the excellent explanations, this makes it all much clearer!
> > Knut
> >
> > > Alex
> > >
> > > > > Also comment style:
> > > > > https://git.qemu.org/?p=qemu.git;a=blob;f=CODING_STYLE;#l127
> > > >
> > > > I see - will fix,
> > > >
> > > > > > + cap_bits =
> > > > > > + PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR |
> > > > > > PCI_ACS_UF;
> > > > > > + }
> > > > > > +
> > > > > > + pci_set_word(dev->config + offset + PCI_ACS_CAP, cap_bits);
> > > > > > + pci_set_word(dev->wmask + offset + PCI_ACS_CTRL, cap_bits);
> > > > > > +}
> > > > > > diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
> > > > > > index 5b82a0d..4c40711 100644
> > > > > > --- a/include/hw/pci/pcie.h
> > > > > > +++ b/include/hw/pci/pcie.h
> > > > > > @@ -79,6 +79,9 @@ struct PCIExpressDevice {
> > > > > >
> > > > > > /* Offset of ATS capability in config space */
> > > > > > uint16_t ats_cap;
> > > > > > +
> > > > > > + /* ACS */
> > > > > > + uint16_t acs_cap;
> > > > > > };
> > > > > >
> > > > > > #define COMPAT_PROP_PCP "power_controller_present"
> > > > > > @@ -116,6 +119,8 @@ void pcie_cap_flr_init(PCIDevice *dev);
> > > > > > void pcie_cap_flr_write_config(PCIDevice *dev,
> > > > > > uint32_t addr, uint32_t val, int len);
> > > > > >
> > > > > > +void pcie_cap_acs_reset(PCIDevice *dev);
> > > > > > +
> > > > > > /* ARI forwarding capability and control */
> > > > > > void pcie_cap_arifwd_init(PCIDevice *dev);
> > > > > > void pcie_cap_arifwd_reset(PCIDevice *dev);
> > > > > > @@ -129,6 +134,7 @@ void pcie_add_capability(PCIDevice *dev,
> > > > > > void pcie_sync_bridge_lnk(PCIDevice *dev);
> > > > > >
> > > > > > void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t
> > > > > > nextfn);
> > > > > > +void pcie_acs_init(PCIDevice *dev, uint16_t offset);
> > > > >
> > > > > nit, why aren't the reset and init declarations together?
> > > >
> > > > Good point, will fix.
> > > >
> > > > Thanks!
> > > > Knut
> > > >
> > > > > > void pcie_dev_ser_num_init(PCIDevice *dev, uint16_t offset,
> > > > > > uint64_t
> > > > > > ser_num);
> > > > > > void pcie_ats_init(PCIDevice *dev, uint16_t offset);
> > > > > >
> > > > > > diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h
> > > > > > index ad4e780..1db86b0 100644
> > > > > > --- a/include/hw/pci/pcie_regs.h
> > > > > > +++ b/include/hw/pci/pcie_regs.h
> > > > > > @@ -175,4 +175,8 @@ typedef enum PCIExpLinkWidth {
> > > > > > PCI_ERR_COR_INTERNAL
> > > > > > > \
> > > > > > PCI_ERR_COR_HL_OVERFLOW)
> > > > > >
> > > > > > +/* ACS */
> > > > > > +#define PCI_ACS_VER 0x1
> > > > > > +#define PCI_ACS_SIZEOF 8
> > > > > > +
> > > > > > #endif /* QEMU_PCIE_REGS_H */
- [Qemu-devel] [PATCH v3 2/2] gen_pcie_root_port: Add ACS (Access Control Services) capability, (continued)
- [Qemu-devel] [PATCH v3 2/2] gen_pcie_root_port: Add ACS (Access Control Services) capability, Knut Omang, 2019/02/10
- [Qemu-devel] [PATCH v3 1/2] pcie: Add a simple PCIe ACS (Access Control Services) helper function, Knut Omang, 2019/02/10
- Re: [Qemu-devel] [PATCH v3 1/2] pcie: Add a simple PCIe ACS (Access Control Services) helper function, Marcel Apfelbaum, 2019/02/11
- Re: [Qemu-devel] [PATCH v3 1/2] pcie: Add a simple PCIe ACS (Access Control Services) helper function, Alex Williamson, 2019/02/11
- Re: [Qemu-devel] [PATCH v3 1/2] pcie: Add a simple PCIe ACS (Access Control Services) helper function, Knut Omang, 2019/02/12
- Re: [Qemu-devel] [PATCH v3 1/2] pcie: Add a simple PCIe ACS (Access Control Services) helper function, Knut Omang, 2019/02/12
- Re: [Qemu-devel] [PATCH v3 1/2] pcie: Add a simple PCIe ACS (Access Control Services) helper function, Alex Williamson, 2019/02/12
- Re: [Qemu-devel] [PATCH v3 1/2] pcie: Add a simple PCIe ACS (Access Control Services) helper function, Knut Omang, 2019/02/12
- Re: [Qemu-devel] [PATCH v3 1/2] pcie: Add a simple PCIe ACS (Access Control Services) helper function, Alex Williamson, 2019/02/12
- Re: [Qemu-devel] [PATCH v3 1/2] pcie: Add a simple PCIe ACS (Access Control Services) helper function,
Knut Omang <=