From: Peter Xu
Subject: Re: [Qemu-devel] [RFC PATCH] pci: Use PCI aliases when determining device IOMMU address space
Date: Thu, 28 Mar 2019 10:44:13 +0800
User-agent: Mutt/1.10.1 (2018-07-13)

On Wed, Mar 27, 2019 at 10:37:09AM -0600, Alex Williamson wrote:
> On Wed, 27 Mar 2019 14:25:00 +0800
> Peter Xu <address@hidden> wrote:
>
> > On Tue, Mar 26, 2019 at 04:55:19PM -0600, Alex Williamson wrote:
> > > Conventional PCI buses pre-date requester IDs. An IOMMU cannot
> > > distinguish by devfn & bus between devices in a conventional PCI
> > > topology and therefore we cannot assign them separate AddressSpaces.
> > > By taking this requester ID aliasing into account, QEMU better matches
> > > the bare metal behavior and restrictions, and enables shared
> > > AddressSpace configurations that are otherwise not possible with
> > > guest IOMMU support.
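To make the aliasing concrete, here is a minimal sketch of the requester ID arithmetic. The helper names are made up for illustration and are not QEMU or kernel APIs, and the aliasing rule (a PCIe-to-PCI bridge forwards requests tagged with its secondary bus number and devfn 00.0) is an assumption about bare-metal behavior, not code taken from the patch:

```python
# Illustrative only: all helper names here are hypothetical.

def devfn(slot: int, func: int) -> int:
    """Pack a 5-bit slot and a 3-bit function into an 8-bit devfn."""
    assert 0 <= slot < 32 and 0 <= func < 8
    return (slot << 3) | func

def requester_id(bus: int, slot: int, func: int) -> int:
    """Pack an 8-bit bus number and the devfn into a 16-bit requester ID."""
    assert 0 <= bus < 256
    return (bus << 8) | devfn(slot, func)

def alias_behind_pcie_to_pci_bridge(secondary_bus: int) -> int:
    """Assumed rule: requests forwarded by a PCIe-to-PCI bridge carry the
    bridge's secondary bus number with devfn 00.0."""
    return requester_id(secondary_bus, 0, 0)

def fmt(rid: int) -> str:
    """Render a requester ID in the usual bus:slot.func notation."""
    return f"{rid >> 8:02x}:{(rid >> 3) & 0x1f:02x}.{rid & 0x7}"

# Two functions on bus 1 have distinct requester IDs of their own...
own = [requester_id(1, 0, 0), requester_id(1, 0, 1)]
# ...but behind a conventional PCI bus both would be seen as one alias.
alias = alias_behind_pcie_to_pci_bridge(1)
print([fmt(r) for r in own], "->", fmt(alias))
```

Under that assumption, both 01:00.0 and 01:00.1 reach the IOMMU with the same requester ID, which is why they cannot be given separate AddressSpaces.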
> > >
> > > For the latter case, given any example where an IOMMU group on the
> > > host includes multiple devices:
> > >
> > > $ ls /sys/kernel/iommu_groups/1/devices/
> > > 0000:00:01.0 0000:01:00.0 0000:01:00.1
> >
> > [1]
> >
> > >
> > > If we incorporate a vIOMMU into the VM configuration, we're restricted
> > > that we can only assign one of the endpoints to the guest because a
> > > second endpoint will attempt to use a different AddressSpace. VFIO
> > > only supports IOMMU group level granularity at the container level,
> > > preventing this second endpoint from being assigned:
> > >
> > > qemu-system-x86_64 -machine q35... \
> > > -device intel-iommu,intremap=on \
> > > -device pcie-root-port,addr=1e.0,id=pcie.1 \
> > > -device vfio-pci,host=1:00.0,bus=pcie.1,addr=0.0,multifunction=on \
> > > -device vfio-pci,host=1:00.1,bus=pcie.1,addr=0.1
> > >
> > > qemu-system-x86_64: -device vfio-pci,host=1:00.1,bus=pcie.1,addr=0.1: vfio \
> > > 0000:01:00.1: group 1 used in multiple address spaces
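The failure above can be modelled as a simple bookkeeping check. This is a toy sketch, assuming only what the error message and the paragraph above state (one IOMMU group can be bound to a single address space); the class and method names are hypothetical and do not mirror QEMU's VFIO code:

```python
# Toy model only: names are hypothetical, not QEMU's VFIO implementation.

class GroupInUseError(Exception):
    pass

class VfioModel:
    def __init__(self):
        # IOMMU group id -> name of the address space it is bound to
        self.group_space = {}

    def attach(self, group: int, address_space: str) -> None:
        """Bind a group to an address space, refusing a second, different one."""
        current = self.group_space.setdefault(group, address_space)
        if current != address_space:
            raise GroupInUseError(
                f"group {group} used in multiple address spaces")

# With a vIOMMU and a root port, each function gets its own address space,
# so the second function of the same host group is rejected.
m = VfioModel()
m.attach(1, "as-01:00.0")
try:
    m.attach(1, "as-01:00.1")
except GroupInUseError as e:
    print(e)
```

If both functions resolve to the same address space, the second `attach` call passes the check, which is what the aliasing is meant to achieve.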
> > >
> > > However, when QEMU incorporates proper aliasing, we can make use of a
> > > PCIe-to-PCI bridge to mask the requester ID, resulting in a hack that
> > > provides the downstream devices with the same AddressSpace, ex:
> > >
> > > qemu-system-x86_64 -machine q35... \
> > > -device intel-iommu,intremap=on \
> > > -device pcie-pci-bridge,addr=1e.0,id=pci.1 \
> > > -device vfio-pci,host=1:00.0,bus=pci.1,addr=1.0,multifunction=on \
> > > -device vfio-pci,host=1:00.1,bus=pci.1,addr=1.1
> > >
> > > While the utility of this hack may be limited, this AddressSpace
> > > aliasing is the correct behavior for QEMU to emulate bare metal.
> > >
> > > Signed-off-by: Alex Williamson <address@hidden>
> >
> > The patch looks sane to me, even as a bug fix, since otherwise the DMA
> > address spaces used under various kinds of PCI bridges can be wrong, so:
>
> I'm not sure if "as a bug fix" here is encouraging a 4.0 target, but
> I'd be cautious about this if so. Eric Auger noted that he's seen an
> SMMU VM hit a guest kernel bug-on, which needs further
> investigation. It's not clear if it's just an untested or
> unimplemented scenario for SMMU to see a conventional PCI bus or if
> there's something wrong in QEMU. I also haven't tested AMD IOMMU and
> only VT-d to a very limited degree, thus RFC.
Sorry for being unclear. I didn't mean to target this for 4.0, and I
completely agree that it should wait until after the release, since it
is still a fairly significant change to QEMU's PCI subsystem, not to
mention that the system mostly works well even without this patch
(except for things like assigning multiple functions of a device with
an IOMMU, but that is rare, after all).
>
> > Reviewed-by: Peter Xu <address@hidden>
> >
> > Though I have a question that confused me even before this: Alex, do
> > you know why the context entries of all the devices in the IOMMU root
> > table are programmed even if the devices are under a pcie-to-pci
> > bridge? Taking [1] above as an example: in that case, IIUC, we'll
> > program context entries for all three devices (00:01.0, 01:00.0,
> > 01:00.1), but they'll all point to the same IOMMU table. DMAs from
> > devices 01:00.0 and 01:00.1 should always be tagged with 01:00.0 on
> > bare metal, so why do we bother to program the context entry of
> > 01:00.1? It seems to be never used.
> >
> > (It is needed for current QEMU to work with pcie-to-pci bridges
> > without this patch, but I feel like I don't know the real reason
> > behind it.)
>
> We actually have two different scenarios that could be represented by
> [1]: the group can be formed by lack of isolation or by lack of
> visibility. In the group above, it's the former, isolation. The PCIe
> root port does not support ACS, so while the IOMMU has visibility of
> the individual devices, peer-to-peer between devices may also be
> possible. Native, trusted, in-kernel drivers for these devices could
> still make use of separate IOMMU domains per device, but in order to
> expose the devices to a userspace driver we need to consider them a
> non-isolated set to prevent side-channel attacks between devices. We
> therefore consider them as a group within the IOMMU API and it's
> required that each context entry maps to the same domain as the IOMMU
> will see transactions for each requester ID.
>
> If we had the visibility case, such as if [1] represented a PCIe-to-PCI
> bridge subgroup, then the IOMMU really does only see the bridge
> requester ID and there may not be a reason to populate the context
> entries for the downstream aliased devices. Perhaps the IOMMU might
> still choose to do so, particularly if the bridge is actually a PCI-X
> bridge as PCI-X does incorporate a requester ID, but also has strange
> rules about the bridge being able to claim ownership of the
> transaction. So it might be paranoia or simplification that causes all
> the context entries to be programmed, or for alias quirks, uncertainty
> if a device exclusively uses a quirked requester ID or might sometimes
> use the proper requester ID.
>
> In the example I present, we're taking [1], which could be either case
> above, and converting it into the visibility case in order to force the
> IOMMU to handle the devices within a single address space. Thanks,
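The two cases can be sketched as a toy model. All names below are hypothetical and do not mirror VT-d data structures; the alias value 01:00.0 for the visibility case is taken from the example discussed above and is an assumption rather than something the model derives:

```python
# Toy model only: names are hypothetical and simplified.

ENDPOINTS = [0x0100, 0x0101]  # requester IDs of 01:00.0 and 01:00.1

def context_entries(rids, domain):
    """Program one context entry per requester ID, all pointing at one domain."""
    return {rid: domain for rid in rids}

def seen_rids(rids, alias=None):
    """Requester IDs the IOMMU observes: each device's own ID (isolation
    case), or a single alias if a bridge masks them (visibility case)."""
    return {alias if alias is not None else rid for rid in rids}

def unused_entries(entries, seen):
    """Context entries that no observed transaction would ever look up."""
    return set(entries) - set(seen)

entries = context_entries(ENDPOINTS, "domain-A")

# Isolation case: the IOMMU sees both IDs, so every entry is needed.
print(unused_entries(entries, seen_rids(ENDPOINTS)))

# Visibility case: only the alias is seen, so the 01:00.1 entry is never hit.
print(unused_entries(entries, seen_rids(ENDPOINTS, alias=0x0100)))
```

In the isolation case every entry has to point at the same domain because the IOMMU really does see each requester ID; in the visibility case the extra entry is harmless but unused in this model.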
The answers are detailed and clear (as usual :). My thanks!
--
Peter Xu