From: Robin Murphy
Subject: Re: [Qemu-devel] [RFC PATCH] pci: Use PCI aliases when determining device IOMMU address space
Date: Thu, 28 Mar 2019 10:56:16 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.5.1

On 28/03/2019 10:38, Auger Eric wrote:
Hi Alex,

[+ Robin]

On 3/27/19 5:37 PM, Alex Williamson wrote:
On Wed, 27 Mar 2019 14:25:00 +0800
Peter Xu <address@hidden> wrote:

On Tue, Mar 26, 2019 at 04:55:19PM -0600, Alex Williamson wrote:
Conventional PCI buses pre-date requester IDs.  An IOMMU cannot
distinguish by devfn & bus between devices in a conventional PCI
topology and therefore we cannot assign them separate AddressSpaces.
By taking this requester ID aliasing into account, QEMU better matches
the bare metal behavior and restrictions, and enables shared
AddressSpace configurations that are otherwise not possible with
guest IOMMU support.
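
As a rough sketch of the aliasing rule at play (plain C with invented
types and helpers, not the patch itself), a transaction crossing a
PCIe-to-PCI bridge is re-tagged with the bridge's secondary bus number
and devfn 0, so every device behind the bridge resolves to the same
requester ID and must therefore share an AddressSpace:

#include <stdint.h>
#include <stdbool.h>

typedef struct Bridge {
    struct Bridge *parent;  /* upstream bridge; NULL at the root complex */
    bool pcie_to_pci;       /* bridge to a conventional PCI bus */
    uint8_t secondary_bus;  /* bus number on its downstream side */
} Bridge;

/* requester ID: bus in bits 15:8, devfn in bits 7:0 */
static uint16_t rid(uint8_t bus, uint8_t devfn)
{
    return ((uint16_t)bus << 8) | devfn;
}

/* Walk upstream from the device; crossing a PCIe-to-PCI bridge
 * replaces the ID with "secondary bus, devfn 0". */
static uint16_t iommu_visible_rid(uint8_t bus, uint8_t devfn,
                                  const Bridge *b)
{
    uint16_t id = rid(bus, devfn);

    for (; b; b = b->parent) {
        if (b->pcie_to_pci) {
            id = rid(b->secondary_bus, 0);  /* alias: bus N, devfn 0 */
        }
    }
    return id;  /* devices with equal IDs must share an AddressSpace */
}

int main(void)
{
    /* a PCIe-to-PCI bridge with secondary bus 1, as in the
     * pcie-pci-bridge example below */
    Bridge br = { .parent = NULL, .pcie_to_pci = true, .secondary_bus = 1 };

    /* both functions behind the bridge alias to 01:00.0 (0x0100) */
    return iommu_visible_rid(1, (1 << 3) | 0, &br) ==
           iommu_visible_rid(1, (1 << 3) | 1, &br) ? 0 : 1;
}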

For the latter case, given any example where an IOMMU group on the
host includes multiple devices:

   $ ls  /sys/kernel/iommu_groups/1/devices/
   0000:00:01.0  0000:01:00.0  0000:01:00.1

[1]


If we incorporate a vIOMMU into the VM configuration, we're restricted
to assigning only one of the endpoints to the guest, because a second
endpoint will attempt to use a different AddressSpace.  VFIO only
supports IOMMU-group-level granularity at the container level,
preventing this second endpoint from being assigned:

qemu-system-x86_64 -machine q35... \
   -device intel-iommu,intremap=on \
   -device pcie-root-port,addr=1e.0,id=pcie.1 \
   -device vfio-pci,host=1:00.0,bus=pcie.1,addr=0.0,multifunction=on \
   -device vfio-pci,host=1:00.1,bus=pcie.1,addr=0.1

qemu-system-x86_64: -device vfio-pci,host=1:00.1,bus=pcie.1,addr=0.1: vfio \
0000:01:00.1: group 1 used in multiple address spaces

However, when QEMU incorporates proper aliasing, we can make use of a
PCIe-to-PCI bridge to mask the requester ID, resulting in a hack that
provides the downstream devices with the same AddressSpace, ex:

qemu-system-x86_64 -machine q35... \
   -device intel-iommu,intremap=on \
   -device pcie-pci-bridge,addr=1e.0,id=pci.1 \
   -device vfio-pci,host=1:00.0,bus=pci.1,addr=1.0,multifunction=on \
   -device vfio-pci,host=1:00.1,bus=pci.1,addr=1.1

While the utility of this hack may be limited, this AddressSpace
aliasing is the correct behavior for QEMU to emulate bare metal.

Signed-off-by: Alex Williamson <address@hidden>

The patch looks sane to me even as a bug fix since otherwise the DMA
address spaces used under misc kinds of PCI bridges can be wrong, so:

I'm not sure if "as a bug fix" here is encouraging a 4.0 target, but
I'd be cautious about this if so.  Eric Auger noted that he's seen an
SMMU VM hit a guest kernel bug-on, which needs further
investigation.  It's not clear if it's just an untested or
unimplemented scenario for SMMU to see a conventional PCI bus or if
there's something wrong in QEMU.  I also haven't tested AMD IOMMU and
only VT-d to a very limited degree, thus RFC.

So I have tracked this further and here is what I can see.

On the guest side, the two assigned devices that I have put downstream
of the PCIe-to-PCI bridge get an iommu_fwspec handle with two IDs, the
first corresponding to the requester ID of the device itself and the
second to the RID with the same bus number and devfn=0:

dev0 = 0000:02:01.0
       0000:02:00.0

dev1 = 0000:02:01.1
       0000:02:00.0

Then iommu_probe_device is called for 0000:02:01.0 and 0000:02:01.1.
Each call iterates over the associated IDs, so add_device ends up
being called twice for 0000:02:00.0.  The second time, the arm-smmu-v3
driver recognizes that a context is already alive for 0000:02:00.0 and
triggers a BUG_ON().

Hmm, aliasing bridges are supposed to be handled as of commit 563b5cbe334e ("iommu/arm-smmu-v3: Cope with duplicated Stream IDs") - what's changed since then?

Robin.

The two IDs per device originate in iort_iommu_configure, which is
called on each downstream device and in turn calls
pci_for_each_dma_alias().  We enter the
pci_is_pcie(tmp)/PCI_EXP_TYPE_PCI_BRIDGE code path, and
iort_pci_iommu_init is called with bus number 2 and devfn=0.
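
As a rough userspace sketch of that walk (loosely modeled on the
kernel's pci_for_each_dma_alias; the names and flattened topology
below are invented for illustration), the callback fires once with the
device's own RID and once with the bridge alias, producing exactly the
two fwspec IDs listed above:

#include <stdio.h>
#include <stdint.h>

#define PCI_DEVID(bus, devfn)  ((uint16_t)(((bus) << 8) | (devfn)))
#define PCI_DEVFN(slot, func)  ((uint8_t)(((slot) << 3) | (func)))

typedef void (*alias_fn)(uint16_t rid, void *data);

static void for_each_dma_alias(uint8_t bus, uint8_t devfn,
                               int behind_pcie_to_pci_bridge,
                               alias_fn fn, void *data)
{
    fn(PCI_DEVID(bus, devfn), data);          /* the device's own RID */
    if (behind_pcie_to_pci_bridge)            /* bridge alias: devfn 0 */
        fn(PCI_DEVID(bus, PCI_DEVFN(0, 0)), data);
}

static void print_rid(uint16_t rid, void *data)
{
    (void)data;
    printf("%02x:%02x.%x\n", rid >> 8, (rid >> 3) & 0x1f, rid & 0x7);
}

int main(void)
{
    /* prints 02:01.0 then 02:00.0, matching the fwspec IDs above */
    for_each_dma_alias(2, PCI_DEVFN(1, 0), 1, print_rid, NULL);
    return 0;
}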

Thanks

Eric
Reviewed-by: Peter Xu <address@hidden>

Though I have a question that confused me even before: Alex, do you
know why all the context entries of the devices in the IOMMU root
table will be programmed even if the devices are under a PCIe-to-PCI
bridge?  I'm giving an example with [1] above to be clear: in that
case, IIUC, we'll program context entries for all three devices
(00:01.0, 01:00.0, 01:00.1), but they'll point to the same IOMMU
table.  DMAs of devices 01:00.0 and 01:00.1 should always be tagged
with 01:00.0 on bare metal, so why do we bother to program the context
entry of 01:00.1?  It seems never to be used.

(It must be what current QEMU relies on to work with PCIe-to-PCI
  bridges without this patch, but I feel like I don't know the real
  reason behind it.)

We actually have two different scenarios that could be represented by
[1], the group can be formed by lack of isolation or by lack of
visibility.  In the group above, it's the former, a lack of isolation.
The PCIe root port does not support ACS, so while the IOMMU has
visibility of the individual devices, peer-to-peer between devices may
also be possible.  Native, trusted, in-kernel drivers for these
devices could
the individual devices, peer-to-peer between devices may also be
possible.  Native, trusted, in-kernel drivers for these devices could
still make use of separate IOMMU domains per device, but in order to
expose the devices to a userspace driver we need to consider them a
non-isolated set to prevent side-channel attacks between devices.  We
therefore consider them as a group within the IOMMU API and it's
required that each context entry maps to the same domain as the IOMMU
will see transactions for each requester ID.
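
As a schematic of what that means for the context tables (structures
invented for illustration, following VT-d's bus/devfn-indexed layout;
this is not QEMU or kernel code), every requester ID in the group gets
a context entry referencing the same domain:

#include <stdint.h>

struct context_entry {
    uint64_t pgtbl_root;  /* page-table root this RID translates through */
};

/* root table indexed by bus, context table indexed by devfn */
static struct context_entry ctx[256][256];

static void program_group(const uint16_t *rids, int n,
                          uint64_t domain_root)
{
    for (int i = 0; i < n; i++) {
        uint8_t bus = rids[i] >> 8, devfn = rids[i] & 0xff;
        ctx[bus][devfn].pgtbl_root = domain_root;  /* same domain for all */
    }
}

int main(void)
{
    /* group [1]: 00:01.0, 01:00.0, 01:00.1 -> one shared domain
     * (0x1000 stands in for a hypothetical page-table root) */
    const uint16_t group[] = { 0x0008, 0x0100, 0x0101 };
    program_group(group, 3, 0x1000);
    return 0;
}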

If we had the visibility case, such as if [1] represented a PCIe-to-PCI
bridge subgroup, then the IOMMU really does only see the bridge
requester ID and there may not be a reason to populate the context
entries for the downstream aliased devices.  Perhaps the IOMMU might
still choose to do so, particularly if the bridge is actually a PCI-X
bridge: PCI-X does incorporate a requester ID, but it also has strange
rules about the bridge being able to claim ownership of a transaction.
So it might be paranoia or simplification that causes all the context
entries to be programmed, or, in the case of alias quirks, uncertainty
whether a device exclusively uses a quirked requester ID or might
sometimes use the proper one.

In the example I present, we're taking [1], which could be either case
above, and converting it into the visibility case in order to force the
IOMMU to handle the devices within a single address space.  Thanks,

Alex



