From: Robin Murphy
Subject: Re: [Qemu-devel] [RFC PATCH] pci: Use PCI aliases when determining device IOMMU address space
Date: Thu, 28 Mar 2019 10:56:16 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.5.1

On 28/03/2019 10:38, Auger Eric wrote:
> Hi Alex,
>
> [+ Robin]
>
> On 3/27/19 5:37 PM, Alex Williamson wrote:
>> On Wed, 27 Mar 2019 14:25:00 +0800
>> Peter Xu <address@hidden> wrote:
>>
>>> On Tue, Mar 26, 2019 at 04:55:19PM -0600, Alex Williamson wrote:
>>>> Conventional PCI buses pre-date requester IDs. An IOMMU cannot
>>>> distinguish by devfn & bus between devices in a conventional PCI
>>>> topology and therefore we cannot assign them separate AddressSpaces.
>>>> By taking this requester ID aliasing into account, QEMU better matches
>>>> the bare metal behavior and restrictions, and enables shared
>>>> AddressSpace configurations that are otherwise not possible with
>>>> guest IOMMU support.
>>>>
>>>> For the latter case, given any example where an IOMMU group on the
>>>> host includes multiple devices:
>>>>
>>>>   $ ls /sys/kernel/iommu_groups/1/devices/
>>>>   0000:00:01.0 0000:01:00.0 0000:01:00.1
>>>
>>> [1]
>>>
>>>> If we incorporate a vIOMMU into the VM configuration, we're restricted
>>>> that we can only assign one of the endpoints to the guest because a
>>>> second endpoint will attempt to use a different AddressSpace. VFIO
>>>> only supports IOMMU group level granularity at the container level,
>>>> preventing this second endpoint from being assigned:
>>>>
>>>>   qemu-system-x86_64 -machine q35... \
>>>>     -device intel-iommu,intremap=on \
>>>>     -device pcie-root-port,addr=1e.0,id=pcie.1 \
>>>>     -device vfio-pci,host=1:00.0,bus=pcie.1,addr=0.0,multifunction=on \
>>>>     -device vfio-pci,host=1:00.1,bus=pcie.1,addr=0.1
>>>>
>>>>   qemu-system-x86_64: -device vfio-pci,host=1:00.1,bus=pcie.1,addr=0.1: vfio \
>>>>   0000:01:00.1: group 1 used in multiple address spaces
>>>>
>>>> However, when QEMU incorporates proper aliasing, we can make use of a
>>>> PCIe-to-PCI bridge to mask the requester ID, resulting in a hack that
>>>> provides the downstream devices with the same AddressSpace, ex:
>>>>
>>>>   qemu-system-x86_64 -machine q35... \
>>>>     -device intel-iommu,intremap=on \
>>>>     -device pcie-pci-bridge,addr=1e.0,id=pci.1 \
>>>>     -device vfio-pci,host=1:00.0,bus=pci.1,addr=1.0,multifunction=on \
>>>>     -device vfio-pci,host=1:00.1,bus=pci.1,addr=1.1
>>>>
>>>> While the utility of this hack may be limited, this AddressSpace
>>>> aliasing is the correct behavior for QEMU to emulate bare metal.
>>>>
>>>> Signed-off-by: Alex Williamson <address@hidden>
>>>
>>> The patch looks sane to me even as a bug fix since otherwise the DMA
>>> address spaces used under misc kinds of PCI bridges can be wrong, so:
>>
>> I'm not sure if "as a bug fix" here is encouraging a 4.0 target, but
>> I'd be cautious about this if so. Eric Auger noted that he's seen an
>> SMMU VM hit a guest kernel bug-on, which needs further investigation.
>> It's not clear if it's just an untested or unimplemented scenario for
>> SMMU to see a conventional PCI bus or if there's something wrong in
>> QEMU. I also haven't tested AMD IOMMU and only VT-d to a very limited
>> degree, thus RFC.
>
> So I have tracked this further and here is what I can see.
>
> On guest side, the 2 assigned devices that I have put downstream to the
> pcie-to-pci bridge get an iommu_fwspec handle with 2 ids, the first one
> corresponding to the requester id of the very device and the second one
> corresponding to the rid matching the same bus number and devfn=0
>
>   dev0 = 0000:02:01.0 0000:02:00.0
>   dev1 = 0000:02:01.1 0000:02:00.0
>
> Then iommu_probe_device is called for 0000:02:01.0 and 0000:02:01.1.
> Each time it iterates over the associated ids and we call add_device
> twice for 0000:02:00.0. The second time, the arm-smmu-v3 driver
> recognizes a context is already alive for 0000:02:00.0 and triggers a
> BUG_ON().

Hmm, aliasing bridges are supposed to be handled as of commit 563b5cbe334e ("iommu/arm-smmu-v3: Cope with duplicated Stream IDs") - what's changed since then?
Robin.
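
[Editorial illustration, not part of the original message.] The collision described in the quoted text can be pictured with a toy model. It is only a sketch under stated assumptions: `rid`, `fmt`, `ids_behind_bridge` and `ContextTable` are hypothetical names invented here, the ID lists simply mirror the dev0/dev1 values quoted above, and the strict "raise on a live ID" check stands in for the BUG_ON(). It does not reproduce the arm-smmu-v3 driver code or the behaviour of commit 563b5cbe334e.

```python
def rid(bus, dev, fn):
    """Pack bus/device/function into a 16-bit requester ID."""
    return (bus << 8) | (dev << 3) | fn


def fmt(r):
    """Render a requester ID as bus:dev.fn in hex, e.g. 02:01.0."""
    return f"{r >> 8:02x}:{(r >> 3) & 0x1f:02x}.{r & 0x7}"


def ids_behind_bridge(bus, dev, fn):
    """IDs as listed in the thread: the device's own RID, then bus:00.0."""
    return [rid(bus, dev, fn), rid(bus, 0, 0)]


class ContextTable:
    """Hypothetical strict table: refuses an ID that is already live."""

    def __init__(self):
        self.live = set()

    def add_device(self, ids):
        for i in ids:
            if i in self.live:
                # Stand-in for the BUG_ON() reported in the thread.
                raise RuntimeError(f"context already live for {fmt(i)}")
            self.live.add(i)


dev0 = ids_behind_bridge(2, 1, 0)
dev1 = ids_behind_bridge(2, 1, 1)

table = ContextTable()
table.add_device(dev0)
try:
    table.add_device(dev1)
    collided = None
except RuntimeError as err:
    collided = str(err)

print([fmt(i) for i in dev0], [fmt(i) for i in dev1])
print(collided)
```

In this model the two devices have distinct first IDs but share the second one, so the failure only appears when the second device is added.
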

> At the origin of the creation of 2 ids for each device,
> iort_iommu_configure is called on each downstream device which calls
> pci_for_each_dma_alias(). We enter the
> pci_is_pcie(tmp)/PCI_EXP_TYPE_PCI_BRIDGE code path and
> iort_pci_iommu_init is called with bus number 2 and devfn=0.
>
> Thanks
>
> Eric
>
>>> Reviewed-by: Peter Xu <address@hidden>
>>>
>>> Though I have a question that confused me even before: Alex, do you
>>> know why all the context entries of the devices in the IOMMU root
>>> table will be programmed even if the devices are under a pcie-to-pci
>>> bridge? I'm giving an example with above [1] to be clear: in that
>>> case IIUC we'll program context entries for all the three devices
>>> (00:01.0, 01:00.0, 01:00.1) but they'll point to the same IOMMU
>>> table. DMAs of devices 01:00.0 and 01:00.1 should always be tagged
>>> with 01:00.0 on bare metal, so why do we bother to program the
>>> context entry of 01:00.1? It seems never used.
>>>
>>> (It should be used for current QEMU to work with pcie-to-pci bridges
>>> if without this patch, but I feel like I don't know the real answer
>>> behind)
>>
>> We actually have two different scenarios that could be represented by
>> [1], the group can be formed by lack of isolation or by lack of
>> visibility. In the group above, it's the former, isolation. The PCIe
>> root port does not support ACS, so while the IOMMU has visibility of
>> the individual devices, peer-to-peer between devices may also be
>> possible. Native, trusted, in-kernel drivers for these devices could
>> still make use of separate IOMMU domains per device, but in order to
>> expose the devices to a userspace driver we need to consider them a
>> non-isolated set to prevent side-channel attacks between devices. We
>> therefore consider them as a group within the IOMMU API and it's
>> required that each context entry maps to the same domain as the IOMMU
>> will see transactions for each requester ID.
>>
>> If we had the visibility case, such as if [1] represented a
>> PCIe-to-PCI bridge subgroup, then the IOMMU really does only see the
>> bridge requester ID and there may not be a reason to populate the
>> context entries for the downstream aliased devices. Perhaps the IOMMU
>> might still choose to do so, particularly if the bridge is actually a
>> PCI-X bridge as PCI-X does incorporate a requester ID, but also has
>> strange rules about the bridge being able to claim ownership of the
>> transaction. So it might be paranoia or simplification that causes all
>> the context entries to be programmed, or for alias quirks, uncertainty
>> if a device exclusively uses a quirked requester ID or might sometimes
>> use the proper requester ID.
>>
>> In the example I present, we're taking [1], which could be either case
>> above, and converting it into the visibility case in order to force
>> the IOMMU to handle the devices within a single address space. Thanks,
>>
>> Alex
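
[Editorial illustration, not part of the original message.] The "visibility" case described above can be sketched as a grouping by the requester ID the IOMMU actually sees. This is a minimal model, not QEMU or kernel code: `visible_rid` and `address_spaces` are hypothetical helpers, and the rule that a device behind a conventional bridge is seen as bus:00.0 on the secondary bus is an assumption taken from the IDs quoted in this thread. PCI-X bridges and alias quirks, which the text says behave differently, are deliberately left out.

```python
from collections import defaultdict


def rid(bus, dev, fn):
    """Pack bus/device/function into a 16-bit requester ID."""
    return (bus << 8) | (dev << 3) | fn


def visible_rid(bus, dev, fn, behind_conventional_bridge):
    """Requester ID assumed to reach the IOMMU for this device.

    Assumption: behind a PCIe-to-PCI bridge every device is seen as
    bus:00.0 of the secondary bus; otherwise its own RID is seen.
    """
    if behind_conventional_bridge:
        return rid(bus, 0, 0)
    return rid(bus, dev, fn)


def address_spaces(devices):
    """Group device names by the requester ID the IOMMU can distinguish."""
    spaces = defaultdict(list)
    for name, (bus, dev, fn, aliased) in devices.items():
        spaces[visible_rid(bus, dev, fn, aliased)].append(name)
    return dict(spaces)


# Two functions under a PCIe root port: each keeps its own requester ID.
root_port = address_spaces({
    "fn0": (1, 0, 0, False),
    "fn1": (1, 0, 1, False),
})

# The same two functions placed behind a PCIe-to-PCI bridge on bus 2.
bridged = address_spaces({
    "fn0": (2, 1, 0, True),
    "fn1": (2, 1, 1, True),
})

print(len(root_port), len(bridged))
```

Under these assumptions the root-port layout yields two distinguishable IDs, while the bridged layout collapses both functions onto one, which is the single shared address space the example configuration relies on.
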