qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC PATCH] pci: Use PCI aliases when determining devic


From: Auger Eric
Subject: Re: [Qemu-devel] [RFC PATCH] pci: Use PCI aliases when determining device IOMMU address space
Date: Thu, 28 Mar 2019 13:53:27 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.4.0

Hi Robin,

On 3/28/19 11:56 AM, Robin Murphy wrote:
> On 28/03/2019 10:38, Auger Eric wrote:
>> Hi Alex,
>>
>> [+ Robin]
>>
>> On 3/27/19 5:37 PM, Alex Williamson wrote:
>>> On Wed, 27 Mar 2019 14:25:00 +0800
>>> Peter Xu <address@hidden> wrote:
>>>
>>>> On Tue, Mar 26, 2019 at 04:55:19PM -0600, Alex Williamson wrote:
>>>>> Conventional PCI buses pre-date requester IDs.  An IOMMU cannot
>>>>> distinguish by devfn & bus between devices in a conventional PCI
>>>>> topology and therefore we cannot assign them separate AddressSpaces.
>>>>> By taking this requester ID aliasing into account, QEMU better matches
>>>>> the bare metal behavior and restrictions, and enables shared
>>>>> AddressSpace configurations that are otherwise not possible with
>>>>> guest IOMMU support.
>>>>>
>>>>> For the latter case, given any example where an IOMMU group on the
>>>>> host includes multiple devices:
>>>>>
>>>>>    $ ls  /sys/kernel/iommu_groups/1/devices/
>>>>>    0000:00:01.0  0000:01:00.0  0000:01:00.1
>>>>
>>>> [1]
>>>>
>>>>>
>>>>> If we incorporate a vIOMMU into the VM configuration, we're restricted
>>>>> that we can only assign one of the endpoints to the guest because a
>>>>> second endpoint will attempt to use a different AddressSpace.  VFIO
>>>>> only supports IOMMU group level granularity at the container level,
>>>>> preventing this second endpoint from being assigned:
>>>>>
>>>>> qemu-system-x86_64 -machine q35... \
>>>>>    -device intel-iommu,intremap=on \
>>>>>    -device pcie-root-port,addr=1e.0,id=pcie.1 \
>>>>>    -device vfio-pci,host=1:00.0,bus=pcie.1,addr=0.0,multifunction=on \
>>>>>    -device vfio-pci,host=1:00.1,bus=pcie.1,addr=0.1
>>>>>
>>>>> qemu-system-x86_64: -device
>>>>> vfio-pci,host=1:00.1,bus=pcie.1,addr=0.1: vfio \
>>>>> 0000:01:00.1: group 1 used in multiple address spaces
>>>>>
>>>>> However, when QEMU incorporates proper aliasing, we can make use of a
>>>>> PCIe-to-PCI bridge to mask the requester ID, resulting in a hack that
>>>>> provides the downstream devices with the same AddressSpace, ex:
>>>>>
>>>>> qemu-system-x86_64 -machine q35... \
>>>>>    -device intel-iommu,intremap=on \
>>>>>    -device pcie-pci-bridge,addr=1e.0,id=pci.1 \
>>>>>    -device vfio-pci,host=1:00.0,bus=pci.1,addr=1.0,multifunction=on \
>>>>>    -device vfio-pci,host=1:00.1,bus=pci.1,addr=1.1
>>>>>
>>>>> While the utility of this hack may be limited, this AddressSpace
>>>>> aliasing is the correct behavior for QEMU to emulate bare metal.
>>>>>
>>>>> Signed-off-by: Alex Williamson <address@hidden>
>>>>
>>>> The patch looks sane to me even as a bug fix since otherwise the DMA
>>>> address spaces used under misc kinds of PCI bridges can be wrong, so:
>>>
>>> I'm not sure if "as a bug fix" here is encouraging a 4.0 target, but
>>> I'd be cautious about this if so.  Eric Auger noted that he's seen an
>>> SMMU VM hit a guest kernel bug-on, which needs further
>>> investigation.  It's not clear if it's just an untested or
>>> unimplemented scenario for SMMU to see a conventional PCI bus or if
>>> there's something wrong in QEMU.  I also haven't tested AMD IOMMU and
>>> only VT-d to a very limited degree, thus RFC.
>>
>> So I have tracked this further and here is what I can see.
>>
>> On guest side, the 2 assigned devices that I have put downstream to the
>> pcie-to-pci bridge get an iommu_fwspec handle with 2 ids, the first one
>> corresponding to the requester id of the very device and the second one
>> corresponding to the rid matching the same bus number and devfn=0
>>
>> dev0 =  0000:02:01.0
>>          0000:02:00.0
>>
>> dev1 = 0000:02:01.1
>>         0000:02:00.0
>>
>> Then iommu_probe_device is called for 0000:02:01.0 and 0000:02:01.1.
>> Each time it iterates over the associated ids and we call add_device
>> twice for 0000:02:00.0. The second time, the arm-smmu-v3 driver
>> recognizes a context is already alive for 0000:02:00.0 and triggers a
>> BUG_ON().
> 
> Hmm, aliasing bridges are supposed to be handled as of commit
> 563b5cbe334e ("iommu/arm-smmu-v3: Cope with duplicated Stream IDs") -
> what's changed since then?

I our case, the problem is not we have the same ids[] for a given device
but we have 2 functions attached to the pcie-to-pci bridge and their
fwspec have an ids[] in common, the one with the subordinate bus and
devfn=0.

device pcie-pci-bridge,addr=1e.0,id=pci.1 \
-device vfio-pci,host=1:00.0,bus=pci.1,addr=1.0,multifunction=on \
-device vfio-pci,host=1:00.1,bus=pci.1,addr=1.1

[    2.509381] ixgbe 0000:02:01.0: arm_smmu_attach_dev ARM_SMMU_DOMAIN_S1
[    2.512095] arm_smmu_install_ste_for_dev sid[0]=520
[    2.513524] arm_smmu_write_strtab_ent sid=520 val=1
[    2.514872] arm_smmu_write_strtab_ent sid=520 ABORT
[    2.516266] arm_smmu_install_ste_for_dev sid[1]=512
[    2.517609] arm_smmu_write_strtab_ent sid=512 val=1
[    2.518958] arm_smmu_write_strtab_ent sid=512 ABORT

[    3.301316] ixgbe 0000:02:01.1: arm_smmu_attach_dev ARM_SMMU_DOMAIN_S1
[    3.302456] arm_smmu_install_ste_for_dev sid[0]=521
[    3.303420] arm_smmu_write_strtab_ent sid=521 val=1
[    3.304408] arm_smmu_write_strtab_ent sid=521 ABORT
[    3.305442] arm_smmu_install_ste_for_dev sid[1]=512
[    3.306812] arm_smmu_write_strtab_ent sid=512 val=3758686219
[    3.308390] arm_smmu_write_strtab_ent sid=512 S1 | S2 ste_live


Thanks

Eric
> 
> Robin.
> 
>> At the origin of the creation of 2 ids for each device,
>> iort_iommu_configure is called on each downstream device which calls
>> pci_for_each_dma_alias(). We enter the
>> pci_is_pcie(tmp)/PCI_EXP_TYPE_PCI_BRIDGE code path and
>> iort_pci_iommu_init is called with bus number 2 and devfn=0.
>>
>> Thanks
>>
>> Eric
>>>  
>>>> Reviewed-by: Peter Xu <address@hidden>
>>>>
>>>> Though I have a question that confused me even before: Alex, do you
>>>> know why all the context entry of the devices in the IOMMU root table
>>>> will be programmed even if the devices are under a pcie-to-pci bridge?
>>>> I'm giving an example with above [1] to be clear: in that case IIUC
>>>> we'll program context entries for all the three devices (00:01.0,
>>>> 01:00.0, 01:00.1) but they'll point to the same IOMMU table.  DMAs of
>>>> devices 01:00.0 and 01:00.1 should always been tagged with 01:00.0 on
>>>> bare metal and then why we bother to program the context entry of
>>>> 01:00.1?  It seems never used.
>>>>
>>>> (It should be used for current QEMU to work with pcie-to-pci bridges
>>>>   if without this patch, but I feel like I don't know the real answer
>>>>   behind)
>>>
>>> We actually have two different scenarios that could be represented by
>>> [1], the group can be formed by lack of isolation or by lack of
>>> visibility.  In the group above, it's the former, isolation.  The PCIe
>>> root port does not support ACS, so while the IOMMU has visibility of
>>> the individual devices, peer-to-peer between devices may also be
>>> possible.  Native, trusted, in-kernel drivers for these devices could
>>> still make use of separate IOMMU domains per device, but in order to
>>> expose the devices to a userspace driver we need to consider them a
>>> non-isolated set to prevent side-channel attacks between devices.  We
>>> therefore consider them as a group within the IOMMU API and it's
>>> required that each context entry maps to the same domain as the IOMMU
>>> will see transactions for each requester ID.
>>>
>>> If we had the visibility case, such as if [1] represented a PCIe-to-PCI
>>> bridge subgroup, then the IOMMU really does only see the bridge
>>> requester ID and there may not be a reason to populate the context
>>> entries for the downstream aliased devices.  Perhaps the IOMMU might
>>> still choose to do so, particularly if the bridge is actually a PCI-X
>>> bridge as PCI-X does incorporate a requester ID, but also has strange
>>> rules about the bridge being able to claim ownership of the
>>> transaction.  So it might be paranoia or simplification that causes all
>>> the context entries to be programmed, or for alias quirks, uncertainty
>>> if a device exclusively uses a quirked requester ID or might sometimes
>>> use the proper requester ID.
>>>
>>> In the example I present, we're taking [1], which could be either case
>>> above, and converting it into the visibility case in order to force the
>>> IOMMU to handle the devices within a single address space.  Thanks,
>>>
>>> Alex
>>>



reply via email to

[Prev in Thread] Current Thread [Next in Thread]