From: Peter Xu
Subject: Re: [Qemu-devel] [for-4.2 PATCH 2/2] hw/i386: AMD-Vi IVRS DMA alias support
Date: Mon, 29 Jul 2019 16:26:46 +0800
User-agent: Mutt/1.11.4 (2019-03-13)

On Fri, Jul 26, 2019 at 06:55:53PM -0600, Alex Williamson wrote:
> When we account for DMA aliases in the PCI address space, we can no
> longer use a single IVHD entry in the IVRS covering all devices. We
> instead need to walk the PCI bus and create alias ranges when we find
> a conventional bus. These alias ranges cannot overlap with a "Select
> All" range (as currently implemented), so we also need to enumerate
> each device with IVHD entries.
>
> Importantly, the IVHD entries used here include a Device ID, which is
> simply the PCI BDF (Bus/Device/Function). The guest firmware is
> responsible for programming bus numbers, so the final revision of this
> table depends on the update mechanism (acpi_build_update) to be called
> after guest PCI enumeration.
Ouch... so the ACPI build procedure runs after the guest PCI code!
Could I ask how you found this? :) It certainly seems much easier
this way...
This looks very nice to me already, though I still have a few
questions; please see below.
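For reference, the Device ID Alex mentions is the 16-bit BDF: bus number in bits 15:8, device (slot) in bits 7:3 and function in bits 2:0, which is what QEMU's PCI_BUILD_BDF() and PCI_DEVFN() macros compose. A minimal standalone sketch of that packing (the helper names are hypothetical, not QEMU API) makes the dependency on guest enumeration explicit: the high byte is simply whatever bus number firmware has programmed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring what PCI_BUILD_BDF()/PCI_DEVFN() compose:
 * bus in bits 15:8, device (slot) in bits 7:3, function in bits 2:0.
 */
static uint16_t ivhd_device_id(uint8_t bus, uint8_t slot, uint8_t func)
{
    assert(slot < 32 && func < 8);
    return (uint16_t)(bus << 8 | slot << 3 | func);
}

/* Recover the bus number, i.e. the part decided by guest firmware. */
static uint8_t ivhd_device_id_bus(uint16_t id)
{
    return (uint8_t)(id >> 8);
}
```

E.g. slot 3, function 1 behind a bridge whose secondary bus still reads as 0 gets Device ID 0x0019, but 0x0219 once firmware assigns bus 2, so a table generated before enumeration would describe the wrong IDs.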
[...]
> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_BRIDGE)) {
> +        PCIBus *sec_bus = pci_bridge_get_sec_bus(PCI_BRIDGE(dev));
> +        uint8_t sec = pci_bus_num(sec_bus);
> +        uint8_t sub = dev->config[PCI_SUBORDINATE_BUS];
> +
> +        if (pci_bus_is_express(sec_bus)) {
> +            /*
> +             * Walk the bus if there are subordinates, otherwise use a range
> +             * to cover an entire leaf bus. We could potentially also use a
> +             * range for traversed buses, but we'd need to take care not to
> +             * create both Select and Range entries covering the same device.
> +             * This is easier and potentially more compact.
> +             *
> +             * An example bare metal system seems to use Select entries for
> +             * root ports without a slot (ie. built-ins) and Range entries
> +             * when there is a slot. The same system also only hard-codes
> +             * the alias range for an onboard PCIe-to-PCI bridge, apparently
> +             * making no effort to support nested bridges. We attempt to
> +             * be more thorough here.
> +             */
> +            if (sec == sub) { /* leaf bus */
> +                /* "Start of Range" IVHD entry, type 0x3 */
> +                entry = PCI_BUILD_BDF(sec, PCI_DEVFN(0, 0)) << 8 | 0x3;
> +                build_append_int_noprefix(table_data, entry, 4);
> +                /* "End of Range" IVHD entry, type 0x4 */
> +                entry = PCI_BUILD_BDF(sub, PCI_DEVFN(31, 7)) << 8 | 0x4;
> +                build_append_int_noprefix(table_data, entry, 4);
> +            } else {
> +                pci_for_each_device(sec_bus, sec, insert_ivhd, table_data);
> +            }
> +        } else {
> +            /*
> +             * If the secondary bus is conventional, then we need to create an
> +             * Alias range for everything downstream. The range covers the
> +             * first devfn on the secondary bus to the last devfn on the
> +             * subordinate bus. The alias target depends on legacy versus
> +             * express bridges, just as in pci_device_iommu_address_space().
> +             * DeviceIDa vs DeviceIDb as per the AMD IOMMU spec.
> +             */
> +            uint16_t dev_id_a, dev_id_b;
> +
> +            dev_id_a = PCI_BUILD_BDF(sec, PCI_DEVFN(0, 0));
> +
> +            if (pci_is_express(dev) &&
> +                pcie_cap_get_type(dev) == PCI_EXP_TYPE_PCI_BRIDGE) {
> +                dev_id_b = dev_id_a;
> +            } else {
> +                dev_id_b = PCI_BUILD_BDF(pci_bus_num(bus), dev->devfn);
> +            }
> +
> +            /* "Alias Start of Range" IVHD entry, type 0x43, 8 bytes */
> +            build_append_int_noprefix(table_data, dev_id_a << 8 | 0x43, 4);
> +            build_append_int_noprefix(table_data, dev_id_b << 8 | 0x0, 4);
> +
> +            /* "End of Range" IVHD entry, type 0x4 */
> +            entry = PCI_BUILD_BDF(sub, PCI_DEVFN(31, 7)) << 8 | 0x4;
> +            build_append_int_noprefix(table_data, entry, 4);
> +        }
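To double-check the byte layout the hunk produces: build_append_int_noprefix() appends values least-significant byte first, so `id << 8 | type` puts the entry type in byte 0 and the Device ID in bytes 1-2, leaving the DTE setting byte as zero. The 8-byte Alias Start of Range entry (type 0x43) adds a second dword carrying the alias Device ID in bytes 5-6. A standalone sketch, with a hypothetical fixed-size buffer standing in for QEMU's GArray and hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size stand-in for QEMU's GArray table blob. */
struct blob {
    uint8_t data[256];
    size_t len;
};

/* Append 'size' bytes of 'val', least significant byte first. */
static void blob_append_le(struct blob *b, uint64_t val, unsigned size)
{
    assert(b->len + size <= sizeof(b->data));
    for (unsigned i = 0; i < size; i++) {
        b->data[b->len++] = val & 0xff;
        val >>= 8;
    }
}

/* 4-byte entry: byte 0 = type, bytes 1-2 = Device ID, byte 3 = DTE (0). */
static void ivhd_entry4(struct blob *b, uint8_t type, uint16_t dev_id)
{
    blob_append_le(b, (uint32_t)dev_id << 8 | type, 4);
}

/*
 * 8-byte "Alias Start of Range" (type 0x43): first dword as above, then
 * byte 4 reserved, bytes 5-6 = alias Device ID, byte 7 reserved.
 */
static void ivhd_alias_start(struct blob *b, uint16_t dev_id_a,
                             uint16_t dev_id_b)
{
    ivhd_entry4(b, 0x43, dev_id_a);
    blob_append_le(b, (uint32_t)dev_id_b << 8, 4);
}
```

With a conventional bridge at 00:1e.0 whose secondary and subordinate bus are both 2, an alias start for 0x0200 (target 0x00f0) plus an End of Range for 0x02ff yields the bytes 43 00 02 00, 00 f0 00 00, 04 ff 02 00, i.e. 02:00.0 through 02:1f.7 aliased to 00:1e.0.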
We've implemented similar logic multiple times:
- When we want to do DMA (pci_requester_id)
- When we want to fetch the DMA address space (the previous patch)
- When we fill in the AMD ACPI table (this patch)
Do you think we can generalize them somehow? I'm thinking: how about
we directly fetch the RID in the 2nd/3rd use cases using
pci_requester_id() (which already exists) and simply use it?
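To make the idea concrete, here is a rough model of the alias rule as a single upstream walk, loosely following the requester-ID cache logic behind pci_requester_id(): a PCIe-to-PCI bridge rewrites the ID to (secondary bus, devfn 0), a conventional PCI-PCI bridge substitutes its own BDF, and the override closest to the root wins. The struct and function names are hypothetical and the topology model is deliberately simplified (no real PCIDevice):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified topology model (not QEMU's PCIDevice). */
struct node {
    uint8_t bus;               /* bus number the device sits on */
    uint8_t devfn;             /* (slot << 3) | func */
    bool is_express;           /* has a PCIe capability */
    bool is_pcie_to_pci;       /* PCIe-to-PCI bridge (PCI_EXP_TYPE_PCI_BRIDGE) */
    const struct node *parent; /* upstream bridge, NULL if on a root bus */
};

static uint16_t bdf(uint8_t bus, uint8_t devfn)
{
    return (uint16_t)(bus << 8 | devfn);
}

/*
 * Walk upstream, letting each bridge override the ID; later (higher)
 * overrides replace earlier ones, so the one nearest the root wins.
 */
static uint16_t requester_id(const struct node *dev)
{
    uint16_t rid = bdf(dev->bus, dev->devfn);

    for (; dev->parent; dev = dev->parent) {
        const struct node *p = dev->parent;

        if (p->is_express) {
            if (p->is_pcie_to_pci) {
                /* dev sits on the bridge's secondary bus: use (sec, 0). */
                rid = bdf(dev->bus, 0);
            }
            /* Root/switch ports pass the ID through unchanged. */
        } else {
            /* Conventional PCI-PCI bridge: the bridge's own BDF. */
            rid = bdf(p->bus, p->devfn);
        }
    }
    return rid;
}
```

For an endpoint at 03:00.0 behind a conventional bridge at 02:04.0, itself behind a PCIe-to-PCI bridge with secondary bus 2, this returns 0x0200, the same DeviceIDb the IVRS hunk emits for that range (dev_id_b = dev_id_a). One helper of this shape could, in principle, feed all three users listed above.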
[...]
> +    /*
> +     * A PCI bus walk, for each PCI host bridge, is necessary to create a
> +     * complete set of IVHD entries. Do this into a separate blob so that we
> +     * can calculate the total IVRS table length here and then append the new
> +     * blob further below. Fall back to an entry covering all devices, which
> +     * is sufficient when no aliases are present.
> +     */
> +    object_child_foreach_recursive(object_get_root(),
> +                                   ivrs_host_bridges, ivhd_blob);
> +
> +    if (!ivhd_blob->len) {
> +        /*
> +         * Type 1 device entry reporting all devices
> +         * These are 4-byte device entries currently reporting the range of
> +         * Refer to Spec - Table 95:IVHD Device Entry Type Codes(4-byte)
> +         */
> +        build_append_int_noprefix(ivhd_blob, 0x0000001, 4);
> +    }
Is there a real use case for ivhd_blob->len==0?
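The comment in that hunk describes a two-step pattern: collect the device entries in a scratch blob, derive the IVHD length from it, and only then append, with a single 4-byte "All" entry (type 0x1) as the fallback. A minimal sketch with a hypothetical fixed-size buffer in place of GArray and hypothetical helper names; the header size is passed in as a parameter rather than taken from the spec:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size byte buffer standing in for QEMU's GArray. */
struct buf {
    uint8_t data[512];
    size_t len;
};

/* Append 'size' bytes of 'val', least significant byte first. */
static void buf_append_le(struct buf *b, uint64_t val, unsigned size)
{
    assert(b->len + size <= sizeof(b->data));
    for (unsigned i = 0; i < size; i++) {
        b->data[b->len++] = val & 0xff;
        val >>= 8;
    }
}

/*
 * Finish the device-entry list: if the bus walk appended nothing, fall
 * back to one 4-byte "All" entry (type 0x1). Returns the value for a
 * 16-bit length field covering 'hdr_len' header bytes plus the entries.
 */
static uint16_t ivhd_finish(struct buf *entries, size_t hdr_len)
{
    if (entries->len == 0) {
        buf_append_le(entries, 0x00000001, 4);
    }
    assert(hdr_len + entries->len <= UINT16_MAX);
    return (uint16_t)(hdr_len + entries->len);
}

/* Append the finished entry blob after the (already written) header. */
static void buf_append_buf(struct buf *dst, const struct buf *src)
{
    assert(dst->len + src->len <= sizeof(dst->data));
    memcpy(dst->data + dst->len, src->data, src->len);
    dst->len += src->len;
}
```

In this sketch the fallback only triggers when the walk appended nothing at all; any Select entry (even just the host bridge at 00:00.0) bypasses it.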
Thanks,
--
Peter Xu