From: Peter Xu
Subject: Re: [Qemu-devel] [for-4.2 PATCH 2/2] hw/i386: AMD-Vi IVRS DMA alias support
Date: Mon, 29 Jul 2019 16:26:46 +0800
User-agent: Mutt/1.11.4 (2019-03-13)

On Fri, Jul 26, 2019 at 06:55:53PM -0600, Alex Williamson wrote:
> When we account for DMA aliases in the PCI address space, we can no
> longer use a single IVHD entry in the IVRS covering all devices.  We
> instead need to walk the PCI bus and create alias ranges when we find
> a conventional bus.  These alias ranges cannot overlap with a "Select
> All" range (as currently implemented), so we also need to enumerate
> each device with IVHD entries.
> 
> Importantly, the IVHD entries used here include a Device ID, which is
> simply the PCI BDF (Bus/Device/Function).  The guest firmware is
> responsible for programming bus numbers, so the final revision of this
> table depends on the update mechanism (acpi_build_update) to be called
> after guest PCI enumeration.
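
To make the Device ID encoding concrete, here is a minimal sketch of how a BDF is packed into a 4-byte IVHD device entry. The helper names (devfn, build_bdf, ivhd_entry4) are hypothetical local stand-ins for the PCI_DEVFN() and PCI_BUILD_BDF() macros used in the patch, and the layout (type in the low byte, Device ID in the next two bytes) is inferred from the `<< 8 | type` expressions in the hunks quoted in this mail:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for PCI_DEVFN(): 5-bit slot, 3-bit function. */
static inline uint8_t devfn(unsigned slot, unsigned func)
{
    assert(slot < 32 && func < 8);
    return (uint8_t)((slot << 3) | func);
}

/* Hypothetical stand-in for PCI_BUILD_BDF(): bus in the high byte. */
static inline uint16_t build_bdf(uint8_t bus, uint8_t dfn)
{
    return (uint16_t)(((unsigned)bus << 8) | dfn);
}

/*
 * 4-byte IVHD device entry as built in the patch:
 * entry type in bits 7:0, Device ID (BDF) in bits 23:8.
 */
static inline uint32_t ivhd_entry4(uint16_t bdf, uint8_t type)
{
    return ((uint32_t)bdf << 8) | type;
}
```

For a leaf bus numbered 1, ivhd_entry4(build_bdf(1, devfn(0, 0)), 0x3) and ivhd_entry4(build_bdf(1, devfn(31, 7)), 0x4) give the "Start of Range" and "End of Range" pair. Because the bus number is part of the value, the entries are only final once the guest has programmed bus numbers.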

Ouch... so the ACPI build procedure runs after the guest PCI code!
Could I ask how you found this? :) It certainly seems much easier
this way...

This looks very nice to me already, though I still have a few
questions; please see below.

[...]

> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_BRIDGE)) {
> +        PCIBus *sec_bus = pci_bridge_get_sec_bus(PCI_BRIDGE(dev));
> +        uint8_t sec = pci_bus_num(sec_bus);
> +        uint8_t sub = dev->config[PCI_SUBORDINATE_BUS];
> +
> +        if (pci_bus_is_express(sec_bus)) {
> +            /*
> +             * Walk the bus if there are subordinates, otherwise use a range
> +             * to cover an entire leaf bus.  We could potentially also use a
> +             * range for traversed buses, but we'd need to take care not to
> +             * create both Select and Range entries covering the same device.
> +             * This is easier and potentially more compact.
> +             *
> +             * An example bare metal system seems to use Select entries for
> +             * root ports without a slot (ie. built-ins) and Range entries
> +             * when there is a slot.  The same system also only hard-codes
> +             * the alias range for an onboard PCIe-to-PCI bridge, apparently
> +             * making no effort to support nested bridges.  We attempt to
> +             * be more thorough here.
> +             */
> +            if (sec == sub) { /* leaf bus */
> +                /* "Start of Range" IVHD entry, type 0x3 */
> +                entry = PCI_BUILD_BDF(sec, PCI_DEVFN(0, 0)) << 8 | 0x3;
> +                build_append_int_noprefix(table_data, entry, 4);
> +                /* "End of Range" IVHD entry, type 0x4 */
> +                entry = PCI_BUILD_BDF(sub, PCI_DEVFN(31, 7)) << 8 | 0x4;
> +                build_append_int_noprefix(table_data, entry, 4);
> +            } else {
> +                pci_for_each_device(sec_bus, sec, insert_ivhd, table_data);
> +            }
> +        } else {
> +            /*
> +             * If the secondary bus is conventional, then we need to create an
> +             * Alias range for everything downstream.  The range covers the
> +             * first devfn on the secondary bus to the last devfn on the
> +             * subordinate bus.  The alias target depends on legacy versus
> +             * express bridges, just as in pci_device_iommu_address_space().
> +             * DeviceIDa vs DeviceIDb as per the AMD IOMMU spec.
> +             */
> +            uint16_t dev_id_a, dev_id_b;
> +
> +            dev_id_a = PCI_BUILD_BDF(sec, PCI_DEVFN(0, 0));
> +
> +            if (pci_is_express(dev) &&
> +                pcie_cap_get_type(dev) == PCI_EXP_TYPE_PCI_BRIDGE) {
> +                dev_id_b = dev_id_a;
> +            } else {
> +                dev_id_b = PCI_BUILD_BDF(pci_bus_num(bus), dev->devfn);
> +            }
> +
> +            /* "Alias Start of Range" IVHD entry, type 0x43, 8 bytes */
> +            build_append_int_noprefix(table_data, dev_id_a << 8 | 0x43, 4);
> +            build_append_int_noprefix(table_data, dev_id_b << 8 | 0x0, 4);
> +
> +            /* "End of Range" IVHD entry, type 0x4 */
> +            entry = PCI_BUILD_BDF(sub, PCI_DEVFN(31, 7)) << 8 | 0x4;
> +            build_append_int_noprefix(table_data, entry, 4);
> +        }
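
For reference, the conventional-bus branch above emits three 32-bit words. The following is a rough standalone sketch of the values, not the QEMU code itself: struct alias_range and alias_range_entries() are hypothetical names, and bridge_bdf stands for PCI_BUILD_BDF(pci_bus_num(bus), dev->devfn) from the hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the three words appended to the table. */
struct alias_range {
    uint32_t start;   /* "Alias Start of Range", type 0x43, first word  */
    uint32_t target;  /* second word of the 8-byte entry: alias target  */
    uint32_t end;     /* "End of Range", type 0x4                       */
};

static inline struct alias_range
alias_range_entries(uint8_t sec, uint8_t sub, uint16_t bridge_bdf,
                    bool pcie_to_pci)
{
    assert(sec <= sub);

    /* DeviceIDa: first devfn on the secondary bus. */
    uint16_t dev_id_a = (uint16_t)((unsigned)sec << 8);
    /* DeviceIDb: secondary bus devfn 0 for a PCIe-to-PCI bridge,
     * otherwise the bridge itself. */
    uint16_t dev_id_b = pcie_to_pci ? dev_id_a : bridge_bdf;
    /* Last devfn (31.7 == 0xff) on the subordinate bus. */
    uint16_t last = (uint16_t)(((unsigned)sub << 8) | 0xff);

    struct alias_range r = {
        .start  = ((uint32_t)dev_id_a << 8) | 0x43,
        .target = ((uint32_t)dev_id_b << 8) | 0x0,
        .end    = ((uint32_t)last << 8) | 0x4,
    };
    return r;
}
```

With a conventional bridge at 00:1e.0 (BDF 0x00f0) covering buses 2 to 3, this yields 0x00020043, 0x0000f000 and 0x0003ff04.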

We've implemented similar logic multiple times:

  - When we want to do DMA (pci_requester_id)
  - When we want to fetch the DMA address space (the previous patch)
  - When we fill in the AMD ACPI table (this patch)

Do you think we can generalize them somehow?  How about we fetch the
RID directly in the 2nd and 3rd use cases with pci_requester_id()
(which already exists) and simply use it?
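
To illustrate what a shared helper might compute, here is a toy model of the alias rule described in the patch comment (legacy bridge: alias to the bridge; PCIe-to-PCI bridge: alias to devfn 0 on its secondary bus). It is only a sketch under those assumptions: the toy_* types and functions are hypothetical, and it does not show how pci_requester_id() is actually implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_bus;

/* Hypothetical bridge model: where it sits and what kind it is. */
struct toy_bridge {
    const struct toy_bus *bus;   /* bus the bridge itself lives on */
    uint8_t devfn;
    bool pcie_to_pci;            /* PCIe-to-PCI bridge vs. legacy bridge */
};

/* Hypothetical bus model; parent is NULL for a root bus. */
struct toy_bus {
    uint8_t num;
    bool express;
    const struct toy_bridge *parent;
};

/* Walk towards the root, updating the alias at each conventional bus. */
static inline uint16_t toy_dma_alias(const struct toy_bus *bus, uint8_t devfn)
{
    assert(bus != NULL);
    uint8_t alias_bus = bus->num;
    uint8_t alias_devfn = devfn;

    for (const struct toy_bus *b = bus; b->parent; b = b->parent->bus) {
        if (b->express) {
            continue;            /* express buses preserve the requester ID */
        }
        if (b->parent->pcie_to_pci) {
            alias_bus = b->num;  /* secondary bus, devfn 0 */
            alias_devfn = 0;
        } else {
            alias_bus = b->parent->bus->num;  /* the bridge itself */
            alias_devfn = b->parent->devfn;
        }
    }
    return (uint16_t)(((unsigned)alias_bus << 8) | alias_devfn);
}

/* Example topology: device 02:03.0 behind a bridge at 00:1e.0. */
static inline uint16_t toy_demo(bool sec_express, bool pcie_to_pci)
{
    struct toy_bus root = { .num = 0, .express = true, .parent = NULL };
    struct toy_bridge br = { .bus = &root, .devfn = 0x1e << 3,
                             .pcie_to_pci = pcie_to_pci };
    struct toy_bus sec = { .num = 2, .express = sec_express, .parent = &br };

    return toy_dma_alias(&sec, 3 << 3);
}
```

If one function like this backed all three call sites, the DMA path, the address space lookup and the IVRS builder could not drift apart.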

[...]

> +    /*
> +     * A PCI bus walk, for each PCI host bridge, is necessary to create a
> +     * complete set of IVHD entries.  Do this into a separate blob so that we
> +     * can calculate the total IVRS table length here and then append the new
> +     * blob further below.  Fall back to an entry covering all devices, which
> +     * is sufficient when no aliases are present.
> +     */
> +    object_child_foreach_recursive(object_get_root(),
> +                                   ivrs_host_bridges, ivhd_blob);
> +
> +    if (!ivhd_blob->len) {
> +        /*
> +         *   Type 1 device entry reporting all devices
> +         *   These are 4-byte device entries currently reporting the range of
> +         *   Refer to Spec - Table 95:IVHD Device Entry Type Codes(4-byte)
> +         */
> +        build_append_int_noprefix(ivhd_blob, 0x0000001, 4);
> +    }

Is there a real use case for ivhd_blob->len==0?

Thanks,

-- 
Peter Xu


