[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH qemu] spapr_pci: Add numa node id
From: |
Alexey Kardashevskiy |
Subject: |
Re: [Qemu-devel] [PATCH qemu] spapr_pci: Add numa node id |
Date: |
Wed, 14 Sep 2016 19:39:10 +1000 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 |
On 14/09/16 09:29, Michael Roth wrote:
> Quoting Alexey Kardashevskiy (2016-07-27 03:03:38)
>> This adds a numa id property to a PHB to allow linking passed PCI device
>> to CPU/memory. It is up to the management stack to do CPU/memory pinning
>> to the node with the actual PCI device.
>
> It looks like x86 relies on PCIBus->numa_node() method (via
> pci_bus_numa_node()) to encode similar PCIBus affinities
> into ACPI tables, and currently exposes it via
> -device pci-[-express]-expander-bus,numa_node=X.
Well, until we allow DMA windows per PCI bus (not per PHB as it is now),
this won't make much sense for us (unless I am missing something here).
> Maybe we should implement it using this existing
> PCIBus->numa_node() interface?
>
> We'd still have to expose numa_node as a spapr-pci-host-bridge
> device option though. Not sure if there's a more common way
> to expose it that might be easier for libvirt to discover. As it
> stands we'd need to add spapr-pci-host-bridge to a libvirt
> whitelist that currently only covers pci-expander-bus.
>
> Cc'ing Shiva who was looking into the libvirt side.
>
> One comment below:
>
>>
>> Signed-off-by: Alexey Kardashevskiy <address@hidden>
>> ---
>> hw/ppc/spapr_pci.c | 13 +++++++++++++
>> include/hw/pci-host/spapr.h | 2 ++
>> 2 files changed, 15 insertions(+)
>>
>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>> index 949c44f..af5394a 100644
>> --- a/hw/ppc/spapr_pci.c
>> +++ b/hw/ppc/spapr_pci.c
>> @@ -47,6 +47,7 @@
>> #include "sysemu/device_tree.h"
>> #include "sysemu/kvm.h"
>> #include "sysemu/hostmem.h"
>> +#include "sysemu/numa.h"
>>
>> #include "hw/vfio/vfio.h"
>>
>> @@ -1544,6 +1545,7 @@ static Property spapr_phb_properties[] = {
>> DEFINE_PROP_BOOL("ddw", sPAPRPHBState, ddw_enabled, true),
>> DEFINE_PROP_UINT64("pgsz", sPAPRPHBState, page_size_mask,
>> (1ULL << 12) | (1ULL << 16)),
>> + DEFINE_PROP_UINT32("node", sPAPRPHBState, numa_node, -1),
>> DEFINE_PROP_END_OF_LIST(),
>> };
>>
>> @@ -1805,6 +1807,11 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
>> cpu_to_be32(1),
>> cpu_to_be32(RTAS_IBM_RESET_PE_DMA_WINDOW)
>> };
>> + uint32_t associativity[] = {cpu_to_be32(0x4),
>> + cpu_to_be32(0x0),
>> + cpu_to_be32(0x0),
>> + cpu_to_be32(0x0),
>> + cpu_to_be32(phb->numa_node)};
>> sPAPRTCETable *tcet;
>> PCIBus *bus = PCI_HOST_BRIDGE(phb)->bus;
>> sPAPRFDT s_fdt;
>> @@ -1837,6 +1844,12 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
>> &ddw_extensions, sizeof(ddw_extensions)));
>> }
>>
>> + /* Advertise NUMA via ibm,associativity */
>> + if (nb_numa_nodes > 1) {
>> + _FDT(fdt_setprop(fdt, bus_off, "ibm,associativity", associativity,
>> + sizeof(associativity)));
>> + }
>
> LoPAPR 15.3 seems to suggest that ibm,associativity-reference-points is
> required alongside ibm,associativity for each DT node it appears in,
> and since we hardcode "Form 1" affinity it should be done similarly as
> the entry we place in the top-level DT node.
Hm, okay, I'll add it. There is a question to Bharata - why do we have 4s
in spapr_create_fdt_skel()'s refpoints? Just a random pick?
vec5[5]==0x80 means we are doing "Form1" and these 4s are far/near distances?
>> +
>> /* Build the interrupt-map, this must matches what is done
>> * in pci_spapr_map_irq
>> */
>> diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
>> index 5adc603..53c4b2d 100644
>> --- a/include/hw/pci-host/spapr.h
>> +++ b/include/hw/pci-host/spapr.h
>> @@ -75,6 +75,8 @@ struct sPAPRPHBState {
>> bool ddw_enabled;
>> uint64_t page_size_mask;
>> uint64_t dma64_win_addr;
>> +
>> + uint32_t numa_node;
>> };
>>
>> #define SPAPR_PCI_MAX_INDEX 255
>> --
>> 2.5.0.rc3
>>
>
--
Alexey