[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-ppc] [RFC 1/4] spapr_pci: Delegate placement of PCI host bridg
From: |
Alexey Kardashevskiy |
Subject: |
Re: [Qemu-ppc] [RFC 1/4] spapr_pci: Delegate placement of PCI host bridges to machine type |
Date: |
Fri, 7 Oct 2016 16:34:59 +1100 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 |
On 07/10/16 16:10, David Gibson wrote:
> On Fri, Oct 07, 2016 at 02:57:43PM +1100, Alexey Kardashevskiy wrote:
>> On 06/10/16 14:03, David Gibson wrote:
>>> The 'spapr-pci-host-bridge' represents the virtual PCI host bridge (PHB)
>>> for a PAPR guest. Unlike on x86, it's routine on Power (both bare metal
>>> and PAPR guests) to have numerous independent PHBs, each controlling a
>>> separate PCI domain.
>>>
>>> There are two ways of configuring the spapr-pci-host-bridge device: first
>>> it can be done fully manually, specifying the locations and sizes of all
>>> the IO windows. This gives the most control, but is very awkward with 6
>>> mandatory parameters. Alternatively just an "index" can be specified
>>> which essentially selects from an array of predefined PHB locations.
>>> The PHB at index 0 is automatically created as the default PHB.
>>>
>>> The current set of default locations causes some problems for guests with
>>> large RAM (> 1 TiB) or PCI devices with very large BARs (e.g. big nVidia
>>> GPGPU cards via VFIO). Obviously, for migration we can only change the
>>> locations on a new machine type, however.
>>>
>>> This is awkward, because the placement is currently decided within the
>>> spapr-pci-host-bridge code, so it breaks abstraction to look inside the
>>> machine type version.
>>>
>>> So, this patch delegates the "default mode" PHB placement from the
>>> spapr-pci-host-bridge device back to the machine type via a public method
>>> in sPAPRMachineClass. It's still a bit ugly, but it's about the best we
>>> can do.
>>>
>>> For now, this just changes where the calculation is done. It doesn't
>>> change the actual location of the host bridges, or any other behaviour.
>>>
>>> Signed-off-by: David Gibson <address@hidden>
>>> ---
>>> hw/ppc/spapr.c | 34 ++++++++++++++++++++++++++++++++++
>>> hw/ppc/spapr_pci.c | 22 ++++++++--------------
>>> include/hw/pci-host/spapr.h | 11 +----------
>>> include/hw/ppc/spapr.h | 4 ++++
>>> 4 files changed, 47 insertions(+), 24 deletions(-)
>>>
>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>> index 03e3803..f6e9c2a 100644
>>> --- a/hw/ppc/spapr.c
>>> +++ b/hw/ppc/spapr.c
>>> @@ -2370,6 +2370,39 @@ static HotpluggableCPUList
>>> *spapr_query_hotpluggable_cpus(MachineState *machine)
>>> return head;
>>> }
>>>
>>> +static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
>>> + uint64_t *buid, hwaddr *pio, hwaddr
>>> *pio_size,
>>> + hwaddr *mmio, hwaddr *mmio_size,
>>> + unsigned n_dma, uint32_t *liobns, Error
>>> **errp)
>>> +{
>>> + const uint64_t base_buid = 0x800000020000000ULL;
>>> + const hwaddr phb0_base = 0x10000000000ULL; /* 1 TiB */
>>> + const hwaddr phb_spacing = 0x1000000000ULL; /* 64 GiB */
>>> + const hwaddr mmio_offset = 0xa0000000; /* 2 GiB + 512 MiB */
>>> + const hwaddr pio_offset = 0x80000000; /* 2 GiB */
>>> + const uint32_t max_index = 255;
>>> +
>>> + hwaddr phb_base;
>>> + int i;
>>> +
>>> + if (index > max_index) {
>>> + error_setg(errp, "\"index\" for PAPR PHB is too large (max %u)",
>>> + max_index);
>>> + return;
>>> + }
>>> +
>>> + *buid = base_buid + index;
>>> + for (i = 0; i < n_dma; ++i) {
>>> + liobns[i] = SPAPR_PCI_LIOBN(index, i);
>>> + }
>>> +
>>> + phb_base = phb0_base + index * phb_spacing;
>>> + *pio = phb_base + pio_offset;
>>> + *pio_size = SPAPR_PCI_IO_WIN_SIZE;
>>> + *mmio = phb_base + mmio_offset;
>>> + *mmio_size = SPAPR_PCI_MMIO_WIN_SIZE;
>>> +}
>>> +
>>> static void spapr_machine_class_init(ObjectClass *oc, void *data)
>>> {
>>> MachineClass *mc = MACHINE_CLASS(oc);
>>> @@ -2406,6 +2439,7 @@ static void spapr_machine_class_init(ObjectClass *oc,
>>> void *data)
>>> mc->query_hotpluggable_cpus = spapr_query_hotpluggable_cpus;
>>> fwc->get_dev_path = spapr_get_fw_dev_path;
>>> nc->nmi_monitor_handler = spapr_nmi;
>>> + smc->phb_placement = spapr_phb_placement;
>>> }
>>>
>>> static const TypeInfo spapr_machine_info = {
>>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>>> index 4f00865..c0fc964 100644
>>> --- a/hw/ppc/spapr_pci.c
>>> +++ b/hw/ppc/spapr_pci.c
>>> @@ -1311,7 +1311,8 @@ static void spapr_phb_realize(DeviceState *dev, Error
>>> **errp)
>>> sphb->ddw_enabled ? SPAPR_PCI_DMA_MAX_WINDOWS : 1;
>>>
>>> if (sphb->index != (uint32_t)-1) {
>>> - hwaddr windows_base;
>>> + sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
>>> + Error *local_err = NULL;
>>>
>>> if ((sphb->buid != (uint64_t)-1) || (sphb->dma_liobn[0] !=
>>> (uint32_t)-1)
>>> || (sphb->dma_liobn[1] != (uint32_t)-1 && windows_supported ==
>>> 2)
>>> @@ -1322,21 +1323,14 @@ static void spapr_phb_realize(DeviceState *dev,
>>> Error **errp)
>>> return;
>>> }
>>>
>>> - if (sphb->index > SPAPR_PCI_MAX_INDEX) {
>>> - error_setg(errp, "\"index\" for PAPR PHB is too large (max
>>> %u)",
>>> - SPAPR_PCI_MAX_INDEX);
>>> + smc->phb_placement(spapr, sphb->index,
>>> + &sphb->buid, &sphb->io_win_addr,
>>> &sphb->io_win_size,
>>> + &sphb->mem_win_addr, &sphb->mem_win_size,
>>> + windows_supported, sphb->dma_liobn, &local_err);
>>> + if (local_err) {
>>> + error_propagate(errp, local_err);
>>> return;
>>> }
>>> -
>>> - sphb->buid = SPAPR_PCI_BASE_BUID + sphb->index;
>>> - for (i = 0; i < windows_supported; ++i) {
>>> - sphb->dma_liobn[i] = SPAPR_PCI_LIOBN(sphb->index, i);
>>> - }
>>> -
>>> - windows_base = SPAPR_PCI_WINDOW_BASE
>>> - + sphb->index * SPAPR_PCI_WINDOW_SPACING;
>>> - sphb->mem_win_addr = windows_base + SPAPR_PCI_MMIO_WIN_OFF;
>>> - sphb->io_win_addr = windows_base + SPAPR_PCI_IO_WIN_OFF;
>>> }
>>>
>>> if (sphb->buid == (uint64_t)-1) {
>>> diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
>>> index 30dbd46..8c9ebfd 100644
>>> --- a/include/hw/pci-host/spapr.h
>>> +++ b/include/hw/pci-host/spapr.h
>>> @@ -79,18 +79,9 @@ struct sPAPRPHBState {
>>> uint32_t numa_node;
>>> };
>>>
>>> -#define SPAPR_PCI_MAX_INDEX 255
>>> -
>>> -#define SPAPR_PCI_BASE_BUID 0x800000020000000ULL
>>> -
>>> #define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x80000000ULL
>>>
>>> -#define SPAPR_PCI_WINDOW_BASE 0x10000000000ULL
>>> -#define SPAPR_PCI_WINDOW_SPACING 0x1000000000ULL
>>> -#define SPAPR_PCI_MMIO_WIN_OFF 0xA0000000
>>> -#define SPAPR_PCI_MMIO_WIN_SIZE (SPAPR_PCI_WINDOW_SPACING - \
>>> - SPAPR_PCI_MEM_WIN_BUS_OFFSET)
>>> -#define SPAPR_PCI_IO_WIN_OFF 0x80000000
>>> +#define SPAPR_PCI_MMIO_WIN_SIZE 0xf80000000
>>> #define SPAPR_PCI_IO_WIN_SIZE 0x10000
>>>
>>> #define SPAPR_PCI_MSI_WINDOW 0x40000000000ULL
>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>>> index 39dadaa..9e1c5a5 100644
>>> --- a/include/hw/ppc/spapr.h
>>> +++ b/include/hw/ppc/spapr.h
>>> @@ -40,6 +40,10 @@ struct sPAPRMachineClass {
>>> bool dr_lmb_enabled; /* enable dynamic-reconfig/hotplug of LMBs
>>> */
>>> bool use_ohci_by_default; /* use USB-OHCI instead of XHCI */
>>> const char *tcg_default_cpu; /* which (TCG) CPU to simulate by default
>>> */
>>> + void (*phb_placement)(sPAPRMachineState *spapr, uint32_t index,
>>> + uint64_t *buid, hwaddr *pio, hwaddr *pio_size,
>>> + hwaddr *mmio, hwaddr *mmio_size,
>>> + unsigned n_dma, uint32_t *liobns, Error **errp);
>>
>>
>> I still like idea of getting rid of index better. In order to reduce the
>> command line length, you could not enable IO and MMIO32 by default, the
>> default size for MMIO64 could be set to 1TB or something like this, BUID
>> could be just a raw MMIO64 address value.
>
> Hm, interesting idea. I do like the idea of making the BUID equal to
> the MMIO64 address. Obviously we'd still need addresses for the
> default PHB, including 32-bit and PIO windows.
>
> Trickier, we'd still need to support 'index' for compatibility with
> older command lines. I guess we could reserve index purely for the
> "legacy" locations - 2.8+ machine types would create the default
> bridge with explicit windows, rather than just using index 0.
>
>> So essentially you would only have to specify MMIO64 and 2xLIOBN in the
>> command line which does not seem that much of overkill assuming that a
>> second (third,...) PHB is almost never used and even if it is, it will be
>> used for SRIOV VF (which can do 64bit) or virtio (the same).
>
> Hm, maybe. 3 values is still a bit of a PITA. I guess liobn64 could
> theoretically be optional, although you're going to want it for
> basically every case where you're creating an additional PHB.
>
> Or.. what if we insisted the mmio64 window had at least 4G alignment,
> then we might be able to use mmio64 >> 32 (== buid >> 32) as the
> liobn. Hmm.. guess it would need to be 8G alignment, so we have one
> extra bit.
As you mentioned in 4/4, guests cannot access above 64 TiB, we could use
upper bits to store the window number.
>
> ..and I wonder if this might be easier to manage if we introduced a
> new spapr-pci-host-bridge-2 subtype.
>
> Let me think about this..
>
>> With index and phb_placement(), QEMU will still auto-generate these IO/MMIO
>> addresses from the command line parameters which did hit us with migration
>> before as the QEMU command line on the source side and the destination side
>> is not exactly the same, and this might hit again...
>
> Uh.. I'm not quite following you. What's the situation which would
> break migration with the proposed scheme?
Well, I cannot think of any other device available on SPAPR which we could
map to the CPU address space but if it ever appears, we will have a problem
unless this new imaginary device does not pick an address from the CPU
space automatically.
--
Alexey
signature.asc
Description: OpenPGP digital signature
[Qemu-ppc] [RFC 3/4] spapr_pci: Add a 64-bit MMIO window, David Gibson, 2016/10/05
[Qemu-ppc] [RFC 4/4] spapr: Improved placement of PCI host bridges in guest memory map, David Gibson, 2016/10/05
[Qemu-ppc] [RFC 2/4] spapr: Adjust placement of PCI host bridge to allow > 1TiB RAM, David Gibson, 2016/10/05