qemu-ppc
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-ppc] [RFC 1/4] spapr_pci: Delegate placement of PCI host bridg


From: Alexey Kardashevskiy
Subject: Re: [Qemu-ppc] [RFC 1/4] spapr_pci: Delegate placement of PCI host bridges to machine type
Date: Fri, 7 Oct 2016 16:34:59 +1100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1

On 07/10/16 16:10, David Gibson wrote:
> On Fri, Oct 07, 2016 at 02:57:43PM +1100, Alexey Kardashevskiy wrote:
>> On 06/10/16 14:03, David Gibson wrote:
>>> The 'spapr-pci-host-bridge' represents the virtual PCI host bridge (PHB)
>>> for a PAPR guest.  Unlike on x86, it's routine on Power (both bare metal
>>> and PAPR guests) to have numerous independent PHBs, each controlling a
>>> separate PCI domain.
>>>
>>> There are two ways of configuring the spapr-pci-host-bridge device: first
>>> it can be done fully manually, specifying the locations and sizes of all
>>> the IO windows.  This gives the most control, but is very awkward with 6
>>> mandatory parameters.  Alternatively just an "index" can be specified
>>> which essentially selects from an array of predefined PHB locations.
>>> The PHB at index 0 is automatically created as the default PHB.
>>>
>>> The current set of default locations causes some problems for guests with
>>> large RAM (> 1 TiB) or PCI devices with very large BARs (e.g. big nVidia
>>> GPGPU cards via VFIO).  Obviously, for migration we can only change the
>>> locations on a new machine type, however.
>>>
>>> This is awkward, because the placement is currently decided within the
>>> spapr-pci-host-bridge code, so it breaks abstraction to look inside the
>>> machine type version.
>>>
>>> So, this patch delegates the "default mode" PHB placement from the
>>> spapr-pci-host-bridge device back to the machine type via a public method
>>> in sPAPRMachineClass.  It's still a bit ugly, but it's about the best we
>>> can do.
>>>
>>> For now, this just changes where the calculation is done.  It doesn't
>>> change the actual location of the host bridges, or any other behaviour.
>>>
>>> Signed-off-by: David Gibson <address@hidden>
>>> ---
>>>  hw/ppc/spapr.c              | 34 ++++++++++++++++++++++++++++++++++
>>>  hw/ppc/spapr_pci.c          | 22 ++++++++--------------
>>>  include/hw/pci-host/spapr.h | 11 +----------
>>>  include/hw/ppc/spapr.h      |  4 ++++
>>>  4 files changed, 47 insertions(+), 24 deletions(-)
>>>
>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>> index 03e3803..f6e9c2a 100644
>>> --- a/hw/ppc/spapr.c
>>> +++ b/hw/ppc/spapr.c
>>> @@ -2370,6 +2370,39 @@ static HotpluggableCPUList 
>>> *spapr_query_hotpluggable_cpus(MachineState *machine)
>>>      return head;
>>>  }
>>>  
>>> +static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
>>> +                                uint64_t *buid, hwaddr *pio, hwaddr 
>>> *pio_size,
>>> +                                hwaddr *mmio, hwaddr *mmio_size,
>>> +                                unsigned n_dma, uint32_t *liobns, Error 
>>> **errp)
>>> +{
>>> +    const uint64_t base_buid = 0x800000020000000ULL;
>>> +    const hwaddr phb0_base = 0x10000000000ULL; /* 1 TiB */
>>> +    const hwaddr phb_spacing = 0x1000000000ULL; /* 64 GiB */
>>> +    const hwaddr mmio_offset = 0xa0000000; /* 2 GiB + 512 MiB */
>>> +    const hwaddr pio_offset = 0x80000000; /* 2 GiB */
>>> +    const uint32_t max_index = 255;
>>> +
>>> +    hwaddr phb_base;
>>> +    int i;
>>> +
>>> +    if (index > max_index) {
>>> +        error_setg(errp, "\"index\" for PAPR PHB is too large (max %u)",
>>> +                   max_index);
>>> +        return;
>>> +    }
>>> +
>>> +    *buid = base_buid + index;
>>> +    for (i = 0; i < n_dma; ++i) {
>>> +        liobns[i] = SPAPR_PCI_LIOBN(index, i);
>>> +    }
>>> +
>>> +    phb_base = phb0_base + index * phb_spacing;
>>> +    *pio = phb_base + pio_offset;
>>> +    *pio_size = SPAPR_PCI_IO_WIN_SIZE;
>>> +    *mmio = phb_base + mmio_offset;
>>> +    *mmio_size = SPAPR_PCI_MMIO_WIN_SIZE;
>>> +}
>>> +
>>>  static void spapr_machine_class_init(ObjectClass *oc, void *data)
>>>  {
>>>      MachineClass *mc = MACHINE_CLASS(oc);
>>> @@ -2406,6 +2439,7 @@ static void spapr_machine_class_init(ObjectClass *oc, 
>>> void *data)
>>>      mc->query_hotpluggable_cpus = spapr_query_hotpluggable_cpus;
>>>      fwc->get_dev_path = spapr_get_fw_dev_path;
>>>      nc->nmi_monitor_handler = spapr_nmi;
>>> +    smc->phb_placement = spapr_phb_placement;
>>>  }
>>>  
>>>  static const TypeInfo spapr_machine_info = {
>>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>>> index 4f00865..c0fc964 100644
>>> --- a/hw/ppc/spapr_pci.c
>>> +++ b/hw/ppc/spapr_pci.c
>>> @@ -1311,7 +1311,8 @@ static void spapr_phb_realize(DeviceState *dev, Error 
>>> **errp)
>>>          sphb->ddw_enabled ? SPAPR_PCI_DMA_MAX_WINDOWS : 1;
>>>  
>>>      if (sphb->index != (uint32_t)-1) {
>>> -        hwaddr windows_base;
>>> +        sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
>>> +        Error *local_err = NULL;
>>>  
>>>          if ((sphb->buid != (uint64_t)-1) || (sphb->dma_liobn[0] != 
>>> (uint32_t)-1)
>>>              || (sphb->dma_liobn[1] != (uint32_t)-1 && windows_supported == 
>>> 2)
>>> @@ -1322,21 +1323,14 @@ static void spapr_phb_realize(DeviceState *dev, 
>>> Error **errp)
>>>              return;
>>>          }
>>>  
>>> -        if (sphb->index > SPAPR_PCI_MAX_INDEX) {
>>> -            error_setg(errp, "\"index\" for PAPR PHB is too large (max 
>>> %u)",
>>> -                       SPAPR_PCI_MAX_INDEX);
>>> +        smc->phb_placement(spapr, sphb->index,
>>> +                           &sphb->buid, &sphb->io_win_addr, 
>>> &sphb->io_win_size,
>>> +                           &sphb->mem_win_addr, &sphb->mem_win_size,
>>> +                           windows_supported, sphb->dma_liobn, &local_err);
>>> +        if (local_err) {
>>> +            error_propagate(errp, local_err);
>>>              return;
>>>          }
>>> -
>>> -        sphb->buid = SPAPR_PCI_BASE_BUID + sphb->index;
>>> -        for (i = 0; i < windows_supported; ++i) {
>>> -            sphb->dma_liobn[i] = SPAPR_PCI_LIOBN(sphb->index, i);
>>> -        }
>>> -
>>> -        windows_base = SPAPR_PCI_WINDOW_BASE
>>> -            + sphb->index * SPAPR_PCI_WINDOW_SPACING;
>>> -        sphb->mem_win_addr = windows_base + SPAPR_PCI_MMIO_WIN_OFF;
>>> -        sphb->io_win_addr = windows_base + SPAPR_PCI_IO_WIN_OFF;
>>>      }
>>>  
>>>      if (sphb->buid == (uint64_t)-1) {
>>> diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
>>> index 30dbd46..8c9ebfd 100644
>>> --- a/include/hw/pci-host/spapr.h
>>> +++ b/include/hw/pci-host/spapr.h
>>> @@ -79,18 +79,9 @@ struct sPAPRPHBState {
>>>      uint32_t numa_node;
>>>  };
>>>  
>>> -#define SPAPR_PCI_MAX_INDEX          255
>>> -
>>> -#define SPAPR_PCI_BASE_BUID          0x800000020000000ULL
>>> -
>>>  #define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x80000000ULL
>>>  
>>> -#define SPAPR_PCI_WINDOW_BASE        0x10000000000ULL
>>> -#define SPAPR_PCI_WINDOW_SPACING     0x1000000000ULL
>>> -#define SPAPR_PCI_MMIO_WIN_OFF       0xA0000000
>>> -#define SPAPR_PCI_MMIO_WIN_SIZE      (SPAPR_PCI_WINDOW_SPACING - \
>>> -                                     SPAPR_PCI_MEM_WIN_BUS_OFFSET)
>>> -#define SPAPR_PCI_IO_WIN_OFF         0x80000000
>>> +#define SPAPR_PCI_MMIO_WIN_SIZE      0xf80000000
>>>  #define SPAPR_PCI_IO_WIN_SIZE        0x10000
>>>  
>>>  #define SPAPR_PCI_MSI_WINDOW         0x40000000000ULL
>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>>> index 39dadaa..9e1c5a5 100644
>>> --- a/include/hw/ppc/spapr.h
>>> +++ b/include/hw/ppc/spapr.h
>>> @@ -40,6 +40,10 @@ struct sPAPRMachineClass {
>>>      bool dr_lmb_enabled;       /* enable dynamic-reconfig/hotplug of LMBs 
>>> */
>>>      bool use_ohci_by_default;  /* use USB-OHCI instead of XHCI */
>>>      const char *tcg_default_cpu; /* which (TCG) CPU to simulate by default 
>>> */
>>> +    void (*phb_placement)(sPAPRMachineState *spapr, uint32_t index,
>>> +                          uint64_t *buid, hwaddr *pio, hwaddr *pio_size,
>>> +                          hwaddr *mmio, hwaddr *mmio_size,
>>> +                          unsigned n_dma, uint32_t *liobns, Error **errp);
>>
>>
>> I still like idea of getting rid of index better. In order to reduce the
>> command line length, you could not enable IO and MMIO32 by default, the
>> default size for MMIO64 could be set to 1TB or something like this, BUID
>> could be just a raw MMIO64 address value.
> 
> Hm, interesting idea.  I do like the idea of making the BUID equal to
> the MMIO64 address.  Obviously we'd still need addresses for the
> default PHB, including 32-bit and PIO windows.
> 
> Trickier, we'd still need to support 'index' for compatibility with
> older command lines.  I guess we could reserve index purely for the
> "legacy" locations - 2.8+ machine types would create the default
> bridge with explicit windows, rather than just using index 0.
> 
>> So essentially you would only have to specify MMIO64 and 2xLIOBN in the
>> command line which does not seem that much of overkill assuming that a
>> second (third,...) PHB is almost never used and even if it is, it will be
>> used for SRIOV VF (which can do 64bit) or virtio (the same).
> 
> Hm, maybe.  3 values is still a bit of a PITA.  I guess liobn64 could
> theoretically be optional, although you're going to want it for
> basically every case where you're creating an additional PHB.
> 
> Or.. what if we insisted the mmio64 window had at least 4G alignment,
> then we might be able to use mmio64 >> 32 (== buid >> 32) as the
> liobn.  Hmm.. guess it would need to be 8G alignment, so we have one
> extra bit.


As you mentioned in 4/4, guests cannot access above 64 TiB, we could use
upper bits to store the window number.


> 
> ..and I wonder if this might be easier to manage if we introduced a
> new spapr-pci-host-bridge-2 subtype.
> 
> Let me think about this..
> 
>> With index and phb_placement(), QEMU will still auto-generate these IO/MMIO
>> addresses from the command line parameters which did hit us with migration
>> before as the QEMU command line on the source side and the destination side
>> is not exactly the same, and this might hit again...
> 
> Uh.. I'm not quite following you.  What's the situation which would
> break migration with the proposed scheme?

Well, I cannot think of any other device available on SPAPR which we could
map to the CPU address space but if it ever appears, we will have a problem
unless this new imaginary device does not pick an address from the CPU
space automatically.



-- 
Alexey

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]