qemu-ppc
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-ppc] [PATCH v3 06/35] spapr/xive: introduce a XIVE interrupt p


From: David Gibson
Subject: Re: [Qemu-ppc] [PATCH v3 06/35] spapr/xive: introduce a XIVE interrupt presenter model
Date: Thu, 3 May 2018 15:35:32 +1000
User-agent: Mutt/1.9.3 (2018-01-21)

On Thu, Apr 26, 2018 at 11:27:21AM +0200, Cédric Le Goater wrote:
> On 04/26/2018 09:11 AM, David Gibson wrote:
> > On Thu, Apr 19, 2018 at 02:43:02PM +0200, Cédric Le Goater wrote:
> >> The XIVE presenter engine uses a set of registers to handle priority
> >> management and interrupt acknowledgment among other things. The most
> >> important ones being :
> >>
> >>   - Interrupt Priority Register (PIPR)
> >>   - Interrupt Pending Buffer (IPB)
> >>   - Current Processor Priority (CPPR)
> >>   - Notification Source Register (NSR)
> >>
> >> There is one set of registers per level of privilege, four in all :
> >> HW, HV pool, OS and User. These are called rings. All registers are
> >> accessible through a specific MMIO region called the Thread Interrupt
> >> Management Areas (TIMA) but, depending on the privilege level of the
> >> CPU, the view of the TIMA is filtered. The sPAPR machine runs at the
> >> OS privilege and therefore can only accesses the OS and the User
> >> rings. The others are for hypervisor levels.
> >>
> >> The CPU interrupt state is modeled with a XiveNVT object which stores
> >> the values of the different registers. The different TIMA views are
> >> mapped at the same address for each CPU and 'current_cpu' is used to
> >> retrieve the XiveNVT holding the ring registers.
> >>
> >> Signed-off-by: Cédric Le Goater <address@hidden>
> >> ---
> >>
> >>  Changes since v2 :
> >>
> >>  - introduced the XiveFabric interface
> >>
> >>  hw/intc/spapr_xive.c        |  25 ++++
> >>  hw/intc/xive.c              | 279 
> >> ++++++++++++++++++++++++++++++++++++++++++++
> >>  include/hw/ppc/spapr_xive.h |   5 +
> >>  include/hw/ppc/xive.h       |  31 +++++
> >>  include/hw/ppc/xive_regs.h  |  84 +++++++++++++
> >>  5 files changed, 424 insertions(+)
> >>
> >> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> >> index 90cde8a4082d..f07832bf0a00 100644
> >> --- a/hw/intc/spapr_xive.c
> >> +++ b/hw/intc/spapr_xive.c
> >> @@ -13,6 +13,7 @@
> >>  #include "target/ppc/cpu.h"
> >>  #include "sysemu/cpus.h"
> >>  #include "monitor/monitor.h"
> >> +#include "hw/ppc/spapr.h"
> >>  #include "hw/ppc/spapr_xive.h"
> >>  #include "hw/ppc/xive.h"
> >>  #include "hw/ppc/xive_regs.h"
> >> @@ -95,6 +96,22 @@ static void spapr_xive_realize(DeviceState *dev, Error 
> >> **errp)
> >>  
> >>      /* Allocate the Interrupt Virtualization Table */
> >>      xive->ivt = g_new0(XiveIVE, xive->nr_irqs);
> >> +
> >> +    /* The Thread Interrupt Management Area has the same address for
> >> +     * each chip. On sPAPR, we only need to expose the User and OS
> >> +     * level views of the TIMA.
> >> +     */
> >> +    xive->tm_base = XIVE_TM_BASE;
> > 
> > The constant should probably have PAPR in the name somewhere, since
> > it's just for PAPR machines (same for the ESB mappings, actually).
> 
> ok. 
> 
> I have also made 'tm_base' a property, like 'vc_base' for ESBs, in 
> case we want to change the value when the guest is instantiated. 
> I doubt it but this is an address in the global address space, so 
> letting the machine have control is better I think.

I agree.

> >> +
> >> +    memory_region_init_io(&xive->tm_mmio_user, OBJECT(xive),
> >> +                          &xive_tm_user_ops, xive, "xive.tima.user",
> >> +                          1ull << TM_SHIFT);
> >> +    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &xive->tm_mmio_user);
> >> +
> >> +    memory_region_init_io(&xive->tm_mmio_os, OBJECT(xive),
> >> +                          &xive_tm_os_ops, xive, "xive.tima.os",
> >> +                          1ull << TM_SHIFT);
> >> +    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &xive->tm_mmio_os);
> >>  }
> >>  
> >>  static XiveIVE *spapr_xive_get_ive(XiveFabric *xf, uint32_t lisn)
> >> @@ -104,6 +121,13 @@ static XiveIVE *spapr_xive_get_ive(XiveFabric *xf, 
> >> uint32_t lisn)
> >>      return lisn < xive->nr_irqs ? &xive->ivt[lisn] : NULL;
> >>  }
> >>  
> >> +static XiveNVT *spapr_xive_get_nvt(XiveFabric *xf, uint32_t server)
> >> +{
> >> +    PowerPCCPU *cpu = spapr_find_cpu(server);
> >> +
> >> +    return cpu ? XIVE_NVT(cpu->intc) : NULL;
> >> +}
> > 
> > So this is a bit of a tangent, but I've been thinking of implementing
> > a scheme where there's an opaque pointer in the cpu structure for the
> > use of the machine.  I'm planning for that to replace the intc pointer
> > (which isn't really used directly by the cpu). That would allow us to
> > have spapr put a structure there and have both xics and xive pointers
> > which could be useful later on.
> 
> ok. That should simplify the patchset at the end, in which we need to 
> switch the 'intc' pointer. 
> 
> > I think we'd need something similar to correctly handle migration of
> > the VPA state, which is currently horribly broken.
> > 
> >> +
> >>  static const VMStateDescription vmstate_spapr_xive_ive = {
> >>      .name = TYPE_SPAPR_XIVE "/ive",
> >>      .version_id = 1,
> >> @@ -143,6 +167,7 @@ static void spapr_xive_class_init(ObjectClass *klass, 
> >> void *data)
> >>      dc->vmsd = &vmstate_spapr_xive;
> >>  
> >>      xfc->get_ive = spapr_xive_get_ive;
> >> +    xfc->get_nvt = spapr_xive_get_nvt;
> >>  }
> >>  
> >>  static const TypeInfo spapr_xive_info = {
> >> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> >> index dccad0318834..5691bb9474e4 100644
> >> --- a/hw/intc/xive.c
> >> +++ b/hw/intc/xive.c
> >> @@ -14,7 +14,278 @@
> >>  #include "sysemu/cpus.h"
> >>  #include "sysemu/dma.h"
> >>  #include "monitor/monitor.h"
> >> +#include "hw/ppc/xics.h" /* for ICP_PROP_CPU */
> >>  #include "hw/ppc/xive.h"
> >> +#include "hw/ppc/xive_regs.h"
> >> +
> >> +/*
> >> + * XIVE Interrupt Presenter
> >> + */
> >> +
> >> +static uint64_t xive_nvt_accept(XiveNVT *nvt)
> >> +{
> >> +    return 0;
> >> +}
> >> +
> >> +static void xive_nvt_set_cppr(XiveNVT *nvt, uint8_t cppr)
> >> +{
> >> +    if (cppr > XIVE_PRIORITY_MAX) {
> >> +        cppr = 0xff;
> >> +    }
> >> +
> >> +    nvt->ring_os[TM_CPPR] = cppr;
> > 
> > Surely this needs to recheck if we should be interrupting the cpu?
> 
> yes. In patch 9, when we introduce the nvt notify routine.

Ok.

> >> +}
> >> +
> >> +/*
> >> + * OS Thread Interrupt Management Area MMIO
> >> + */
> >> +static uint64_t xive_tm_read_special(XiveNVT *nvt, hwaddr offset,
> >> +                                           unsigned size)
> >> +{
> >> +    uint64_t ret = -1;
> >> +
> >> +    if (offset == TM_SPC_ACK_OS_REG && size == 2) {
> >> +        ret = xive_nvt_accept(nvt);
> >> +    } else {
> >> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid TIMA read @%"
> >> +                      HWADDR_PRIx" size %d\n", offset, size);
> >> +    }
> >> +
> >> +    return ret;
> >> +}
> >> +
> >> +#define TM_RING(offset) ((offset) & 0xf0)
> >> +
> >> +static uint64_t xive_tm_os_read(void *opaque, hwaddr offset,
> >> +                                      unsigned size)
> >> +{
> >> +    PowerPCCPU *cpu = POWERPC_CPU(current_cpu);
> > 
> > So, as I said on a previous version of this, we can actually correctly
> > represent different mappings in different cpu spaces, by exploiting
> > cpu->as and not just having them all point to &address_space_memory.
> 
> Yes, you did and I haven't studied the question yet. For the next version.

So, it's possible that using the cpu->as thing will be more trouble
that it's worth.  I am a little concerned about using current_cpu
though.  First, will it work with KVM with kernel_irqchip=off - the
cpus are running truly concurrently, but we still need to work out
who's poking at the TIMA.  Second, are there any cases where we might
need to trip this "on behalf of" a specific cpu that's not the current
one.

> >> +    XiveNVT *nvt = XIVE_NVT(cpu->intc);
> >> +    uint64_t ret = -1;
> >> +    int i;
> >> +
> >> +    if (offset >= TM_SPC_ACK_EBB) {
> >> +        return xive_tm_read_special(nvt, offset, size);
> >> +    }
> >> +
> >> +    if (TM_RING(offset) != TM_QW1_OS) {
> >> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid access to non-OS 
> >> ring @%"
> >> +                      HWADDR_PRIx"\n", offset);
> >> +        return ret;
> > 
> > Just return -1 would be clearer here;
> 
> ok.
> 
> > 
> >> +    }
> >> +
> >> +    ret = 0;
> >> +    for (i = 0; i < size; i++) {
> >> +        ret |= (uint64_t) nvt->regs[offset + i] << (8 * (size - i - 1));
> >> +    }
> >> +
> >> +    return ret;
> >> +}
> >> +
> >> +static bool xive_tm_is_readonly(uint8_t offset)
> >> +{
> >> +    return offset != TM_QW1_OS + TM_CPPR;
> >> +}
> >> +
> >> +static void xive_tm_write_special(XiveNVT *nvt, hwaddr offset,
> >> +                                        uint64_t value, unsigned size)
> >> +{
> >> +    /* TODO: support TM_SPC_SET_OS_PENDING */
> >> +
> >> +    /* TODO: support TM_SPC_ACK_OS_EL */
> >> +}
> >> +
> >> +static void xive_tm_os_write(void *opaque, hwaddr offset,
> >> +                                   uint64_t value, unsigned size)
> >> +{
> >> +    PowerPCCPU *cpu = POWERPC_CPU(current_cpu);
> >> +    XiveNVT *nvt = XIVE_NVT(cpu->intc);
> >> +    int i;
> >> +
> >> +    if (offset >= TM_SPC_ACK_EBB) {
> >> +        xive_tm_write_special(nvt, offset, value, size);
> >> +        return;
> >> +    }
> >> +
> >> +    if (TM_RING(offset) != TM_QW1_OS) {
> > 
> > Why have this if you have separate OS and user regions as you appear
> > to do below?
> 
> This is another problem we are trying to solve. 
> 
> The registers a CPU can access depends on the TIMA view it is using. 
> The OS TIMA view only sees the OS ring registers. The HV view sees all. 
> 
> > Or to look at it another way, shouldn't it be possible to make the
> > read/write accessors the same for the OS and user rings?
> 
> For some parts yes, but the special load/store addresses are different
> for each view, the read-only register also. It seemed easier to duplicate.
> 
> I think the problem will become clearer (or worse) with pnv which uses 
> the HV mode.

Oh.  I had the impression that each ring had a basically identical set
of registers and you just had access to the region for your ring and
the ones below.  Are you saying instead it's basically a single block
of registers with various different privilege levels for each of them?

[snip]
> >> +}
> >> +
> >> +static void xive_nvt_unrealize(DeviceState *dev, Error **errp)
> >> +{
> >> +    qemu_unregister_reset(xive_nvt_reset, dev);
> >> +}
> >> +
> >> +static void xive_nvt_init(Object *obj)
> >> +{
> >> +    XiveNVT *nvt = XIVE_NVT(obj);
> >> +
> >> +    nvt->ring_os = &nvt->regs[TM_QW1_OS];
> > 
> > The ring_os field is basically pointless, being just an offset into a
> > structure you already have.  A macro or inline would be a better idea.
> 
> ok. I liked the idea but I agree it's overkill to have an init routine
> just for this. I will find something.

That too, but it's also something that looks like an optimization but
isn't, which is bad practice.  On modern cpus math is cheap (and this
is just a trivial offset), memory accesses are expensive.  You're
essentially caching this offset - raising all the usual invalidation
questions for a cache - when caching it is *more* expensive than just
computing it every time.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]