qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v5 08/36] ppc/xive: introduce a simplified XIVE


From: David Gibson
Subject: Re: [Qemu-devel] [PATCH v5 08/36] ppc/xive: introduce a simplified XIVE presenter
Date: Thu, 29 Nov 2018 11:47:18 +1100
User-agent: Mutt/1.10.1 (2018-07-13)

On Wed, Nov 28, 2018 at 11:59:58AM +0100, Cédric Le Goater wrote:
> On 11/28/18 12:49 AM, David Gibson wrote:
> > On Fri, Nov 16, 2018 at 11:57:01AM +0100, Cédric Le Goater wrote:
> >> The last sub-engine of the XIVE architecture is the Interrupt
> >> Virtualization Presentation Engine (IVPE). On HW, they share elements,
> >> the Power Bus interface (CQ), the routing table descriptors, and they
> >> can be combined in the same HW logic. We do the same in QEMU and
> >> combine both engines in the XiveRouter for simplicity.
> > 
> > Ok, I'm not entirely convinced combining the IVPE and IVRE into a
> > single object is a good idea, but we can probably discuss that once
> > I've read further.
> 
> We could introduce a simplified presenter for sPAPR but I am not even
> sure of that as it will get more complex if we support the EBB one day. 

I wasn't really thinking about PAPR for this comment.

> >> When the IVRE has completed its job of matching an event source with a
> >> Notification Virtual Target (NVT) to notify, it forwards the event
> >> notification to the IVPE sub-engine. The IVPE scans the thread
> >> interrupt contexts of the Notification Virtual Targets (NVT)
> >> dispatched on the HW processor threads and if a match is found, it
> >> signals the thread. If not, the IVPE escalates the notification to
> >> some other targets and records the notification in a backlog queue.
> >>
> >> The IVPE maintains the thread interrupt context state for each of its
> >> NVTs not dispatched on HW processor threads in the Notification
> >> Virtual Target table (NVTT).
> >>
> >> The model currently only supports single NVT notifications.
> >>
> >> Signed-off-by: Cédric Le Goater <address@hidden>
> >> ---
> >>  include/hw/ppc/xive.h      |  13 +++
> >>  include/hw/ppc/xive_regs.h |  22 ++++
> >>  hw/intc/xive.c             | 223 +++++++++++++++++++++++++++++++++++++
> >>  3 files changed, 258 insertions(+)
> >>
> >> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> >> index 5987f26ddb98..e715a6c6923d 100644
> >> --- a/include/hw/ppc/xive.h
> >> +++ b/include/hw/ppc/xive.h
> >> @@ -197,6 +197,10 @@ typedef struct XiveRouterClass {
> >>                     XiveEND *end);
> >>      int (*set_end)(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
> >>                     XiveEND *end);
> >> +    int (*get_nvt)(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> >> +                   XiveNVT *nvt);
> >> +    int (*set_nvt)(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> >> +                   XiveNVT *nvt);
> > 
> > As with the ENDs, I don't think get/set is a good interface for a
> > bigger-than-word-size object.
> 
> We need to agree on this interface before I respin. So you would like 
> to add a extra argument specifying the word being accessed ?

Yes.  Ok, 3 options I can see at this point:

1) read/write accessors which take a word number

2) A "get" accessor which copies the whole structure, but "write"
accessor which takes a word number.  The asymmetry is a bit ugly, but
it's the non-atomic writeback of the whole structure which I'm most
uncomfortable with.

3) A map/unmap interface which gives you / releases a pointer to the
"live" structure.  For powernv that would become
address_space_map()/unmap().  For PAPR it would just be reutn pointer
/ no-op.

> 
> > 
> >>  } XiveRouterClass;
> >>  
> >>  void xive_eas_pic_print_info(XiveEAS *eas, uint32_t lisn, Monitor *mon);
> >> @@ -207,6 +211,10 @@ int xive_router_get_end(XiveRouter *xrtr, uint8_t 
> >> end_blk, uint32_t end_idx,
> >>                          XiveEND *end);
> >>  int xive_router_set_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t 
> >> end_idx,
> >>                          XiveEND *end);
> >> +int xive_router_get_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t 
> >> nvt_idx,
> >> +                        XiveNVT *nvt);
> >> +int xive_router_set_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t 
> >> nvt_idx,
> >> +                        XiveNVT *nvt);
> >>  
> >>  /*
> >>   * XIVE END ESBs
> >> @@ -274,4 +282,9 @@ extern const MemoryRegionOps xive_tm_ops;
> >>  
> >>  void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon);
> >>  
> >> +static inline uint32_t xive_tctx_cam_line(uint8_t nvt_blk, uint32_t 
> >> nvt_idx)
> >> +{
> >> +    return (nvt_blk << 19) | nvt_idx;
> > 
> > I'm guessing this formula is the standard way of combining the NVT
> > block and index into a single word?  
> 
> That number is the VP/NVT identifier which is written in the CAM value. 
> The index is on 19 bits because of the NVT  definition in the END 
> structure. It is being increased to 24 bits on Power10 
> 
> > If so, I think we should
> > standardize on passing a single word "nvt_id" around and only
> > splitting it when we need to use the block separately.  
> 
> This is really the only place where we concatenate the two NVT values,
> block and index. 

Hm, ok.  I know we don't model them (yet, maybe ever) but could
combined values appear in the PowerBUS messages that handle remote
notifications?

> > Same goes for
> > the end_id, assuming there's a standard way of putting that into a
> > single word.  That will address the point I raised earlier about lisn
> > being passed around as a single word, but these later stage ids being
> > split.
> 
> Hmm, I am not sure this is a good option. It is not how the PowerNV 
> model would use it, skiboot is very much aware of these blocks and 
> indexes and for remote accesses chips are identified using the block. 
> I will take a look at it but I am not found of it. I can add helpers 
> in some places though.    

Hm, ok.  Do the block and index appear as an (effectively) single
field in the EAS?

> I agree we have some kind of issue linking the HW model with the sPAPR 
> machine. The guest interface is only  about IRQ numbers, priorities and
> cpu numbers. We really don't care about XIVE blocks and indexes in that 
> case. we can clarify the code by bypassing the XiveRouter interfaces
> to the table and directly use the sPAPR interrupt controller. That 
> should help a bit for the hcalls but we would still have to fill in 
> the EAT and the END with some index values if we want to use the router
> algorithm.

I don't think this is too much of a problem.  These are essentially
machine internal details so we can choose an allocation to suit us.
The obvious one is to put everything in a single block, at least as
long as that won't limit our numbers too much.

> > We'll probably want some inlines or macros to build an
> > nvt/end/lisn/whatever id from block and index as well.
> > 
> >> +}
> >> +
> >>  #endif /* PPC_XIVE_H */
> >> diff --git a/include/hw/ppc/xive_regs.h b/include/hw/ppc/xive_regs.h
> >> index 2e3d6cb507da..05cb992d2815 100644
> >> --- a/include/hw/ppc/xive_regs.h
> >> +++ b/include/hw/ppc/xive_regs.h
> >> @@ -158,4 +158,26 @@ typedef struct XiveEND {
> >>  #define END_W7_F1_LOG_SERVER_ID  PPC_BITMASK32(1, 31)
> >>  } XiveEND;
> >>  
> >> +/* Notification Virtual Target (NVT) */
> >> +typedef struct XiveNVT {
> >> +        uint32_t        w0;
> >> +#define NVT_W0_VALID             PPC_BIT32(0)
> >> +        uint32_t        w1;
> >> +        uint32_t        w2;
> >> +        uint32_t        w3;
> >> +        uint32_t        w4;
> >> +        uint32_t        w5;
> >> +        uint32_t        w6;
> >> +        uint32_t        w7;
> >> +        uint32_t        w8;
> >> +#define NVT_W8_GRP_VALID         PPC_BIT32(0)
> >> +        uint32_t        w9;
> >> +        uint32_t        wa;
> >> +        uint32_t        wb;
> >> +        uint32_t        wc;
> >> +        uint32_t        wd;
> >> +        uint32_t        we;
> >> +        uint32_t        wf;
> >> +} XiveNVT;
> >> +
> >>  #endif /* PPC_XIVE_REGS_H */
> >> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> >> index 4c6cb5d52975..5ba3b06e6e25 100644
> >> --- a/hw/intc/xive.c
> >> +++ b/hw/intc/xive.c
> >> @@ -373,6 +373,32 @@ void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor 
> >> *mon)
> >>      }
> >>  }
> >>  
> >> +/* The HW CAM (23bits) is hardwired to :
> >> + *
> >> + *   0x000||0b1||4Bit chip number||7Bit Thread number.
> >> + *
> >> + * and when the block grouping extension is enabled :
> >> + *
> >> + *   4Bit chip number||0x001||7Bit Thread number.
> >> + */
> >> +static uint32_t tctx_hw_cam_line(bool block_group, uint8_t chip_id, 
> >> uint8_t tid)
> >> +{
> >> +    if (block_group) {
> >> +        return 1 << 11 | (chip_id & 0xf) << 7 | (tid & 0x7f);
> >> +    } else {
> >> +        return (chip_id & 0xf) << 11 | 1 << 7 | (tid & 0x7f);
> >> +    }
> >> +}
> >> +
> >> +static uint32_t xive_tctx_hw_cam_line(XiveTCTX *tctx, bool block_group)
> >> +{
> >> +    PowerPCCPU *cpu = POWERPC_CPU(tctx->cs);
> >> +    CPUPPCState *env = &cpu->env;
> >> +    uint32_t pir = env->spr_cb[SPR_PIR].default_value;
> > 
> > I don't much like reaching into the cpu state itself.  I think a
> > better idea would be to have the TCTX have its HW CAM id set during
> > initialization (via a property) and then use that.  This will mean
> > less mucking about if future cpu revisions don't split the PIR into
> > chip and tid ids in the same way.
> 
> yes good idea. I will see how to handle the block_group boolean. may be we
> can leave it out of the model for now as it is not used.

Yes, it would be nice to leave the block_group stuff as a later
extensions when/if we need it.  If we put it in as a stub and nothing
is using/testing it, it's likely it will be broken if we ever do
actually try to use it.

> 
> > 
> >> +    return tctx_hw_cam_line(block_group, (pir >> 8) & 0xf, pir & 0x7f);
> >> +}
> >> +
> >>  static void xive_tctx_reset(void *dev)
> >>  {
> >>      XiveTCTX *tctx = XIVE_TCTX(dev);
> >> @@ -1013,6 +1039,195 @@ int xive_router_set_end(XiveRouter *xrtr, uint8_t 
> >> end_blk, uint32_t end_idx,
> >>     return xrc->set_end(xrtr, end_blk, end_idx, end);
> >>  }
> >>  
> >> +int xive_router_get_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t 
> >> nvt_idx,
> >> +                        XiveNVT *nvt)
> >> +{
> >> +   XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr);
> >> +
> >> +   return xrc->get_nvt(xrtr, nvt_blk, nvt_idx, nvt);
> >> +}
> >> +
> >> +int xive_router_set_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t 
> >> nvt_idx,
> >> +                        XiveNVT *nvt)
> >> +{
> >> +   XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr);
> >> +
> >> +   return xrc->set_nvt(xrtr, nvt_blk, nvt_idx, nvt);
> >> +}
> >> +
> >> +static bool xive_tctx_ring_match(XiveTCTX *tctx, uint8_t ring,
> >> +                                 uint8_t nvt_blk, uint32_t nvt_idx,
> >> +                                 bool cam_ignore, uint32_t logic_serv)
> >> +{
> >> +    uint8_t *regs = &tctx->regs[ring];
> >> +    uint32_t w2 = be32_to_cpu(*((uint32_t *) &regs[TM_WORD2]));
> >> +    uint32_t cam = xive_tctx_cam_line(nvt_blk, nvt_idx);
> >> +    bool block_group = false; /* TODO (PowerNV) */
> >> +
> >> +    /* TODO (PowerNV): ignore low order bits of nvt id */
> >> +
> >> +    switch (ring) {
> >> +    case TM_QW3_HV_PHYS:
> >> +        return (w2 & TM_QW3W2_VT) && xive_tctx_hw_cam_line(tctx, 
> >> block_group) ==
> >> +            tctx_hw_cam_line(block_group, nvt_blk, nvt_idx);
> > 
> > The difference between "xive_tctx_hw_cam_line" and "tctx_hw_cam_line"
> > here is far from obvious.  
> 
> yes. I lacked inspiration ...

I'd suggest that the one which takes the tctx as a parameter be
tctx_hw_cam_line() and the other be nvt_hw_cam_line() or similar.  The
crucial difference here is that one is what the thread is looking for,
the other is what the NVT is advertising.

> > Remember that namespacing prefixes aren't
> > necessary for static functions, which can let you give more
> > descriptive names without getting excessively long.
> 
> OK.
>  
> >> +    case TM_QW2_HV_POOL:
> >> +        return (w2 & TM_QW2W2_VP) && (cam == GETFIELD(TM_QW2W2_POOL_CAM, 
> >> w2));
> >> +
> >> +    case TM_QW1_OS:
> >> +        return (w2 & TM_QW1W2_VO) && (cam == GETFIELD(TM_QW1W2_OS_CAM, 
> >> w2));
> >> +
> >> +    case TM_QW0_USER:
> >> +        return ((w2 & TM_QW1W2_VO) && (cam == GETFIELD(TM_QW1W2_OS_CAM, 
> >> w2)) &&
> >> +                (w2 & TM_QW0W2_VU) &&
> >> +                (logic_serv == GETFIELD(TM_QW0W2_LOGIC_SERV, w2)));
> >> +
> >> +    default:
> >> +        g_assert_not_reached();
> >> +    }
> >> +}
> >> +
> >> +static int xive_presenter_tctx_match(XiveTCTX *tctx, uint8_t format,
> >> +                                     uint8_t nvt_blk, uint32_t nvt_idx,
> >> +                                     bool cam_ignore, uint32_t logic_serv)
> >> +{
> >> +    if (format == 0) {
> >> +        /* F=0 & i=1: Logical server notification */
> >> +        if (cam_ignore == true) {
> >> +            qemu_log_mask(LOG_GUEST_ERROR, "XIVE: no support for LS "
> >> +                          "NVT %x/%x\n", nvt_blk, nvt_idx);
> >> +             return -1;
> >> +        }
> >> +
> >> +        /* F=0 & i=0: Specific NVT notification */
> >> +        if (xive_tctx_ring_match(tctx, TM_QW3_HV_PHYS,
> >> +                                nvt_blk, nvt_idx, false, 0)) {
> >> +            return TM_QW3_HV_PHYS;
> >> +        }
> >> +        if (xive_tctx_ring_match(tctx, TM_QW2_HV_POOL,
> >> +                                nvt_blk, nvt_idx, false, 0)) {
> >> +            return TM_QW2_HV_POOL;
> >> +        }
> >> +        if (xive_tctx_ring_match(tctx, TM_QW1_OS,
> >> +                                nvt_blk, nvt_idx, false, 0)) {
> >> +            return TM_QW1_OS;
> >> +        }
> > 
> > Hm.  It's a bit pointless to iterate through each ring calling a
> > common function, when that "common" function consists entirely of a
> > switch which makes it not really common at all.
> > 
> > So I think you want separate helper functions for each ring's match,
> > or even just fold the previous function into this one.
> 
> yes. It can be improved. I did try different layouts. I might just fold 
> both routine in one as you propose.  
> 
> >> +    } else {
> >> +        /* F=1 : User level Event-Based Branch (EBB) notification */
> >> +        if (xive_tctx_ring_match(tctx, TM_QW0_USER,
> >> +                                nvt_blk, nvt_idx, false, logic_serv)) {
> >> +            return TM_QW0_USER;
> >> +        }
> >> +    }
> >> +    return -1;
> >> +}
> >> +
> >> +typedef struct XiveTCTXMatch {
> >> +    XiveTCTX *tctx;
> >> +    uint8_t ring;
> >> +} XiveTCTXMatch;
> >> +
> >> +static bool xive_presenter_match(XiveRouter *xrtr, uint8_t format,
> >> +                                 uint8_t nvt_blk, uint32_t nvt_idx,
> >> +                                 bool cam_ignore, uint8_t priority,
> >> +                                 uint32_t logic_serv, XiveTCTXMatch 
> >> *match)
> >> +{
> >> +    CPUState *cs;
> >> +
> >> +    /* TODO (PowerNV): handle chip_id overwrite of block field for
> >> +     * hardwired CAM compares */
> >> +
> >> +    CPU_FOREACH(cs) {
> >> +        PowerPCCPU *cpu = POWERPC_CPU(cs);
> >> +        XiveTCTX *tctx = XIVE_TCTX(cpu->intc);
> >> +        int ring;
> >> +
> >> +        /*
> >> +         * HW checks that the CPU is enabled in the Physical Thread
> >> +         * Enable Register (PTER).
> >> +         */
> >> +
> >> +        /*
> >> +         * Check the thread context CAM lines and record matches. We
> >> +         * will handle CPU exception delivery later
> >> +         */
> >> +        ring = xive_presenter_tctx_match(tctx, format, nvt_blk, nvt_idx,
> >> +                                         cam_ignore, logic_serv);
> >> +        /*
> >> +         * Save the context and follow on to catch duplicates, that we
> >> +         * don't support yet.
> >> +         */
> >> +        if (ring != -1) {
> >> +            if (match->tctx) {
> >> +                qemu_log_mask(LOG_GUEST_ERROR, "XIVE: already found a 
> >> thread "
> >> +                              "context NVT %x/%x\n", nvt_blk, nvt_idx);
> >> +                return false;
> >> +            }
> >> +
> >> +            match->ring = ring;
> >> +            match->tctx = tctx;
> >> +        }
> >> +    }
> >> +
> >> +    if (!match->tctx) {
> >> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: NVT %x/%x is not 
> >> dispatched\n",
> >> +                      nvt_blk, nvt_idx);
> >> +        return false;
> > 
> > Hmm.. this isn't actually an error isn't it? At least not for powernv
> 
> It is on sPAPR, it would mean the END was configured with an unknow CPU. 

Right.

> It is not error on PowerNV, when we support escalations.
> 
> > - that just means the NVT isn't currently dispatched, so we'll need to
> > trigger the escalation interrupt.  
> 
> Yes.
> 
> > Does this get changed later in the series?
> 
> No.

But this code is common to PAPR and powernv, yes, so it will need to?

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]