From: Cédric Le Goater
Subject: [Qemu-devel] [PATCH v7 00/19] ppc: support for the XIVE interrupt controller (POWER9)
Date: Sun, 9 Dec 2018 20:45:51 +0100

Hello,

Here is version 7 of the QEMU models adding support for the XIVE
interrupt controller to the sPAPR machine, under TCG only this
time. KVM support will be proposed in another patchset, along with
the KVM XIVE device patchset, and so will PowerNV.

The most important change for sPAPR is the introduction of the 4.0
machines. The sPAPRXive model still inherits from the XiveRouter. It
is possible to change the class inheritance tree but it does not bring
much for now.

I am not sure how we should handle the machine definitions, so I
proposed both, XIVE only and dual interrupt mode. The impact on the
XICS machine is limited with TCG but KVM support of the 'dual' machine
will change things. Let me know how you want to proceed.

Thanks,

C.

Changes in v7 :

 Common XIVE models :
 
 - removed the 'chip-id' field from XiveRouter
 - introduced a 'block-id' field in XiveENDSource to lookup the XIVE
   END structure when doing a load in the MMIO ESB
 - removed reset XiveENDSource handler
 - introduced a xive_tctx_word2() helper to extract TM_WORD2 of a ring.
 - removed HW CAM line setting and use as it is only useful for PowerNV
 - made use of xive_tctx_word2() helper
 - made use of GETFIELD_BE32() to compare CAM lines
 - fixed initialization of XiveTCTXMatch

 sPAPR models :

 - simplified the prototypes of helpers
 - introduced an assert in set_nvt() method
 - introduced a fixed value for the controller block id value.
 - removed the hardwiring of the HW CAM line. Back to v5 state.
 - removed patch "spapr: modify the irq backend 'init' method". It did
   not bring much
 - split the 'xive' sPAPR IRQ backend from the 'xive' machine
 - split the 'dual' sPAPR IRQ backend from the 'dual' machine
 - introduced 4.0* machines
 
 KVM :

 - hardly any changes
 - will come later in a KVM patchset

 PowerNV:

 - will come later in a PowerNV patchset

Changes in v6 :

 https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg00965.html

Changes in v5 :

 https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg03218.html

Changes in v4 :

 https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg01672.html


= XIVE =================================================================


The POWER9 processor comes with a new interrupt controller, called
XIVE for "eXternal Interrupt Virtualization Engine".


* Overall architecture


             XIVE Interrupt Controller
             +------------------------------------+      IPIs
             | +---------+ +---------+ +--------+ |    +-------+
             | |VC       | |CQ       | |PC      |----> | CORES |
             | |     esb | |         | |        |----> |       |
             | |     eas | |  Bridge | |   tctx |----> |       |
             | |SC   end | |         | |    nvt | |    |       |
 +------+    | +---------+ +----+----+ +--------+ |    +-+-+-+-+
 | RAM  |    +------------------|-----------------+      | | |
 |      |                       |                        | | |
 |      |                       |                        | | |
 |      |  +--------------------v------------------------v-v-v--+    other
 |      <--+                     Power Bus                      +--> chips
 |  esb |  +---------+-----------------------+------------------+
 |  eas |            |                       |
 |  end |         +--|------+                |
 |  nvt |       +----+----+ |           +----+----+
 +------+       |SC       | |           |SC       |
                |         | |           |         |
                | PQ-bits | |           | PQ-bits |
                | local   |-+           |  in VC  |
                +---------+             +---------+
                   PCIe                 NX,NPU,CAPI

                  SC: Source Controller (aka. IVSE)
                  VC: Virtualization Controller (aka. IVRE)
                  PC: Presentation Controller (aka. IVPE)
                  CQ: Common Queue (Bridge)

             PQ-bits: 2 bits source state machine (P:pending Q:queued)
                 esb: Event State Buffer (Array of PQ bits in an IVSE)
                 eas: Event Assignment Structure
                 end: Event Notification Descriptor
                 nvt: Notification Virtual Target
                tctx: Thread interrupt Context


The XIVE IC is composed of three sub-engines :

  - Interrupt Virtualization Source Engine (IVSE), or Source
    Controller (SC). These are found in PCI PHBs, in the PSI host
    bridge controller, but also inside the main controller for the
    core IPIs and other sub-chips (NX, CAPI, NPU) of the
    chip/processor. They are configured to feed the IVRE with events.

  - Interrupt Virtualization Routing Engine (IVRE) or Virtualization
    Controller (VC). Its job is to match an event source with an Event
    Notification Descriptor (END).

  - Interrupt Virtualization Presentation Engine (IVPE) or Presentation
    Controller (PC). It maintains the interrupt context state of each
    thread and handles the delivery of the external exception to the
    thread.


* XIVE internal tables

Each of the sub-engines uses a set of tables to redirect exceptions
from event sources to CPU threads.

                                          +-------+
  User or OS                              |  EQ   |
      or                          +------>|entries|
  Hypervisor                      |       |  ..   |
    Memory                        |       +-------+
                                  |           ^
                                  |           |
             +-------------------------------------------------+
                                  |           |
  Hypervisor      +------+    +---+--+    +---+--+   +------+
    Memory        | ESB  |    | EAT  |    | ENDT |   | NVTT |
   (skiboot)      +----+-+    +----+-+    +----+-+   +------+
                    ^  |        ^  |        ^  |       ^
                    |  |        |  |        |  |       |
             +-------------------------------------------------+
                    |  |        |  |        |  |       |
                    |  |        |  |        |  |       |
               +----|--|--------|--|--------|--|-+   +-|-----+    +------+
               |    |  |        |  |        |  | |   | | tctx|    |Thread|
   IPI or   ---+    +  v        +  v        +  v |---| +  .. |----->     |
  HW events    |                                 |   |       |    |      |
               |             IVRE                |   | IVPE  |    +------+
               +---------------------------------+   +-------+
            


The IVSEs have a 2-bit state machine, P for pending and Q for queued,
for each source, which determines whether an event is let through. The
PQ bits are stored in an Event State Buffer (ESB) array and can be
controlled by MMIOs.
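
As an illustration of the PQ state machine, here is a minimal sketch in
C. The function names and the numeric encoding of the four states (P as
the high bit, Q as the low bit) are assumptions made for this example
and are not taken from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding of the two PQ bits (P is the high bit, Q the low bit). */
enum {
    ESB_RESET   = 0x0, /* P=0 Q=0: idle, the next event is let through   */
    ESB_OFF     = 0x1, /* P=0 Q=1: source masked, events are dropped     */
    ESB_PENDING = 0x2, /* P=1 Q=0: one event forwarded, waiting for EOI  */
    ESB_QUEUED  = 0x3, /* P=1 Q=1: another event arrived before the EOI  */
};

/* An event fires on the source. Returns true if it must be forwarded. */
static bool esb_trigger(uint8_t *pq)
{
    assert(*pq <= ESB_QUEUED);

    switch (*pq) {
    case ESB_RESET:
        *pq = ESB_PENDING;
        return true;
    case ESB_PENDING:
    case ESB_QUEUED:
        *pq = ESB_QUEUED;   /* coalesce until the EOI */
        return false;
    default:                /* ESB_OFF: stay masked */
        return false;
    }
}

/* End of interrupt. Returns true if a queued event must be replayed. */
static bool esb_eoi(uint8_t *pq)
{
    assert(*pq <= ESB_QUEUED);

    switch (*pq) {
    case ESB_PENDING:
        *pq = ESB_RESET;
        return false;
    case ESB_QUEUED:
        *pq = ESB_PENDING;
        return true;
    default:                /* ESB_RESET, ESB_OFF: nothing to do */
        return false;
    }
}
```

With this encoding, a second trigger arriving before the EOI is not
forwarded but remembered in the Q bit, and the EOI then replays it.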

If the event is let through, the IVRE looks up the Event Assignment
Structure (EAS) table for an Event Notification Descriptor (END)
configured for the source. Each Event Notification Descriptor defines
a notification path to a CPU and an in-memory Event Queue, in which
an EQ data entry is enqueued for the OS to pull.
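
The routing step can be sketched as follows. This is a simplified model
under stated assumptions: the Eas and End structures, their field names
and the entry format (generation bit in bit 31) are hypothetical and
only serve the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified table entries for the example. */
typedef struct {
    bool     valid;
    bool     masked;
    uint32_t end_idx;    /* END configured for this source          */
    uint32_t eq_data;    /* data pushed in the event queue          */
} Eas;

typedef struct {
    bool      valid;
    uint32_t *queue;      /* in-memory event queue                  */
    uint32_t  qsize;      /* number of entries in the queue         */
    uint32_t  qindex;     /* next entry to be written               */
    uint32_t  generation; /* toggle bit, flipped on each wrap       */
    uint32_t  nvt_idx;    /* target handed over to the IVPE         */
} End;

/* Push one entry: bit 31 holds the generation bit, the rest the data. */
static void end_enqueue(End *end, uint32_t data)
{
    end->queue[end->qindex] = (end->generation << 31) | (data & 0x7fffffff);
    end->qindex = (end->qindex + 1) % end->qsize;
    if (end->qindex == 0) {
        end->generation ^= 1;
    }
}

/* Route source 'lisn'. Returns the END to present, or NULL if dropped. */
static End *route(Eas *eat, size_t nr_eas, End *endt, size_t nr_ends,
                  uint32_t lisn)
{
    uint32_t end_idx;

    assert(eat && endt);

    if (lisn >= nr_eas || !eat[lisn].valid || eat[lisn].masked) {
        return NULL;
    }

    end_idx = eat[lisn].end_idx;
    if (end_idx >= nr_ends || !endt[end_idx].valid) {
        return NULL;
    }

    end_enqueue(&endt[end_idx], eat[lisn].eq_data);
    return &endt[end_idx];
}
```

The toggle bit lets the OS tell new entries from stale ones without a
shared producer index: an entry is new when its top bit matches the
generation the consumer currently expects.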

The IVPE determines if a Notification Virtual Target (NVT) can handle
the event by scanning the thread contexts of the VCPUs dispatched on
the processor HW threads. It maintains the interrupt context state of
each thread in a NVT table.
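
A rough sketch of the scan done by the IVPE is given below. The
ThreadCtx structure, the function names and the split of the CAM line
into a 4-bit block and a 19-bit index are assumptions chosen for the
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per HW thread context, reduced to what the scan needs. */
typedef struct {
    bool     valid;  /* a VCPU is currently dispatched on the thread */
    uint32_t cam;    /* CAM line identifying the dispatched NVT      */
} ThreadCtx;

/* Build a CAM line from an NVT (block, index) pair. Assumed layout. */
static uint32_t nvt_cam_line(uint8_t blk, uint32_t idx)
{
    return ((uint32_t)(blk & 0xf) << 19) | (idx & 0x7ffff);
}

/*
 * Scan the thread contexts for the NVT. Returns the index of the
 * matching HW thread, or -1 if the NVT is not dispatched or if the
 * match is ambiguous.
 */
static int ivpe_scan(const ThreadCtx *threads, size_t nr,
                     uint8_t blk, uint32_t idx)
{
    uint32_t cam = nvt_cam_line(blk, idx);
    int match = -1;
    size_t i;

    assert(threads || nr == 0);

    for (i = 0; i < nr; i++) {
        if (!threads[i].valid || threads[i].cam != cam) {
            continue;
        }
        if (match >= 0) {
            return -1;   /* the same NVT on two threads is an error */
        }
        match = (int)i;
    }
    return match;
}
```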


* Overview of the QEMU models for the XIVE sub-engines

The XiveSource models the IVSE in general, internal and external. It
handles the source ESBs and the MMIO interface to control them.

The XiveNotifier is a small helper interface interconnecting the
XiveSource to the XiveRouter.

The XiveRouter is an abstract model acting as a combined IVRE and
IVPE. It routes event notifications using the EAS and END tables to
the IVPE sub-engine which does a CAM scan to find a CPU to deliver the
exception. Storage should be provided by the inheriting classes.

XiveENDSource is a special source object. It exposes the EQ ESB MMIOs of
the Event Queues which are used for coalescing event notifications and
for escalation. It is not used in the field, only to sync the EQ cache
in OPAL.

Finally, the XiveTCTX contains the interrupt state context of a thread,
four sets of registers, one for each exception that can be delivered
to a CPU. These contexts are scanned by the IVPE to find a matching NVT
when a notification is triggered. It also models the Thread Interrupt
Management Area (TIMA), which exposes the thread context registers to
the CPU for interrupt management.
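
To give an idea of how one of these register sets is used when an
event is presented, here is a minimal sketch. The Ring structure keeps
only three registers, and the names (CPPR, IPB, PIPR), the bit layout
of the pending buffer and the helper names are assumptions made for the
example. Priority 0 is taken as the most favored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One register set of a thread context, reduced to three registers. */
typedef struct {
    uint8_t cppr; /* current priority: only lower values interrupt  */
    uint8_t ipb;  /* pending buffer: one bit per priority 0..7      */
    uint8_t pipr; /* most favored pending priority, 0xff if none    */
} Ring;

/* Priority 0 maps to the most significant bit of the buffer. */
static uint8_t priority_to_ipb(uint8_t prio)
{
    return prio > 7 ? 0 : (uint8_t)(1u << (7 - prio));
}

/* Find the most favored (lowest numbered) pending priority. */
static uint8_t ipb_to_pipr(uint8_t ipb)
{
    uint8_t prio;

    for (prio = 0; prio < 8; prio++) {
        if (ipb & priority_to_ipb(prio)) {
            return prio;
        }
    }
    return 0xff;
}

/*
 * Record a pending event of priority 'prio' in the ring. Returns true
 * if the CPU should be notified, that is if the pending priority is
 * more favored than the current one.
 */
static bool ring_present(Ring *ring, uint8_t prio)
{
    assert(prio <= 7);

    ring->ipb |= priority_to_ipb(prio);
    ring->pipr = ipb_to_pipr(ring->ipb);
    return ring->pipr < ring->cppr;
}
```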


* XIVE for sPAPR

sPAPRXive models the XIVE interrupt controller of a sPAPR machine. It
inherits from the XiveRouter and provisions storage for the EAS and
END tables. The NVT table does not need a backend in sPAPR. It owns a
XiveSource object for the IPIs and the virtual device interrupts, a
memory region for the TIMA and a XiveENDSource to manage the END ESBs
(not used by Linux).

These choices were made to have a sPAPR interrupt controller
consistent with the one found on baremetal and to facilitate KVM
support, the main difficulty being the host memory regions exposed to
the guest.

The NVT and the END indexing need some care and a set of helpers is
defined to ease the conversion between the CPU id as seen by the guest
and the XIVE identifiers manipulated by the models.
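
The kind of conversion involved can be sketched as below. The helper
names, the NVT_BASE offset and the choice of one END per (VCPU,
priority) pair are assumptions for the sake of the example and do not
necessarily match the helpers of the patchset.

```c
#include <assert.h>
#include <stdint.h>

#define NVT_BASE      0x400u /* hypothetical offset, keeps index 0 unused */
#define NR_PRIORITIES 8u     /* assumed: one END per VCPU and priority    */

/* VCPU id as seen by the guest -> NVT index */
static uint32_t vcpu_to_nvt(uint32_t vcpu_id)
{
    return NVT_BASE + vcpu_id;
}

/* NVT index -> VCPU id */
static uint32_t nvt_to_vcpu(uint32_t nvt_idx)
{
    assert(nvt_idx >= NVT_BASE);
    return nvt_idx - NVT_BASE;
}

/* (VCPU id, priority) -> END index */
static uint32_t vcpu_to_end(uint32_t vcpu_id, uint8_t prio)
{
    assert(prio < NR_PRIORITIES);
    return vcpu_id * NR_PRIORITIES + prio;
}

/* END index -> (VCPU id, priority) */
static void end_to_target(uint32_t end_idx, uint32_t *vcpu_id, uint8_t *prio)
{
    assert(vcpu_id && prio);
    *vcpu_id = end_idx / NR_PRIORITIES;
    *prio = (uint8_t)(end_idx % NR_PRIORITIES);
}
```

Keeping both directions in small helpers avoids open-coded shifts and
offsets in the hcall handlers, which is where indexing mistakes tend to
creep in.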


* Integration in the sPAPR machine, xive only and dual

A new sPAPR IRQ backend is defined for XIVE. It introduces a couple of
new operations to handle the differences in the creation of the device
tree and in the allocation of the CPU interrupt controller. A new
'xive' only pseries machine is defined using this XIVE backend.

Being able to support both interrupt modes in the same machine requires
some more changes. As the machine chooses the interrupt mode at CAS
time, it is activated after a reconfiguration done in a reset. This is
handled by a new 'dual' sPAPR IRQ backend which is built on top of the
XICS and XIVE backends. A new 'dual' pseries machine is defined using
this backend.
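
The idea of a backend built on top of two others can be sketched as a
small dispatcher. Everything here is hypothetical (the Machine and
IrqBackend structures, the single 'reset' operation, the counters used
to observe the calls); the actual backend has more operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Machine Machine;

/* Hypothetical IRQ backend reduced to a single operation. */
typedef struct IrqBackend {
    const char *name;
    void (*reset)(Machine *m);
} IrqBackend;

struct Machine {
    bool xive_negotiated;      /* outcome of CAS, false means XICS  */
    const IrqBackend *active;  /* backend selected at the last reset */
    int xics_resets;           /* counters, only for the example     */
    int xive_resets;
};

static void xics_reset(Machine *m) { m->xics_resets++; }
static void xive_reset(Machine *m) { m->xive_resets++; }

static const IrqBackend irq_xics = { "xics", xics_reset };
static const IrqBackend irq_xive = { "xive", xive_reset };

/* Pick the backend matching the interrupt mode chosen at CAS time. */
static const IrqBackend *dual_current(const Machine *m)
{
    return m->xive_negotiated ? &irq_xive : &irq_xics;
}

/* The 'dual' reset activates the chosen mode and forwards the call. */
static void dual_reset(Machine *m)
{
    m->active = dual_current(m);
    assert(m->active && m->active->reset);
    m->active->reset(m);
}

static const IrqBackend irq_dual = { "dual", dual_reset };
```

A machine booting in XICS mode and then negotiating XIVE at CAS would
see irq_dual.reset() forward to xics_reset() first and to xive_reset()
on the following reset.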


* KVM support

Support for KVM introduces a set of specific XIVE models, very much
like XICS does, which self-connect to their KVM counterparts in the
Linux kernel. Two host memory regions are exposed to the guest and
need special care at initialization :

  - ESB mmios
  - Thread Interrupt Management Area (TIMA)

The models use KVM accessors to synchronize the QEMU state with
KVM. The states are :

  - the source configuration (EAT)
  - the END configuration (ENDT)
  - the OS EQ state (toggle bit and index)
  - the thread interrupt context registers.

A hybrid guest using KVM and an emulated irqchip (kernel_irqchip=off)
is supported. Migration under KVM is supported.

KVM support for the 'dual' machine required some more changes. Both
interrupt modes need to be initialized at the QEMU level to keep the
IRQ number space in sync and to allow switching from one mode to
another. At the KVM level, the whole initialization of the KVM device,
sources and presenters, needs to be done in the reset handler when the
interrupt mode is chosen. This is a major change in the KVM models.

KVM being initialized at reset, we lose the possibility to fall back to
the QEMU emulated mode in case of failure and failures become fatal to
the machine.


* PowerNV models

The PnvXIVE model uses the XiveRouter abstract model just like
sPAPRXive. It provides accessors to the EAS, END and NVT tables, which
are stored in the RAM of the QEMU PowerNV machine and no longer in
QEMU itself. It owns a set of memory regions for the IC registers, the
ESBs, the END ESBs, the TIMA and the notification MMIO.

Multichip is supported and the available IVSEs are the internal one
for the IPIs, the PSI host bridge controller and PHB4.

The next interesting step would be to add escalation events and model
the VCPU dispatching to support emulated KVM guests.


* GitHub trees
 
QEMU sPAPR:

  https://github.com/legoater/qemu/commits/xive-v7-3.1
  
QEMU PowerNV:

  https://github.com/legoater/qemu/commits/powernv-3.1

Linux/KVM:

  https://github.com/legoater/linux/commits/xive-4.20

OPAL:

  https://github.com/legoater/skiboot/commits/xive

Cédric Le Goater (19):
  ppc/xive: add support for the END Event State Buffers
  ppc/xive: introduce the XIVE interrupt thread context
  ppc/xive: introduce a simplified XIVE presenter
  ppc/xive: notify the CPU when the interrupt priority is more
    privileged
  spapr/xive: introduce a XIVE interrupt controller
  spapr/xive: use the VCPU id as a NVT identifier
  spapr: introduce a new machine IRQ backend for XIVE
  spapr: add hcalls support for the XIVE exploitation interrupt mode
  spapr: add device tree support for the XIVE exploitation mode
  spapr: allocate the interrupt thread context under the CPU core
  spapr: extend the sPAPR IRQ backend for XICS migration
  spapr: add a 'reset' method to the sPAPR IRQ backend
  spapr: add an extra OV5 field to the sPAPR IRQ backend
  spapr: set the interrupt presenter at reset
  spapr/xive: enable XIVE MMIOs at reset
  spapr: introduce a new sPAPR IRQ backend supporting XIVE and XICS
  spapr: Add a pseries-4.0 machine type
  spapr: add a 'pseries-4.0-xive' machine type
  spapr: add a 'pseries-4.0-dual' machine type

 default-configs/ppc64-softmmu.mak |    1 +
 include/hw/compat.h               |    3 +
 include/hw/ppc/spapr.h            |   23 +-
 include/hw/ppc/spapr_cpu_core.h   |    2 +
 include/hw/ppc/spapr_irq.h        |   12 +
 include/hw/ppc/spapr_xive.h       |   53 ++
 include/hw/ppc/xics.h             |    4 +-
 include/hw/ppc/xive.h             |   80 ++
 include/hw/ppc/xive_regs.h        |  106 +++
 hw/intc/spapr_xive.c              | 1480 +++++++++++++++++++++++++++++
 hw/intc/xics_spapr.c              |    3 +-
 hw/intc/xive.c                    |  883 ++++++++++++++++-
 hw/ppc/spapr.c                    |  100 +-
 hw/ppc/spapr_cpu_core.c           |   31 +-
 hw/ppc/spapr_hcall.c              |   13 +
 hw/ppc/spapr_irq.c                |  350 +++++++
 hw/intc/Makefile.objs             |    1 +
 17 files changed, 3119 insertions(+), 26 deletions(-)
 create mode 100644 include/hw/ppc/spapr_xive.h
 create mode 100644 hw/intc/spapr_xive.c

-- 
2.17.2



