From: Cédric Le Goater
Subject: Re: [Qemu-devel] [PATCH v2 02/19] spapr: introduce a skeleton for the XIVE interrupt controller
Date: Thu, 19 Apr 2018 19:40:09 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2

On 04/16/2018 06:26 AM, David Gibson wrote:
> On Thu, Apr 12, 2018 at 10:18:11AM +0200, Cédric Le Goater wrote:
>> On 04/12/2018 07:07 AM, David Gibson wrote:
>>> On Wed, Dec 20, 2017 at 08:38:41AM +0100, Cédric Le Goater wrote:
>>>> On 12/20/2017 06:09 AM, David Gibson wrote:
>>>>> On Sat, Dec 09, 2017 at 09:43:21AM +0100, Cédric Le Goater wrote:
>>>>>> With the POWER9 processor comes a new interrupt controller called
>>>>>> XIVE. It is composed of three sub-engines :
>>>>>>
>>>>>>   - Interrupt Virtualization Source Engine (IVSE). These are in PHBs,
>>>>>>     in the main controller for the IPIs and in the PSI host
>>>>>>     bridge. They are configured to feed the IVRE with events.
>>>>>>
>>>>>>   - Interrupt Virtualization Routing Engine (IVRE). Their job is to
>>>>>>     match an event source with a Notification Virtualization Target
>>>>>>     (NVT), a priority and an Event Queue (EQ) to determine if a
>>>>>>     Virtual Processor can handle the event.
>>>>>>
>>>>>>   - Interrupt Virtualization Presentation Engine (IVPE). It maintains
>>>>>>     the interrupt state of each hardware thread and presents the
>>>>>>     notification as an external exception.
>>>>>>
>>>>>> Each of the engines uses a set of internal tables to redirect
>>>>>> exceptions from event sources to CPU threads. The first table we
>>>>>> introduce is the Interrupt Virtualization Entry (IVE) table, part of
>>>>>> the virtualization engine in charge of routing events. It associates
>>>>>> event sources (IRQ numbers) to event queues which will forward, or
>>>>>> not, the event notification to the presentation controller.
>>>>>>
>>>>>> The XIVE model is designed to make use of the full range of the IRQ
>>>>>> number space and does not use an offset like the XICS mode does.
>>>>>> Hence, the IVE table is directly indexed by the IRQ number.
>>>>>>
>>>>>> Signed-off-by: Cédric Le Goater <address@hidden>
>>>>>
>>>>> As you've suggested yourself, I think we might need to more
>>>>> explicitly model the different components of the XIVE system.  As part
>>>>> of that, I think you need to be clearer in this base skeleton about
>>>>> exactly what component your XIVE object represents.
>>>
>>> Sorry it's been so long since I looked at these.
>>
>> That's fine. I have been working on a XIVE device model for the PowerNV
>> machine and KVM support for the pseries. I have a better understanding
>> of the overall picture.
>>
>> The patchset has not changed much so we can still discuss on this
>> basis without me flooding the mailing list.
>>
>>>> ok. The base skeleton is the IVRE, the central engine handling 
>>>> the routing. 
>>>>
>>>>> If the answer is "the overall thing" 
>>>>
>>>> Yes, it is more or less that currently. 
>>>>
>>>> The sPAPRXive object models the source engine and the routing 
>>>> engine in one object.
>>>
>>> Yeah, I suspect we don't want that.  Although it might seem simpler in
>>> the spapr case, at least at first glance, I think it will cause us
>>> problems later.  At the very least, it's likely to make it harder to
>>> share code between the spapr and powernv case.  I think it will also
>>> make for more confusion about exactly what things belong where.
>>
>> I tend to agree. 
>>
>> We need to clarify (a bit) what is in the XIVE interrupt controller 
>> silicon, and how XIVE works. The XIVE device models for spapr and 
>> powernv should be very close as the differences are small. KVM support 
>> should be built on the spapr model.
>>
>> There are 3 different sub-engines in the XIVE interrupt controller
>> device :
>>
>> * IVSE (XiveSource model)
>>
>>   interrupt sources, which expose their PQ bits through ESB MMIO pages 
>>   (there are different levels of support depending on HW revision) 
>>
>>   The XIVE interrupt controller has a set of internal sources for 
>>   IPIs and CAPI-like interrupts.
> 
> Ok.  IIUC in hardware there's one of these in each PHB, 

yes

> plus maybe one or two others.  Is that right?

yes. PSI for instance on PowerNV. I have this device as a first
XIVE source on PowerNV.

>>
>> * IVRE (No real model)
>>
>>   in the middle, doing the routing of source event notification to
>>   (cpu) targets. It relies on internal tables which are stored in 
>>   the hypervisor/QEMU/KVM for the spapr machine and in the VM RAM 
>>   for the powernv machine.
> 
> What does VM RAM mean in the powernv context?

The PowerNV is indeed not a VM. So I meant the RAM of the QEMU PowerNV 
machine. skiboot does the allocation and the HW setup using a set of 
IC registers exposed as MMIOs. 

>>   Configuration updates of the XIVE tables are done through hcalls 
>>   on spapr and with MMIOs on the IC regs on powernv. On the latter,
>>   the changes are flushed back to the VM RAM. 
>>
>> * IVPE (XiveNVT)
>>
>>   set of registers for interrupt management at the CPU level. Exposed
>>   in a specific MMIO region called the TIMA.
> 
> Ok.
> 
>> The XIVE tables are :
>>
>> * IVT
>>
>>   associates an interrupt source number with an event queue. The data
>>   to be pushed in the queue is also stored there.
> 
> Ok, so there would be one of these tables for each IVRE, 

yes. one for each XIVE interrupt controller. That is one per processor 
or socket.

> with one entry for each source managed by that IVSE, yes?

yes. The table is simply indexed by the interrupt number in the
global IRQ number space of the machine.
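
A minimal sketch of that direct indexing, for illustration only. The
DemoIVE/DemoIVT types and demo_ivt_get() are hypothetical names, and
the fields are a simplified stand-in for the real entry layout:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified IVE. Not the real entry layout. */
typedef struct DemoIVE {
    bool     valid;    /* entry is in use */
    bool     masked;   /* notifications are dropped when set */
    uint32_t eq_idx;   /* event queue the source is routed to */
    uint32_t eq_data;  /* data to push into that queue */
} DemoIVE;

typedef struct DemoIVT {
    DemoIVE *entries;
    uint32_t nr_irqs;
} DemoIVT;

/*
 * Direct lookup: the interrupt number in the global IRQ number space
 * is the table index, with no XICS-style offset subtracted first.
 */
static DemoIVE *demo_ivt_get(DemoIVT *ivt, uint32_t lisn)
{
    if (lisn >= ivt->nr_irqs) {
        return NULL;
    }
    return &ivt->entries[lisn];
}
```

IPIs and device interrupts go through the same lookup since they
share the table.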

> Do the XIVE IPIs have entries here, or do they bypass this?

They do not bypass it. The IPIs also have entries in this table.

>> * EQDT:
>>
>>   describes the queues in the OS RAM, also contains a set of flags,
>>   a virtual target, etc.
> 
> So on real hardware this would be global, yes?  And it would be
> consulted by the IVRE?

yes. Exactly. The XIVE routing routine :

        https://github.com/legoater/qemu/blob/xive/hw/intc/xive.c#L706

gives a good overview of the usage of the tables.
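
As a rough illustration of how the three tables chain together (this
is not the code at the link above), here is a simplified sketch. All
Rt* names, the table sizes and the pending-bitmap layout are
assumptions made for the example:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RT_NR_IRQS 8
#define RT_NR_EQS  4
#define RT_NR_NVTS 2
#define RT_NR_PRIO 8

typedef struct RtIVE {
    bool valid, masked;
    uint32_t eq_idx;     /* index into the EQDT */
    uint32_t eq_data;    /* data to push into the queue */
} RtIVE;

typedef struct RtEQ {
    bool valid;
    uint32_t nvt_idx;    /* virtual target, index into the VPDT */
    uint8_t priority;
    uint32_t last_data;  /* stand-in for the push into the OS queue */
} RtEQ;

typedef struct RtNVT {
    uint8_t pending;     /* one bit per priority (layout assumed) */
} RtNVT;

typedef struct RtRouter {
    RtIVE ivt[RT_NR_IRQS];
    RtEQ  eqdt[RT_NR_EQS];
    RtNVT vpdt[RT_NR_NVTS];
} RtRouter;

/* Returns true if the event notification reached a target. */
static bool rt_route(RtRouter *r, uint32_t lisn)
{
    if (lisn >= RT_NR_IRQS) {
        return false;
    }
    RtIVE *ive = &r->ivt[lisn];               /* 1. IVT lookup */
    if (!ive->valid || ive->masked || ive->eq_idx >= RT_NR_EQS) {
        return false;
    }
    RtEQ *eq = &r->eqdt[ive->eq_idx];         /* 2. EQDT lookup */
    if (!eq->valid || eq->nvt_idx >= RT_NR_NVTS ||
        eq->priority >= RT_NR_PRIO) {
        return false;
    }
    eq->last_data = ive->eq_data;             /* push the event data */
    RtNVT *nvt = &r->vpdt[eq->nvt_idx];       /* 3. VPDT lookup */
    nvt->pending |= (uint8_t)(1u << eq->priority);
    return true;
}
```

The real routine has more cases to handle; the sketch only shows the
order in which the tables are consulted.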

> For guests, we'd expect one table per-guest?  

yes but only in emulation mode. 

> How would those be integrated with the host table?

Under KVM, this is handled by the host table (setup done in skiboot) 
and we are only interested in the state of the EQs for migration. 
This state is set with the H_INT_SET_QUEUE_CONFIG hcall, followed
by an OPAL call and then a HW update. It defines the EQ page in which
to push event notifications for the server/priority pair.
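
To make the per server/priority EQ state more concrete, here is a
hedged sketch of what is captured and how a notification could be
pushed. The Qd* names are hypothetical, and so is the helper
signature, which is not the hcall ABI. The entry format (31 bits of
data plus a generation bit that flips on wrap, starting at 1) is my
simplification and it ignores byte order:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QD_NR_PRIO 8

typedef struct QdEQ {
    uint32_t *page;       /* queue page provided by the OS */
    uint32_t  nr_entries; /* must be a power of two */
    uint32_t  index;      /* next slot to write */
    uint32_t  gen;        /* generation bit, flips on each wrap */
} QdEQ;

/* Hypothetical stand-in for what the queue configuration sets up. */
static bool qd_set_queue(QdEQ *eqs, uint32_t nr_servers, uint32_t server,
                         uint8_t prio, uint32_t *page, uint32_t nr_entries)
{
    if (server >= nr_servers || prio >= QD_NR_PRIO) {
        return false;
    }
    if (nr_entries == 0 || (nr_entries & (nr_entries - 1)) != 0) {
        return false;
    }
    QdEQ *eq = &eqs[server * QD_NR_PRIO + prio];
    eq->page = page;
    eq->nr_entries = nr_entries;
    eq->index = 0;
    eq->gen = 1;
    return true;
}

/* Push one event: top bit is the generation, low 31 bits the data. */
static void qd_push(QdEQ *eq, uint32_t data)
{
    eq->page[eq->index] = (eq->gen << 31) | (data & 0x7fffffffu);
    eq->index = (eq->index + 1) & (eq->nr_entries - 1);
    if (eq->index == 0) {
        eq->gen ^= 1;
    }
}
```

For migration, the fields of QdEQ are the kind of state that would
have to be saved and restored for each configured queue.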

>> * VPDT:
>>
>>   describes the virtual targets, which can be of different kinds:
>>   an LPAR or a CPU. This is for powernv, spapr does not have this 
>>   concept.
> 
> Ok.  On hardware that would also be global and consulted by the IVRE,
> yes?

yes. 

> Under PAPR, I'm guessing the concept is missing because it essentially
> has a fixed contents: an entry for each vcpu 

yes.

> and maybe one for the lpar as a whole?

That would be more of a host concept. But, yes, it exists in XIVE. 

>> So, the idea behind the sPAPRXive object is to model a XIVE interrupt
>> controller device. It contains today :
> 
> Yeah, what a "XIVE interrupt controller device" is not really clear to
> me.  If it's something that is necessarily global, I think you'll be
> better off making it a machine-interface rather than a distinct
> object.

hmm, OK. We do need a XiveSource object (like in XICS) and an IVE 
table. Reshuffling is not a big problem. But then, we also have the
associated KVM device, which is very much like the QEMU emulated device.  

>>  - an internal source block for all interrupts : IPIs and virtual 
>>    device interrupts. In the IRQ number space, the IPIs are below
>>    4096 and the device interrupts above, which keeps compatibility 
>>    with XICS. This is important to be able to change interrupt mode.
>>
>>    PowerNV has different source blocks, like for P8.
>>
>>  - a routing engine, which is limited to the IVT. This is a shortcut 
>>    and it might be better to introduce a specific object. Anyhow, this 
>>    is a state to capture.
> 
> Ok.  It sounds like this is roughly the equivalent of the XICSFabric,
> and likewise would probably be better handled by an interface on the
> machine 

yes indeed. it is the case in v3. PowerNV isn't quite in sync with
this concept but it is getting close. 

> rather than a distinct object.  But I'm not clear enough to be
> certain of that yet.

but we need to put the IVT somewhere.

>>    In the current version I am working on, the XiveFabric interface is
>>    more complex :
>>
>>      typedef struct XiveFabricClass {
>>          InterfaceClass parent;
>>          XiveIVE *(*get_ive)(XiveFabric *xf, uint32_t lisn);
> 
> This does an IVT lookup, I take it?

yes. It is an interface for the underlying storage, which is different
in sPAPR and PowerNV. The goal is to make the routing generic. 

>>          XiveNVT *(*get_nvt)(XiveFabric *xf, uint32_t server);
> 
> This one a VPDT lookup, yes?

yes.

>>          XiveEQ  *(*get_eq)(XiveFabric *xf, uint32_t eq_idx);
> 
> And this one an EQDT lookup?

yes.

>>      } XiveFabricClass;
>>
>>    It helps in making the routing algorithm independent of the model. 
>>    I hope to make powernv converge and use it.
>>
>>  - a set of MMIOs for the TIMA. They model the presenter engine. 
>>    current_cpu is used to retrieve the NVT object, which holds the 
>>    registers for interrupt management.  
> 
> Right.  Now the TIMA is local to a target/server not an EQ, right?

The TIMA is the MMIO region giving access to the registers, which are 
per CPU. The EQs are for routing. They are under the CPU object because 
it is convenient.
 
> I guess we need at least one of these per-vcpu.  

yes.

> Do we also need an lpar-global, or other special ones?

That would be for the host. AFAICT KVM does not use such special
VPs. 

>> The EQs are stored under the NVT. This saves us an unnecessary EQDT 
>> table. But we could add one under the XIVE device model.
> 
> I'm not sure of the distinction you're drawing between the NVT and the
> XIVE device model.

We could add a new table under the XIVE interrupt device model 
sPAPRXive to store the EQs and index them like skiboot does. 
But it seems unnecessary to me as we can use the object below 
'cpu->intc', which is the XiveNVT object.  
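
A small sketch of that layout, to show why no separate table is
needed. The Nv* names are hypothetical, and the eq_idx encoding
(server * 8 + priority) is an assumption made for the example:

```c
#include <stddef.h>
#include <stdint.h>

#define NV_NR_PRIO 8

typedef struct NvEQ {
    uint64_t qpage;   /* guest address of the queue page */
    uint32_t qsize;   /* queue size */
} NvEQ;

/* The per-CPU object holds one EQ per priority. */
typedef struct NvNVT {
    NvEQ eqs[NV_NR_PRIO];
} NvNVT;

typedef struct NvCPU {
    NvNVT *intc;      /* stands in for 'cpu->intc' */
} NvCPU;

typedef struct NvMachine {
    NvCPU   *cpus;
    uint32_t nr_cpus;
} NvMachine;

/*
 * EQ lookup without an EQDT: decode the index into a server and a
 * priority, then pick the EQ stored under that CPU's NVT.
 */
static NvEQ *nv_get_eq(NvMachine *m, uint32_t eq_idx)
{
    uint32_t server = eq_idx / NV_NR_PRIO;
    uint32_t prio = eq_idx % NV_NR_PRIO;

    if (server >= m->nr_cpus || m->cpus[server].intc == NULL) {
        return NULL;
    }
    return &m->cpus[server].intc->eqs[prio];
}
```

A table indexed like skiboot does would only change the body of the
lookup, not its callers.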

C.


