Re: [Qemu-ppc] [PATCH v2 02/19] spapr: introduce a skeleton for the XIVE interrupt controller


From: Cédric Le Goater
Subject: Re: [Qemu-ppc] [PATCH v2 02/19] spapr: introduce a skeleton for the XIVE interrupt controller
Date: Thu, 26 Apr 2018 10:17:13 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2

On 04/26/2018 07:36 AM, David Gibson wrote:
> On Thu, Apr 19, 2018 at 07:40:09PM +0200, Cédric Le Goater wrote:
>> On 04/16/2018 06:26 AM, David Gibson wrote:
>>> On Thu, Apr 12, 2018 at 10:18:11AM +0200, Cédric Le Goater wrote:
>>>> On 04/12/2018 07:07 AM, David Gibson wrote:
>>>>> On Wed, Dec 20, 2017 at 08:38:41AM +0100, Cédric Le Goater wrote:
>>>>>> On 12/20/2017 06:09 AM, David Gibson wrote:
>>>>>>> On Sat, Dec 09, 2017 at 09:43:21AM +0100, Cédric Le Goater wrote:
> [snip]
>>>> The XIVE tables are :
>>>>
>>>> * IVT
>>>>
>>>>   associates an interrupt source number with an event queue. The data
>>>>   to be pushed into the queue is also stored there.
>>>
>>> Ok, so there would be one of these tables for each IVRE, 
>>
>> yes. one for each XIVE interrupt controller. That is one per processor 
>> or socket.
> 
> Ah.. so there can be more than one in a multi-socket system.
> 
>>> with one entry for each source managed by that IVSE, yes?
>>
>> yes. The table is simply indexed by the interrupt number in the
>> global IRQ number space of the machine.
> 
> How does that work on a multi-chip machine?  Does each chip just have
> a table for a slice of the global irq number space?

yes. IRQ allocation is done relative to the chip, each chip having 
a range depending on its block id. XIVE has a concept of block,
which skiboot uses in a one-to-one relationship with the chip.
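
For illustration only, a minimal sketch of the block/index split of a
global IRQ number. The 4-bit block field and its position are assumptions
made for this sketch; the real encoding is defined by skiboot and the
hardware.

    /* Sketch only: how a global IRQ number could carry the XIVE block id.
     * The bit split (a 4-bit block id in the top bits) is an assumption
     * for illustration; the real layout comes from skiboot/the HW. */
    #include <stdint.h>

    #define XIVE_BLOCK_SHIFT 28                       /* assumed block position */
    #define XIVE_IDX_MASK    ((1u << XIVE_BLOCK_SHIFT) - 1)

    static inline uint32_t xive_girq(uint8_t blk, uint32_t idx)
    {
        return ((uint32_t)blk << XIVE_BLOCK_SHIFT) | (idx & XIVE_IDX_MASK);
    }

    static inline uint8_t xive_girq_to_blk(uint32_t girq)
    {
        return girq >> XIVE_BLOCK_SHIFT;
    }

    static inline uint32_t xive_girq_to_idx(uint32_t girq)
    {
        return girq & XIVE_IDX_MASK;
    }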

>>> Do the XIVE IPIs have entries here, or do they bypass this?
>>
>> no, they do not bypass it. The IPIs also have entries in this table.
>>
>>>> * EQDT:
>>>>
>>>>   describes the queues in the OS RAM, also contains a set of flags,
>>>>   a virtual target, etc.
>>>
>>> So on real hardware this would be global, yes?  And it would be
>>> consulted by the IVRE?
>>
>> yes. Exactly. The XIVE routing routine:
>>
>>      https://github.com/legoater/qemu/blob/xive/hw/intc/xive.c#L706
>>
>> gives a good overview of the usage of the tables.
>>
>>> For guests, we'd expect one table per-guest?  
>>
>> yes but only in emulation mode. 
> 
> I'm not sure what you mean by this.

I meant the sPAPR QEMU emulation mode. Linux/KVM relies on the overall 
table allocated in OPAL for the system. 

 
>>> How would those be integrated with the host table?
>>
>> Under KVM, this is handled by the host table (setup done in skiboot) 
>> and we are only interested in the state of the EQs for migration.
> 
> This doesn't make sense to me; the guest is able to alter the IVT
> entries, so that configuration must be migrated somehow.

yes. The IVE needs to be migrated. We use get/set KVM ioctls to save 
and restore the values cached in the KVM irq state struct 
(server, prio, eq data). No OPAL calls are needed though.
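
For reference, a rough sketch of the per-source state described here,
i.e. what would be saved and restored around migration. The struct and
field names are hypothetical, not the actual KVM or QEMU definitions.

    /* Hypothetical per-source snapshot matching the description above:
     * the targeting cached on the KVM side (server, prio, EQ data),
     * transferred with get/set ioctls at migration time. */
    #include <stdbool.h>
    #include <stdint.h>

    typedef struct XiveSrcState {
        uint32_t server;     /* target vCPU (server number) */
        uint8_t  priority;   /* priority, which selects the target EQ */
        uint32_t eq_data;    /* data word pushed into the EQ on an event */
        bool     masked;     /* no valid targeting, e.g. after a reset */
    } XiveSrcState;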
 
>> This state is set  with the H_INT_SET_QUEUE_CONFIG hcall,
> 
> "This state" here meaning IVT entries?

no. H_INT_SET_QUEUE_CONFIG sets the event queue OS page for a 
server/priority couple. That is where the event queue data is pushed. 

H_INT_SET_SOURCE_CONFIG does the targeting: irq, server, priority,
and the EQ data to be pushed in case of an event.
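
To make the split between the two hcalls concrete, here are
prototype-style signatures following the description above. The C
wrapper names are hypothetical; only the parameter lists reflect what
is described in this thread.

    /* Sketch of the two hcalls from the guest's point of view.  The
     * wrapper names are hypothetical; the parameters follow the
     * description given above. */
    #include <stdint.h>

    /* Targeting of a source: which server/priority (hence which EQ) the
     * source points to, and the EQ data pushed when the source fires. */
    long h_int_set_source_config(uint64_t flags, uint64_t lisn,
                                 uint64_t server, uint64_t priority,
                                 uint64_t eq_data);

    /* OS page backing the event queue of a server/priority couple:
     * guest real address of the queue page and its size. */
    long h_int_set_queue_config(uint64_t flags, uint64_t server,
                                uint64_t priority, uint64_t qpage,
                                uint64_t qsize);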
 
>> followed
>> by an OPAL call and then a HW update. It defines the EQ page in which
>> to push event notifications for the server/priority couple. 
>>
>>>> * VPDT:
>>>>
>>>>   describes the virtual targets, which can have different natures:
>>>>   an lpar, a cpu. This is for powernv; spapr does not have this 
>>>>   concept.
>>>
>>> Ok  On hardware that would also be global and consulted by the IVRE,
>>> yes?
>>
>> yes.
> 
> Except.. is it actually global, or is there one per-chip/socket?

There is a global VP allocator splitting the ids depending on the
block/chip but, to be honest, I have not dug into the details.

> [snip]
>>>>    In the current version I am working on, the XiveFabric interface is
>>>>    more complex:
>>>>
>>>>    typedef struct XiveFabricClass {
>>>>        InterfaceClass parent;
>>>>        XiveIVE *(*get_ive)(XiveFabric *xf, uint32_t lisn);
>>>
>>> This does an IVT lookup, I take it?
>>
>> yes. It is an interface for the underlying storage, which is different
>> in sPAPR and PowerNV. The goal is to make the routing generic.
> 
> Right.  So, yes, we definitely want a method *somewhere* to do an IVT
> lookup.  I'm not entirely sure where it belongs yet.

Me neither. I have stuffed the XiveFabric with all the abstraction 
needed for the moment. 

I am starting to think that there should be an interface to forward 
events and another one to route them, the router being a special case 
of the forwarder. The "simple" devices, like PSI, would only be 
forwarders for the sources they own, while the interrupt controllers 
would be both forwarders (they have sources) and routers.
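
As a thought experiment, that split could look like the sketch below,
written in the same style as the XiveFabricClass quoted in this thread.
The type and member names are hypothetical, not what the patch set
defines.

    /* Hypothetical sketch of the forwarder/router split, in the spirit
     * of the XiveFabricClass interface quoted in this thread.  None of
     * these names exist in the patch set. */
    #include <stdint.h>
    #include "qom/object.h"

    typedef struct XiveNotifier XiveNotifier;

    typedef struct XiveNotifierClass {
        InterfaceClass parent;
        /* Forward an event for one of the sources the device owns. */
        void (*notify)(XiveNotifier *xn, uint32_t lisn);
    } XiveNotifierClass;

    typedef struct XiveRouterClass {
        XiveNotifierClass parent;   /* a router is also a forwarder */
        /* Route the event: look up IVE/EQ/NVT and present the interrupt. */
        void (*route)(XiveNotifier *xn, uint32_t lisn);
    } XiveRouterClass;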

>>>>        XiveNVT *(*get_nvt)(XiveFabric *xf, uint32_t server);
>>>
>>> This one a VPDT lookup, yes?
>>
>> yes.
>>
>>>>        XiveEQ  *(*get_eq)(XiveFabric *xf, uint32_t eq_idx);
>>>
>>> And this one an EQDT lookup?
>>
>> yes.
>>
>>>>    } XiveFabricClass;
>>>>
>>>>    It helps in making the routing algorithm independent of the model. 
>>>>    I hope to make powernv converge and use it.
>>>>
>>>>  - a set of MMIOs for the TIMA. They model the presenter engine. 
>>>>    current_cpu is used to retrieve the NVT object, which holds the 
>>>>    registers for interrupt management.  
>>>
>>> Right.  Now the TIMA is local to a target/server not an EQ, right?
>>
>> The TIMA is the MMIO region giving access to the registers, which are 
>> per CPU. The EQs are for routing. They are under the CPU object because 
>> it is convenient.
>>  
>>> I guess we need at least one of these per-vcpu.  
>>
>> yes.
>>
>>> Do we also need an lpar-global, or other special ones?
>>
>> That would be for the host. AFAICT KVM does not use such special
>> VPs.
> 
> Um.. "does not use".. don't we get to decide that?

Well, that part of the specs is still a little obscure to me and 
I am not sure it will fit very well in the Linux/KVM model. It should 
be hidden from the guest anyway and can come in later.

>>>> The EQs are stored under the NVT. This saves us an unnecessary EQDT 
>>>> table. But we could add one under the XIVE device model.
>>>
>>> I'm not sure of the distinction you're drawing between the NVT and the
>>> XIVE device model.
>>
>> we could add a new table under the XIVE interrupt device model 
>> sPAPRXive to store the EQs and index them like skiboot does. 
>> But it seems unnecessary to me as we can use the object below 
>> 'cpu->intc', which is the XiveNVT object.  
> 
> So, basically assuming a fixed set of EQs (one per priority?)

yes. It's easier to capture the state and dump information from
the monitor. (See the sketch after this exchange.)

> per CPU for a PAPR guest?  

yes, that's how it works.

> That makes sense (assuming PAPR doesn't provide guest interfaces to 
> ask for something else).

Yes. All hcalls take prio/server parameters and the reserved prio range 
for the platform is in the device tree. 0xFF is a special case to reset 
targeting. 
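
To tie the last few answers together, here is a minimal sketch of the
layout being discussed: the EQs stored under the per-CPU NVT object, one
per priority, with 0xFF reserved as the "reset targeting" value. The
structure names and the number of priorities are assumptions for
illustration, not the patch set's actual definitions.

    /* Minimal sketch of the per-CPU layout discussed above: one EQ per
     * priority stored under the NVT, 0xFF being the "reset targeting"
     * special value.  All names here are hypothetical. */
    #include <stdint.h>

    #define XIVE_NVT_NR_PRIORITIES 8       /* assumed: priorities 0..7 */
    #define XIVE_PRIORITY_RESET    0xFF    /* resets the source targeting */

    typedef struct XiveEQState {
        uint64_t qpage;                    /* guest real address of the OS queue page */
        uint32_t qsize;                    /* log2 of the queue size */
        uint32_t qindex;                   /* current index/generation in the queue */
    } XiveEQState;

    typedef struct XiveNVTState {
        /* ... per-CPU interrupt management (TIMA) registers go here ... */
        XiveEQState eqt[XIVE_NVT_NR_PRIORITIES];   /* indexed by priority */
    } XiveNVTState;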

Thanks,

C.


