
From: Cédric Le Goater
Subject: Re: [Qemu-devel] [PATCH v2 02/19] spapr: introduce a skeleton for the XIVE interrupt controller
Date: Thu, 3 May 2018 10:43:47 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2

On 05/03/2018 04:29 AM, David Gibson wrote:
> On Thu, Apr 26, 2018 at 10:17:13AM +0200, Cédric Le Goater wrote:
>> On 04/26/2018 07:36 AM, David Gibson wrote:
>>> On Thu, Apr 19, 2018 at 07:40:09PM +0200, Cédric Le Goater wrote:
>>>> On 04/16/2018 06:26 AM, David Gibson wrote:
>>>>> On Thu, Apr 12, 2018 at 10:18:11AM +0200, Cédric Le Goater wrote:
>>>>>> On 04/12/2018 07:07 AM, David Gibson wrote:
>>>>>>> On Wed, Dec 20, 2017 at 08:38:41AM +0100, Cédric Le Goater wrote:
>>>>>>>> On 12/20/2017 06:09 AM, David Gibson wrote:
>>>>>>>>> On Sat, Dec 09, 2017 at 09:43:21AM +0100, Cédric Le Goater
>>>> wrote:
>>> [snip]
>>>>>> The XIVE tables are :
>>>>>>
>>>>>> * IVT
>>>>>>
>>>>>>   associates an interrupt source number with an event queue. The data
>>>>>>   to be pushed in the queue is stored there also.
>>>>>
>>>>> Ok, so there would be one of these tables for each IVRE, 
>>>>
>>>> yes. one for each XIVE interrupt controller. That is one per processor 
>>>> or socket.
>>>
>>> Ah.. so there can be more than one in a multi-socket system.
>>>
>>>>> with one entry for each source managed by that IVSE, yes?
>>>>
>>>> yes. The table is simply indexed by the interrupt number in the
>>>> global IRQ number space of the machine.
>>>
>>> How does that work on a multi-chip machine?  Does each chip just have
>>> a table for a slice of the global irq number space?
>>
>> yes. IRQ Allocation is done relative to the chip, each chip having 
>> a range depending on its block id. XIVE has a concept of block,
>> which is used in skiboot in a one-to-one relationship with the chip.
> 
> Ok.  I'm assuming this block id forms the high(ish) bits of the global
> irq number, yes?

Yes. The top 8 bits are reserved, the next 4 bits are for the
block id (16 blocks for 16 sockets/chips), and the 20 lower bits
are for the ISN.
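
Roughly, as a sketch in C (the macro and helper names below are made
up for illustration, they are not the actual skiboot/QEMU definitions):

  #include <stdint.h>

  /* bits [31:24] reserved | [23:20] block id | [19:0] ISN */
  #define XIVE_ISN_MASK      0xfffffu
  #define XIVE_BLOCK_SHIFT   20
  #define XIVE_BLOCK_MASK    0xfu

  static inline uint32_t xive_girq(uint8_t blk, uint32_t isn)
  {
      return ((uint32_t)(blk & XIVE_BLOCK_MASK) << XIVE_BLOCK_SHIFT) |
             (isn & XIVE_ISN_MASK);
  }

  static inline uint8_t xive_girq_block(uint32_t girq)
  {
      return (girq >> XIVE_BLOCK_SHIFT) & XIVE_BLOCK_MASK;
  }

  static inline uint32_t xive_girq_isn(uint32_t girq)
  {
      return girq & XIVE_ISN_MASK;
  }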

>>>>> Do the XIVE IPIs have entries here, or do they bypass this?
>>>>
>>>> They don't bypass it, no. The IPIs also have entries in this table.
>>>>
>>>>>> * EQDT:
>>>>>>
>>>>>>   describes the queues in the OS RAM, also contains a set of flags,
>>>>>>   a virtual target, etc.
>>>>>
>>>>> So on real hardware this would be global, yes?  And it would be
>>>>> consulted by the IVRE?
>>>>
>>>> yes. Exactly. The XIVE routing routine :
>>>>
>>>>    https://github.com/legoater/qemu/blob/xive/hw/intc/xive.c#L706
>>>>
>>>> gives a good overview of the usage of the tables.
>>>>
>>>>> For guests, we'd expect one table per-guest?  
>>>>
>>>> yes but only in emulation mode. 
>>>
>>> I'm not sure what you mean by this.
>>
>> I meant the sPAPR QEMU emulation mode. Linux/KVM relies on the overall 
>> table allocated in OPAL for the system. 
> 
> Right.. I'm thinking of this from the point of view of the guest
> and/or qemu, rather than from the implementation.  Even if the actual
> storage of the entries is distributed across the host's global table,
> we still logically have a table per guest, right?

Yes. (the XiveSource object would be the table-per-guest and its 
counterpart in KVM: the source block)  

>>>>> How would those be integrated with the host table?
>>>>
>>>> Under KVM, this is handled by the host table (setup done in skiboot) 
>>>> and we are only interested in the state of the EQs for migration.
>>>
>>> This doesn't make sense to me; the guest is able to alter the IVT
>>> entries, so that configuration must be migrated somehow.
>>
>> yes. The IVE needs to be migrated. We use get/set KVM ioctls to save 
>> and restore the value which is cached in the KVM irq state struct 
>> (server, prio, eq data). no OPAL calls are needed though.
> 
> Right.  Again, at this stage I don't particularly care what the
> backend details are - whether the host calls OPAL or whatever.  I'm
> more concerned with the logical model.

ok.

> 
>>>> This state is set  with the H_INT_SET_QUEUE_CONFIG hcall,
>>>
>>> "This state" here meaning IVT entries?
>>
>> no. The H_INT_SET_QUEUE_CONFIG sets the event queue OS page for a 
>> server/priority couple. That is where the event queue data is
>> pushed.
> 
> Ah.  Doesn't that mean the guest *does* effectively have an EQD table,

Well, yes, it is there under the hood, but the guest does not know anything
about the XIVE controller internal structures, IVE, EQD, VPD and tables.
Only OPAL does, in fact.

> updated by this call?  

It is indeed the purpose of H_INT_SET_QUEUE_CONFIG.

> We'd need to migrate that data as well, 

Yes, we do, and some fields require OPAL support.

> and it's not part of the IVT, right?

Yes. The IVT only contains the EQ index, i.e. the server/priority tuple used
for routing.

>> H_INT_SET_SOURCE_CONFIG does the targeting : irq, server, priority,
>> and the eq data to be pushed in case of an event.
> 
> Ok - that's the IVT entries, yes?

yes.
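
To illustrate the logical per-guest state behind these two hcalls, a
purely hypothetical sketch (structure and field names are made up, they
are not the actual QEMU structures):

  #include <stdint.h>
  #include <stdbool.h>

  /* What H_INT_SET_SOURCE_CONFIG configures, one entry per IRQ:
   * the server/priority selecting the EQ, plus the event data. */
  typedef struct GuestIVE {
      bool     valid;
      uint32_t server;     /* target vCPU */
      uint8_t  priority;   /* selects one of that vCPU's event queues */
      uint32_t eq_data;    /* data pushed in the event queue on routing */
  } GuestIVE;

  /* What H_INT_SET_QUEUE_CONFIG configures, one entry per
   * server/priority couple: the event queue OS page and the queue
   * state that has to be captured. */
  typedef struct GuestEQ {
      uint64_t os_page;    /* guest RAM address of the event queue */
      uint32_t qsize;      /* queue size */
      uint32_t qindex;     /* producer index (migration state) */
  } GuestEQ;

Both would have to be captured at migration time, the first through the
KVM get/set ioctls mentioned above, the second partly with OPAL support.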


>>>> followed
>>>> by an OPAL call and then a HW update. It defines the EQ page in which
>>>> to push event notifications for the server/priority couple. 
>>>>
>>>>>> * VPDT:
>>>>>>
>>>>>>   describes the virtual targets, which can have different natures:
>>>>>>   an lpar, a cpu. This is for powernv; spapr does not have this
>>>>>>   concept.
>>>>>
>>>>> Ok.  On hardware that would also be global and consulted by the IVRE,
>>>>> yes?
>>>>
>>>> yes.
>>>
>>> Except.. is it actually global, or is there one per-chip/socket?
>>
>> There is a global VP allocator splitting the ids depending on the
>> block/chip, but, to be honest, I have not dug into the details.
>>
>>> [snip]
>>>>>>    In the current version I am working on, the XiveFabric interface is
>>>>>>    more complex :
>>>>>>
>>>>>>  typedef struct XiveFabricClass {
>>>>>>      InterfaceClass parent;
>>>>>>      XiveIVE *(*get_ive)(XiveFabric *xf, uint32_t lisn);
>>>>>
>>>>> This does an IVT lookup, I take it?
>>>>
>>>> yes. It is an interface for the underlying storage, which is different
>>>> in sPAPR and PowerNV. The goal is to make the routing generic.
>>>
>>> Right.  So, yes, we definitely want a method *somewhere* to do an IVT
>>> lookup.  I'm not entirely sure where it belongs yet.
>>
>> Me neither. I have stuffed the XiveFabric with all the abstraction 
>> needed for the moment. 
>>
>> I am starting to think that there should be an interface to forward 
>> events and another one to route them, the router being a special case 
>> of the forwarder (the last one in the chain). The "simple" devices, like 
>> PSI, should only be forwarders for the sources they own, but the interrupt 
>> controllers should be forwarders (they have sources) and also routers.
> 
> I'm not really clear what you mean by "forward" here.

When an interrupt source is triggered, a notification event can
be generated and forwarded to the XIVE router if the transition
algorithm (depending on the PQ bits) lets it through. A forward is
a simple load of the IRQ number at a specific MMIO address defined
by the main IC.

For QEMU sPAPR, it's a function call, but for QEMU powernv, it's a
load.
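
Something like this, as a simplified sketch (the names are illustrative
and the notify hook is a placeholder for either the sPAPR function call
or the powernv MMIO access):

  #include <stdbool.h>
  #include <stdint.h>

  /* PQ state bits of a source */
  #define XIVE_ESB_RESET    0x0   /* PQ = 00 */
  #define XIVE_ESB_OFF      0x1   /* PQ = 01, source is masked/off */
  #define XIVE_ESB_PENDING  0x2   /* PQ = 10 */
  #define XIVE_ESB_QUEUED   0x3   /* PQ = 11 */

  /* Returns true when the trigger should be forwarded to the router */
  static bool xive_pq_trigger(uint8_t *pq)
  {
      switch (*pq & 0x3) {
      case XIVE_ESB_RESET:
          *pq = XIVE_ESB_PENDING;
          return true;              /* generate a notification event */
      case XIVE_ESB_PENDING:
      case XIVE_ESB_QUEUED:
          *pq = XIVE_ESB_QUEUED;
          return false;             /* event is coalesced */
      case XIVE_ESB_OFF:
      default:
          return false;             /* source is masked, drop it */
      }
  }

  typedef void (*xive_notify_fn)(void *ic, uint32_t lisn);

  static void xive_source_set_irq(uint8_t *pq_table, uint32_t lisn,
                                  xive_notify_fn notify, void *ic)
  {
      if (xive_pq_trigger(&pq_table[lisn])) {
          notify(ic, lisn);         /* forward the IRQ number to the IC */
      }
  }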

C.


>>>>>>      XiveNVT *(*get_nvt)(XiveFabric *xf, uint32_t server);
>>>>>
>>>>> This one a VPDT lookup, yes?
>>>>
>>>> yes.
>>>>
>>>>>>      XiveEQ  *(*get_eq)(XiveFabric *xf, uint32_t eq_idx);
>>>>>
>>>>> And this one an EQDT lookup?
>>>>
>>>> yes.
>>>>
>>>>>>  } XiveFabricClass;
>>>>>>
>>>>>>    It helps in making the routing algorithm independent of the model. 
>>>>>>    I hope to make powernv converge and use it.
>>>>>>
>>>>>>  - a set of MMIOs for the TIMA. They model the presenter engine. 
>>>>>>    current_cpu is used to retrieve the NVT object, which holds the 
>>>>>>    registers for interrupt management.  
>>>>>
>>>>> Right.  Now the TIMA is local to a target/server not an EQ, right?
>>>>
>>>> The TIMA is the MMIO giving access to the registers which are per CPU. 
>>>> The EQs are for routing. They are under the CPU object because it is 
>>>> convenient.
>>>>  
>>>>> I guess we need at least one of these per-vcpu.  
>>>>
>>>> yes.
>>>>
>>>>> Do we also need an lpar-global, or other special ones?
>>>>
>>>> That would be for the host. AFAICT KVM does not use such special
>>>> VPs.
>>>
>>> Um.. "does not use".. don't we get to decide that?
>>
>> Well, that part in the specs is still a little obscure for me and 
>> I am not sure it will fit very well in the Linux/KVM model. It should 
>> be hidden from the guest anyway and can come in later.
>>
>>>>>> The EQs are stored under the NVT. This saves us an unnecessary EQDT 
>>>>>> table. But we could add one under the XIVE device model.
>>>>>
>>>>> I'm not sure of the distinction you're drawing between the NVT and the
>>>>> XIVE device model.
>>>>
>>>> we could add a new table under the XIVE interrupt device model 
>>>> sPAPRXive to store the EQs and index them like skiboot does. 
>>>> But it seems unnecessary to me as we can use the object below 
>>>> 'cpu->intc', which is the XiveNVT object.  
>>>
>>> So, basically assuming a fixed set of EQs (one per priority?)
>>
>> yes. It's easier to capture the state and dump information from
>> the monitor.
>>
>>> per CPU for a PAPR guest?  
>>
>> yes, that's how it works.
>>
>>> That makes sense (assuming PAPR doesn't provide guest interfaces to 
>>> ask for something else).
>>
>> Yes. All hcalls take prio/server parameters and the reserved prio range 
>> for the platform is in the device tree. 0xFF is a special case to reset 
>> targeting. 
>>
>> Thanks,
>>
>> C.
>>
> 



