qemu-devel

From: Cédric Le Goater
Subject: Re: [Qemu-devel] [Qemu-ppc] [PATCH 13/13] spapr: add KVM support to the 'dual' machine
Date: Thu, 14 Feb 2019 08:13:26 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.4.0

On 2/14/19 4:35 AM, David Gibson wrote:
> On Wed, Feb 13, 2019 at 11:07:49AM +0100, Greg Kurz wrote:
>> On Wed, 13 Feb 2019 09:22:46 +0100
>> Cédric Le Goater <address@hidden> wrote:
>>
>>> On 2/13/19 2:32 AM, David Gibson wrote:
>>>> On Tue, Feb 12, 2019 at 08:18:19AM +0100, Cédric Le Goater wrote:  
>>>>> On 2/12/19 2:11 AM, David Gibson wrote:  
>>>>>> On Mon, Jan 07, 2019 at 07:39:46PM +0100, Cédric Le Goater wrote:  
>>>>>>> The interrupt mode is chosen by the CAS negotiation process and
>>>>>>> activated after a reset to take into account the required changes in
>>>>>>> the machine. This brings new constraints on how the associated KVM IRQ
>>>>>>> device is initialized.
>>>>>>>
>>>>>>> Currently, each model takes care of the initialization of the KVM
>>>>>>> device in its realize method, but this is no longer possible because the
>>>>>>> initialization needs to be done globally when the interrupt mode is
>>>>>>> known, i.e. when the machine is reset. It also means that we need a way
>>>>>>> to delete a KVM device when another mode is chosen.
>>>>>>>
>>>>>>> Also, to support migration, the QEMU objects holding the state to
>>>>>>> transfer should always be available but not necessarily activated.
>>>>>>>
>>>>>>> The overall approach of this proposal is to initialize both interrupt
>>>>>>> modes at the QEMU level and keep the IRQ number space in sync to allow
>>>>>>> switching from one mode to another. For the KVM side of things, the
>>>>>>> whole initialization of the KVM device, sources and presenters, is
>>>>>>> grouped in a single routine. The XICS and XIVE sPAPR IRQ reset
>>>>>>> handlers are modified accordingly to handle the init and the delete
>>>>>>> sequences of the KVM device.
>>>>>>>
>>>>>>> As KVM is now initialized at reset, we lose the possibility to
>>>>>>> fall back to the QEMU emulated mode in case of failure, and failures
>>>>>>> become fatal to the machine.
>>>>>>>
>>>>>>> Signed-off-by: Cédric Le Goater <address@hidden>
>>>>>>> ---
>>>>>>>  hw/intc/spapr_xive.c     |  8 +---
>>>>>>>  hw/intc/spapr_xive_kvm.c | 27 ++++++++++++++
>>>>>>>  hw/intc/xics_kvm.c       | 25 +++++++++++++
>>>>>>>  hw/intc/xive.c           |  4 --
>>>>>>>  hw/ppc/spapr_irq.c       | 79 ++++++++++++++++++++++++++++------------
>>>>>>>  5 files changed, 109 insertions(+), 34 deletions(-)
>>>>>>>
>>>>>>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>>>>>>> index 21f3c1ef0901..0661aca35900 100644
>>>>>>> --- a/hw/intc/spapr_xive.c
>>>>>>> +++ b/hw/intc/spapr_xive.c
>>>>>>> @@ -330,13 +330,7 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>>>>>>>      xive->eat = g_new0(XiveEAS, xive->nr_irqs);
>>>>>>>      xive->endt = g_new0(XiveEND, xive->nr_ends);
>>>>>>>  
>>>>>>> -    if (kvmppc_xive_enabled()) {
>>>>>>> -        kvmppc_xive_connect(xive, &local_err);
>>>>>>> -        if (local_err) {
>>>>>>> -            error_propagate(errp, local_err);
>>>>>>> -            return;
>>>>>>> -        }
>>>>>>> -    } else {
>>>>>>> +    if (!kvmppc_xive_enabled()) {
>>>>>>>          /* TIMA initialization */
>>>>>>>          memory_region_init_io(&xive->tm_mmio, OBJECT(xive), &xive_tm_ops, xive,
>>>>>>>                                "xive.tima", 4ull << TM_SHIFT);
>>>>>>> diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
>>>>>>> index d35814c1992e..3ebc947f2be7 100644
>>>>>>> --- a/hw/intc/spapr_xive_kvm.c
>>>>>>> +++ b/hw/intc/spapr_xive_kvm.c
>>>>>>> @@ -737,6 +737,15 @@ void kvmppc_xive_connect(sPAPRXive *xive, Error **errp)
>>>>>>>      Error *local_err = NULL;
>>>>>>>      size_t esb_len;
>>>>>>>      size_t tima_len;
>>>>>>> +    CPUState *cs;
>>>>>>> +
>>>>>>> +    /*
>>>>>>> +     * The KVM XIVE device is already in use. This is the case when
>>>>>>> +     * rebooting XIVE -> XIVE  
>>>>>>
>>>>>> Can this case actually occur?  Further down you appear to
>>>>>> unconditionally destroy both KVM devices at reset time.  
>>>>>
>>>>> I guess you are right. I will check.
>>>>>  
>>>>>>> +     */
>>>>>>> +    if (xive->fd != -1) {
>>>>>>> +        return;
>>>>>>> +    }
>>>>>>>  
>>>>>>>      if (!kvm_enabled() || !kvmppc_has_cap_xive()) {
>>>>>>>          error_setg(errp, "IRQ_XIVE capability must be present for KVM");
>>>>>>> @@ -800,6 +809,24 @@ void kvmppc_xive_connect(sPAPRXive *xive, Error **errp)
>>>>>>>      xive->change = qemu_add_vm_change_state_handler(
>>>>>>>          kvmppc_xive_change_state_handler, xive);
>>>>>>>  
>>>>>>> +    /* Connect the presenters to the initial VCPUs of the machine */
>>>>>>> +    CPU_FOREACH(cs) {
>>>>>>> +        PowerPCCPU *cpu = POWERPC_CPU(cs);
>>>>>>> +
>>>>>>> +        kvmppc_xive_cpu_connect(cpu->tctx, &local_err);
>>>>>>> +        if (local_err) {
>>>>>>> +            error_propagate(errp, local_err);
>>>>>>> +            return;
>>>>>>> +        }
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    /* Update the KVM sources */
>>>>>>> +    kvmppc_xive_source_reset(xsrc, &local_err);
>>>>>>> +    if (local_err) {
>>>>>>> +            error_propagate(errp, local_err);
>>>>>>> +            return;
>>>>>>> +    }
>>>>>>> +
>>>>>>>      kvm_kernel_irqchip = true;
>>>>>>>      kvm_msi_via_irqfd_allowed = true;
>>>>>>>      kvm_gsi_direct_mapping = true;
>>>>>>> diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
>>>>>>> index 1d21ff217b82..bfc35d71df7f 100644
>>>>>>> --- a/hw/intc/xics_kvm.c
>>>>>>> +++ b/hw/intc/xics_kvm.c
>>>>>>> @@ -448,6 +448,16 @@ static void rtas_dummy(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>>>>>>>  int xics_kvm_init(sPAPRMachineState *spapr, Error **errp)
>>>>>>>  {
>>>>>>>      int rc;
>>>>>>> +    CPUState *cs;
>>>>>>> +    Error *local_err = NULL;
>>>>>>> +
>>>>>>> +    /*
>>>>>>> +     * The KVM XICS device is already in use. This is the case when
>>>>>>> +     * rebooting XICS -> XICS
>>>>>>> +     */
>>>>>>> +    if (kernel_xics_fd != -1) {
>>>>>>> +        return 0;
>>>>>>> +    }
>>>>>>>  
>>>>>>>      if (!kvm_enabled() || !kvm_check_extension(kvm_state, KVM_CAP_IRQ_XICS)) {
>>>>>>>          error_setg(errp,
>>>>>>> @@ -496,6 +506,21 @@ int xics_kvm_init(sPAPRMachineState *spapr, Error **errp)
>>>>>>>      kvm_msi_via_irqfd_allowed = true;
>>>>>>>      kvm_gsi_direct_mapping = true;
>>>>>>>  
>>>>>>> +    /* Connect the presenters to the initial VCPUs of the machine */
>>>>>>> +    CPU_FOREACH(cs) {
>>>>>>> +        PowerPCCPU *cpu = POWERPC_CPU(cs);
>>>>>>> +
>>>>>>> +        icp_kvm_connect(cpu->icp, &local_err);
>>>>>>> +        if (local_err) {
>>>>>>> +            error_propagate(errp, local_err);
>>>>>>> +            goto fail;
>>>>>>> +        }
>>>>>>> +        icp_set_kvm_state(cpu->icp, 1);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    /* Update the KVM sources */
>>>>>>> +    ics_set_kvm_state(ICS_KVM(spapr->ics), 1);
>>>>>>> +
>>>>>>>      return 0;
>>>>>>>  
>>>>>>>  fail:
>>>>>>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
>>>>>>> index c5c2fbc3f8bc..c166eab5b210 100644
>>>>>>> --- a/hw/intc/xive.c
>>>>>>> +++ b/hw/intc/xive.c
>>>>>>> @@ -932,10 +932,6 @@ static void xive_source_reset(void *dev)
>>>>>>>  
>>>>>>>      /* PQs are initialized to 0b01 (Q=1) which corresponds to "ints off" */
>>>>>>>      memset(xsrc->status, XIVE_ESB_OFF, xsrc->nr_irqs);
>>>>>>> -
>>>>>>> -    if (kvmppc_xive_enabled()) {
>>>>>>> -        kvmppc_xive_source_reset(xsrc, &error_fatal);
>>>>>>> -    }
>>>>>>>  }
>>>>>>>  
>>>>>>>  static void xive_source_realize(DeviceState *dev, Error **errp)
>>>>>>> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
>>>>>>> index ba27d9d8e972..5592eec3787b 100644
>>>>>>> --- a/hw/ppc/spapr_irq.c
>>>>>>> +++ b/hw/ppc/spapr_irq.c
>>>>>>> @@ -98,20 +98,14 @@ static void spapr_irq_init_xics(sPAPRMachineState *spapr, Error **errp)
>>>>>>>      int nr_irqs = spapr->irq->nr_irqs;
>>>>>>>      Error *local_err = NULL;
>>>>>>>  
>>>>>>> -    if (kvm_enabled()) {
>>>>>>> -        if (machine_kernel_irqchip_allowed(machine) &&
>>>>>>> -            !xics_kvm_init(spapr, &local_err)) {
>>>>>>> -            spapr->icp_type = TYPE_KVM_ICP;
>>>>>>> -            spapr->ics = spapr_ics_create(spapr, TYPE_ICS_KVM, nr_irqs,
>>>>>>> -                                          &local_err);
>>>>>>> -        }
>>>>>>> -        if (machine_kernel_irqchip_required(machine) && !spapr->ics) {
>>>>>>> -            error_prepend(&local_err,
>>>>>>> -                          "kernel_irqchip requested but unavailable: ");
>>>>>>> -            goto error;  
>>>>>>
>>>>>> I don't see anything that replaces the irqchip_required logic, which
>>>>>> doesn't seem right.  
>>>>>
>>>>> Yes. We do lose the ability to fall back to the emulated device in case
>>>>> of failure. It is not impossible to do but it will require more changes
>>>>> to check what are the KVM capabilities before starting the machine.  
>>>>
>>>> Uh... it seems more like it's the other way around.  We'll always fall
>>>> back to emulated, even if we've explicitly said on the command line
>>>> that we don't want that.  
>>>
>>> Ah yes. The init function might also be broken.
>>>
>>> XICS mode is a bit more difficult to handle than XIVE because we have
>>> different object types for the KVM device and the QEMU emulated device,
>>
>> This is indeed a bit unfortunate, but I think there's still room for
>> improvement. Let's look at the base classes:
>>
>> struct ICPStateClass {
>>     DeviceClass parent_class;
>>
>>     DeviceRealize parent_realize;
>>     DeviceReset parent_reset;
>>
>>     void (*pre_save)(ICPState *icp);
>>     int (*post_load)(ICPState *icp, int version_id);
>>     void (*synchronize_state)(ICPState *icp);
>> };
>>
>> struct ICSStateClass {
>>     DeviceClass parent_class;
>>
>>     DeviceRealize parent_realize;
>>     DeviceReset parent_reset;
>>
>>     void (*pre_save)(ICSState *s);
>>     int (*post_load)(ICSState *s, int version_id);
>>     void (*reject)(ICSState *s, uint32_t irq);
>>     void (*resend)(ICSState *s);
>>     void (*eoi)(ICSState *s, uint32_t irq);
>>     void (*synchronize_state)(ICSState *s);
>> };
>>
>> The pre_save and post_load callbacks are only used with
>> the KVM device. They could be explicitly called from
>> the corresponding VMStateDescription callbacks with a
>> kvm_enabled() && kvm_irqchip_in_kernel() check.

Yes.

>> Same goes for the synchronize_state callbacks, which are only
>> needed for 'info pic'.

Yes, like we do for XIVE.

>> The reject, resend and eoi callbacks are only called by code that
>> belongs to the QEMU emulated device. Either the RTAS/hypercalls
>> or from the machine code with explicit checks like:
>>
>> static void spapr_irq_set_irq_xics(void *opaque, int srcno, int val)
>> {
>>     sPAPRMachineState *spapr = opaque;
>>     MachineState *machine = MACHINE(opaque);
>>
>>     if (kvm_enabled() && machine_kernel_irqchip_allowed(machine)) {
>>         ics_kvm_set_irq(spapr->ics, srcno, val);
>>     } else {
>>         ics_simple_set_irq(spapr->ics, srcno, val);
>>     }
>> }
>>
>> or
>>
>> static int spapr_irq_post_load_xics(sPAPRMachineState *spapr, int version_id)
>> {
>>     if (!object_dynamic_cast(OBJECT(spapr->ics), TYPE_ICS_KVM)) {
>>         CPUState *cs;
>>         CPU_FOREACH(cs) {
>>             PowerPCCPU *cpu = POWERPC_CPU(cs);
>>             icp_resend(spapr_cpu_state(cpu)->icp);
>>         }
>>     }
>>     return 0;
>> }
>>
>> Unless I'm missing something, the reject, resend and eoi callbacks could
>> simply be removed. This would allow unifying KVM and QEMU emulation in
>> the same ICP and ICS object types.

Yes.
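
As a rough sketch of what a unified type could look like (plain standalone C, not QEMU code; `DemoUnifiedICS` and every `demo_*` name are hypothetical), a single object carries a runtime flag and the entry point dispatches on it, so no per-class callback is needed:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NR_IRQS 4

/* One object type for both backends; hypothetical, not QEMU's ICSState. */
typedef struct DemoUnifiedICS {
    bool in_kernel;              /* true when the KVM device backs the sources */
    uint8_t level[DEMO_NR_IRQS]; /* line state kept by the emulated path */
    uint32_t kernel_injections;  /* counts IRQs handed to the pretend KVM device */
} DemoUnifiedICS;

/* Pretend KVM path: a real one would issue an ioctl on the device fd. */
void demo_ics_kvm_set_irq(DemoUnifiedICS *ics, int srcno, int val)
{
    (void)srcno;
    (void)val;
    ics->kernel_injections++;
}

/* Emulated path: record the line level in the QEMU-side state. */
void demo_ics_emulated_set_irq(DemoUnifiedICS *ics, int srcno, int val)
{
    ics->level[srcno] = val ? 1 : 0;
}

/* Single entry point: the backend is a runtime property, not a type. */
void demo_ics_set_irq(DemoUnifiedICS *ics, int srcno, int val)
{
    assert(srcno >= 0 && srcno < DEMO_NR_IRQS);
    if (ics->in_kernel) {
        demo_ics_kvm_set_irq(ics, srcno, val);
    } else {
        demo_ics_emulated_set_irq(ics, srcno, val);
    }
}
```

Because the backend is just a field, it can be flipped at reset without destroying and recreating the object, which is what the 'dual' machine needs.
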

>> If this makes sense to you, I can have a look (already started actually ;-)
> 
> Please do.  The use of different object types was something that
> seemed like a good idea at the time, but in hindsight, wasn't.  In
> general different device types should represent guest-visibly
> different objects, not just implementation differences.

Yes. The direction we took to add KVM support in XIVE proved to be
much simpler.

>>> and with the 'dual' mode, we activate the device at CAS reset time.
>>>
>>> Failures being handled at reset time, should we keep the same logic and
>>> abort the machine at reset if the kernel irqchip is required?
>>>
>>
>> If the user passed ic-mode=dual,kernel-irqchip=on, we should at least make
>> sure KVM supports both XICS and XIVE devices during machine init. Then
>> during reset if something goes wrong with KVM, it seems ok to abort.
>>
>> If the user didn't pass kernel-irqchip, i.e., kernel_irqchip_allowed is true
>> and kernel_irqchip_required is false, the current behavior for XICS is
>> to try KVM first and fall back to QEMU emulation. I guess it could be the
>> same for XIVE.
> 
> Yes, I think that's the behaviour we want, on all counts.
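
To summarize the agreed behaviour, here is a small standalone sketch in plain C. It is not QEMU code: `DemoBackend`, `demo_pick_backend()` and `demo_dual_init_check()` are hypothetical names, and the sketch assumes that "required" always implies "allowed", as with kernel-irqchip=on.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcome of the backend selection; hypothetical names. */
typedef enum {
    DEMO_BACKEND_KVM,      /* in-kernel irqchip is used */
    DEMO_BACKEND_EMULATED, /* QEMU emulated device is used */
    DEMO_BACKEND_FATAL     /* kernel irqchip was required but is unavailable */
} DemoBackend;

/*
 * allowed/required mirror the kernel_irqchip_allowed and
 * kernel_irqchip_required properties discussed above; kvm_init_ok is
 * the result of trying to set up the in-kernel device.
 */
DemoBackend demo_pick_backend(bool allowed, bool required, bool kvm_init_ok)
{
    assert(allowed || !required); /* assumption: required implies allowed */

    if (!allowed) {
        return DEMO_BACKEND_EMULATED;
    }
    if (kvm_init_ok) {
        return DEMO_BACKEND_KVM;
    }
    /* Try KVM first; fall back only if the user did not insist on it. */
    return required ? DEMO_BACKEND_FATAL : DEMO_BACKEND_EMULATED;
}

/*
 * Machine init check for ic-mode=dual: when the kernel irqchip is
 * required, both the XICS and XIVE KVM devices must be supported.
 */
bool demo_dual_init_check(bool required, bool has_xics_cap, bool has_xive_cap)
{
    return !required || (has_xics_cap && has_xive_cap);
}
```

For example, `demo_pick_backend(true, false, false)` selects the emulated device, while `demo_pick_backend(true, true, false)` reports a fatal error, matching the abort at reset discussed above.
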
 
Thanks,

C. 
