From: David Gibson
Subject: Re: [Qemu-ppc] [PATCH 13/13] spapr: add KVM support to the 'dual' machine
Date: Thu, 14 Feb 2019 14:35:55 +1100
User-agent: Mutt/1.10.1 (2018-07-13)

On Wed, Feb 13, 2019 at 11:07:49AM +0100, Greg Kurz wrote:
> On Wed, 13 Feb 2019 09:22:46 +0100
> Cédric Le Goater <address@hidden> wrote:
>
> > On 2/13/19 2:32 AM, David Gibson wrote:
> > > On Tue, Feb 12, 2019 at 08:18:19AM +0100, Cédric Le Goater wrote:
> > >> On 2/12/19 2:11 AM, David Gibson wrote:
> > >>> On Mon, Jan 07, 2019 at 07:39:46PM +0100, Cédric Le Goater wrote:
> > >>>> The interrupt mode is chosen by the CAS negotiation process and
> > >>>> activated after a reset to take into account the required changes in
> > >>>> the machine. This brings new constraints on how the associated KVM IRQ
> > >>>> device is initialized.
> > >>>>
> > >>>> Currently, each model takes care of the initialization of the KVM
> > >>>> device in its realize method, but this is no longer possible because the
> > >>>> initialization needs to be done globally when the interrupt mode is
> > >>>> known, i.e. when the machine is reset. It also means that we need a way
> > >>>> to delete a KVM device when another mode is chosen.
> > >>>>
> > >>>> Also, to support migration, the QEMU objects holding the state to
> > >>>> transfer should always be available but not necessarily activated.
> > >>>>
> > >>>> The overall approach of this proposal is to initialize both interrupt
> > >>>> modes at the QEMU level and keep the IRQ number space in sync to allow
> > >>>> switching from one mode to another. For the KVM side of things, the
> > >>>> whole initialization of the KVM device, sources and presenters, is
> > >>>> grouped in a single routine. The XICS and XIVE sPAPR IRQ reset
> > >>>> handlers are modified accordingly to handle the init and the delete
> > >>>> sequences of the KVM device.
> > >>>>
> > >>>> As KVM is now initialized at reset, we lose the possibility to
> > >>>> fall back to the QEMU emulated mode in case of failure, and failures
> > >>>> become fatal to the machine.
> > >>>>
> > >>>> Signed-off-by: Cédric Le Goater <address@hidden>
> > >>>> ---
> > >>>> hw/intc/spapr_xive.c | 8 +---
> > >>>> hw/intc/spapr_xive_kvm.c | 27 ++++++++++++++
> > >>>> hw/intc/xics_kvm.c | 25 +++++++++++++
> > >>>> hw/intc/xive.c | 4 --
> > >>>> hw/ppc/spapr_irq.c | 79 ++++++++++++++++++++++++++++------------
> > >>>> 5 files changed, 109 insertions(+), 34 deletions(-)
> > >>>>
> > >>>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> > >>>> index 21f3c1ef0901..0661aca35900 100644
> > >>>> --- a/hw/intc/spapr_xive.c
> > >>>> +++ b/hw/intc/spapr_xive.c
> > >>>> @@ -330,13 +330,7 @@ static void spapr_xive_realize(DeviceState *dev,
> > >>>> Error **errp)
> > >>>> xive->eat = g_new0(XiveEAS, xive->nr_irqs);
> > >>>> xive->endt = g_new0(XiveEND, xive->nr_ends);
> > >>>>
> > >>>> - if (kvmppc_xive_enabled()) {
> > >>>> - kvmppc_xive_connect(xive, &local_err);
> > >>>> - if (local_err) {
> > >>>> - error_propagate(errp, local_err);
> > >>>> - return;
> > >>>> - }
> > >>>> - } else {
> > >>>> + if (!kvmppc_xive_enabled()) {
> > >>>> /* TIMA initialization */
> > >>>> memory_region_init_io(&xive->tm_mmio, OBJECT(xive),
> > >>>> &xive_tm_ops, xive,
> > >>>> "xive.tima", 4ull << TM_SHIFT);
> > >>>> diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
> > >>>> index d35814c1992e..3ebc947f2be7 100644
> > >>>> --- a/hw/intc/spapr_xive_kvm.c
> > >>>> +++ b/hw/intc/spapr_xive_kvm.c
> > >>>> @@ -737,6 +737,15 @@ void kvmppc_xive_connect(sPAPRXive *xive, Error
> > >>>> **errp)
> > >>>> Error *local_err = NULL;
> > >>>> size_t esb_len;
> > >>>> size_t tima_len;
> > >>>> + CPUState *cs;
> > >>>> +
> > >>>> + /*
> > >>>> + * The KVM XIVE device already in use. This is the case when
> > >>>> + * rebooting XIVE -> XIVE
> > >>>
> > >>> Can this case actually occur? Further down you appear to
> > >>> unconditionally destroy both KVM devices at reset time.
> > >>
> > >> I guess you are right. I will check.
> > >>
> > >>>> + */
> > >>>> + if (xive->fd != -1) {
> > >>>> + return;
> > >>>> + }
> > >>>>
> > >>>> if (!kvm_enabled() || !kvmppc_has_cap_xive()) {
> > >>>> error_setg(errp, "IRQ_XIVE capability must be present for KVM");
> > >>>> @@ -800,6 +809,24 @@ void kvmppc_xive_connect(sPAPRXive *xive, Error
> > >>>> **errp)
> > >>>> xive->change = qemu_add_vm_change_state_handler(
> > >>>> kvmppc_xive_change_state_handler, xive);
> > >>>>
> > >>>> + /* Connect the presenters to the initial VCPUs of the machine */
> > >>>> + CPU_FOREACH(cs) {
> > >>>> + PowerPCCPU *cpu = POWERPC_CPU(cs);
> > >>>> +
> > >>>> + kvmppc_xive_cpu_connect(cpu->tctx, &local_err);
> > >>>> + if (local_err) {
> > >>>> + error_propagate(errp, local_err);
> > >>>> + return;
> > >>>> + }
> > >>>> + }
> > >>>> +
> > >>>> + /* Update the KVM sources */
> > >>>> + kvmppc_xive_source_reset(xsrc, &local_err);
> > >>>> + if (local_err) {
> > >>>> + error_propagate(errp, local_err);
> > >>>> + return;
> > >>>> + }
> > >>>> +
> > >>>> kvm_kernel_irqchip = true;
> > >>>> kvm_msi_via_irqfd_allowed = true;
> > >>>> kvm_gsi_direct_mapping = true;
> > >>>> diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
> > >>>> index 1d21ff217b82..bfc35d71df7f 100644
> > >>>> --- a/hw/intc/xics_kvm.c
> > >>>> +++ b/hw/intc/xics_kvm.c
> > >>>> @@ -448,6 +448,16 @@ static void rtas_dummy(PowerPCCPU *cpu,
> > >>>> sPAPRMachineState *spapr,
> > >>>> int xics_kvm_init(sPAPRMachineState *spapr, Error **errp)
> > >>>> {
> > >>>> int rc;
> > >>>> + CPUState *cs;
> > >>>> + Error *local_err = NULL;
> > >>>> +
> > >>>> + /*
> > >>>> + * The KVM XICS device already in use. This is the case when
> > >>>> + * rebooting XICS -> XICS
> > >>>> + */
> > >>>> + if (kernel_xics_fd != -1) {
> > >>>> + return 0;
> > >>>> + }
> > >>>>
> > >>>> if (!kvm_enabled() || !kvm_check_extension(kvm_state,
> > >>>> KVM_CAP_IRQ_XICS)) {
> > >>>> error_setg(errp,
> > >>>> @@ -496,6 +506,21 @@ int xics_kvm_init(sPAPRMachineState *spapr, Error
> > >>>> **errp)
> > >>>> kvm_msi_via_irqfd_allowed = true;
> > >>>> kvm_gsi_direct_mapping = true;
> > >>>>
> > >>>> + /* Connect the presenters to the initial VCPUs of the machine */
> > >>>> + CPU_FOREACH(cs) {
> > >>>> + PowerPCCPU *cpu = POWERPC_CPU(cs);
> > >>>> +
> > >>>> + icp_kvm_connect(cpu->icp, &local_err);
> > >>>> + if (local_err) {
> > >>>> + error_propagate(errp, local_err);
> > >>>> + goto fail;
> > >>>> + }
> > >>>> + icp_set_kvm_state(cpu->icp, 1);
> > >>>> + }
> > >>>> +
> > >>>> + /* Update the KVM sources */
> > >>>> + ics_set_kvm_state(ICS_KVM(spapr->ics), 1);
> > >>>> +
> > >>>> return 0;
> > >>>>
> > >>>> fail:
> > >>>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> > >>>> index c5c2fbc3f8bc..c166eab5b210 100644
> > >>>> --- a/hw/intc/xive.c
> > >>>> +++ b/hw/intc/xive.c
> > >>>> @@ -932,10 +932,6 @@ static void xive_source_reset(void *dev)
> > >>>>
> > >>>> /* PQs are initialized to 0b01 (Q=1) which corresponds to "ints
> > >>>> off" */
> > >>>> memset(xsrc->status, XIVE_ESB_OFF, xsrc->nr_irqs);
> > >>>> -
> > >>>> - if (kvmppc_xive_enabled()) {
> > >>>> - kvmppc_xive_source_reset(xsrc, &error_fatal);
> > >>>> - }
> > >>>> }
> > >>>>
> > >>>> static void xive_source_realize(DeviceState *dev, Error **errp)
> > >>>> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> > >>>> index ba27d9d8e972..5592eec3787b 100644
> > >>>> --- a/hw/ppc/spapr_irq.c
> > >>>> +++ b/hw/ppc/spapr_irq.c
> > >>>> @@ -98,20 +98,14 @@ static void spapr_irq_init_xics(sPAPRMachineState
> > >>>> *spapr, Error **errp)
> > >>>> int nr_irqs = spapr->irq->nr_irqs;
> > >>>> Error *local_err = NULL;
> > >>>>
> > >>>> - if (kvm_enabled()) {
> > >>>> - if (machine_kernel_irqchip_allowed(machine) &&
> > >>>> - !xics_kvm_init(spapr, &local_err)) {
> > >>>> - spapr->icp_type = TYPE_KVM_ICP;
> > >>>> - spapr->ics = spapr_ics_create(spapr, TYPE_ICS_KVM,
> > >>>> nr_irqs,
> > >>>> - &local_err);
> > >>>> - }
> > >>>> - if (machine_kernel_irqchip_required(machine) && !spapr->ics) {
> > >>>> - error_prepend(&local_err,
> > >>>> - "kernel_irqchip requested but unavailable:
> > >>>> ");
> > >>>> - goto error;
> > >>>
> > >>> I don't see anything that replaces the irqchip_required logic, which
> > >>> doesn't seem right.
> > >>
> > >> Yes. We do lose the ability to fall back to the emulated device in case
> > >> of failure. It is not impossible to do but it will require more changes
> > >> to check the KVM capabilities before starting the machine.
> > >
> > > Uh... it seems more like it's the other way around. We'll always fall
> > > back to emulated, even if we've explicitly said on the command line
> > > that we don't want that.
> >
> > Ah yes. The init function might be also broken.
> >
> > XICS mode is a bit more difficult to handle than XIVE because we have
> > different object types for the KVM device and the QEMU emulated device,
>
> This is indeed a bit unfortunate, but I think there's still room for
> improvement. Let's look at the base classes:
>
> struct ICPStateClass {
> DeviceClass parent_class;
>
> DeviceRealize parent_realize;
> DeviceReset parent_reset;
>
> void (*pre_save)(ICPState *icp);
> int (*post_load)(ICPState *icp, int version_id);
> void (*synchronize_state)(ICPState *icp);
> };
>
> struct ICSStateClass {
> DeviceClass parent_class;
>
> DeviceRealize parent_realize;
> DeviceReset parent_reset;
>
> void (*pre_save)(ICSState *s);
> int (*post_load)(ICSState *s, int version_id);
> void (*reject)(ICSState *s, uint32_t irq);
> void (*resend)(ICSState *s);
> void (*eoi)(ICSState *s, uint32_t irq);
> void (*synchronize_state)(ICSState *s);
> };
>
> The pre_save and post_load callbacks are only used with
> the KVM device. They could be explicitly called from
> the corresponding VMStateDescription callbacks with a
> kvm_enabled() && kvm_irqchip_in_kernel() check.
>
> Same goes for the synchronize_state callbacks, which are only
> needed for 'info pic'.
>
> The reject, resend and eoi callbacks are only called by code that
> belongs to the QEMU emulated device: either from the RTAS/hypercall
> handlers or from the machine code with explicit checks like:
>
> static void spapr_irq_set_irq_xics(void *opaque, int srcno, int val)
> {
> sPAPRMachineState *spapr = opaque;
> MachineState *machine = MACHINE(opaque);
>
> if (kvm_enabled() && machine_kernel_irqchip_allowed(machine)) {
> ics_kvm_set_irq(spapr->ics, srcno, val);
> } else {
> ics_simple_set_irq(spapr->ics, srcno, val);
> }
> }
>
> or
>
> static int spapr_irq_post_load_xics(sPAPRMachineState *spapr, int version_id)
> {
> if (!object_dynamic_cast(OBJECT(spapr->ics), TYPE_ICS_KVM)) {
> CPUState *cs;
> CPU_FOREACH(cs) {
> PowerPCCPU *cpu = POWERPC_CPU(cs);
> icp_resend(spapr_cpu_state(cpu)->icp);
> }
> }
> return 0;
> }
>
> Unless I'm missing something, the reject, resend and eoi callbacks could
> simply be removed. This would allow us to unify KVM and QEMU emulation in
> the same ICP and ICS object types.
>
> If this makes sense to you, I can have a look (already started actually ;-)

Please do. The use of different object types was something that
seemed like a good idea at the time, but in hindsight, wasn't. In
general different device types should represent guest-visibly
different objects, not just implementation differences.
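
To illustrate the point about keeping one device type and treating KVM
versus emulation as an implementation detail, here is a minimal sketch.
It is not QEMU code: DemoICS, DemoBackend and every demo_* function are
hypothetical names invented for this example, and the in_kernel flag is
an assumed stand-in for whatever check the unified ICS would really use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are not QEMU types. */
typedef enum { DEMO_BACKEND_EMULATED, DEMO_BACKEND_KVM } DemoBackend;

typedef struct DemoICS {
    bool in_kernel;           /* assumed flag: a KVM device backs this source */
    DemoBackend last_backend; /* which path handled the last request */
    int last_srcno;
    int last_val;
} DemoICS;

/* Record the request so the dispatch decision can be observed. */
static void demo_ics_record(DemoICS *ics, DemoBackend backend,
                            int srcno, int val)
{
    ics->last_backend = backend;
    ics->last_srcno = srcno;
    ics->last_val = val;
}

/* Placeholder for the path that would forward the IRQ to the kernel. */
static void demo_ics_kvm_set_irq(DemoICS *ics, int srcno, int val)
{
    demo_ics_record(ics, DEMO_BACKEND_KVM, srcno, val);
}

/* Placeholder for the path that would update the emulated source state. */
static void demo_ics_emulated_set_irq(DemoICS *ics, int srcno, int val)
{
    demo_ics_record(ics, DEMO_BACKEND_EMULATED, srcno, val);
}

/*
 * Single entry point on a single object type: the backend is chosen at
 * run time, so the type seen by the rest of the machine never changes.
 */
static void demo_ics_set_irq(DemoICS *ics, int srcno, int val)
{
    assert(ics != NULL);
    if (ics->in_kernel) {
        demo_ics_kvm_set_irq(ics, srcno, val);
    } else {
        demo_ics_emulated_set_irq(ics, srcno, val);
    }
}
```

With this shape, switching backends only means flipping the flag on an
existing object, which is what a reset-time mode change would need.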
> > and with the 'dual' mode, we activate the device at CAS reset time.
> >
> > Since failures are handled at reset time, should we keep the same logic
> > and abort the machine at reset if the kernel irqchip is required?
> >
>
> If the user passed ic-mode=dual,kernel-irqchip=on, we should at least make
> sure KVM supports both XICS and XIVE devices during machine init. Then
> during reset if something goes wrong with KVM, it seems ok to abort.
>
> If the user didn't pass kernel-irqchip, i.e., kernel_irqchip_allowed is true
> and kernel_irqchip_required is false, the current behavior for XICS is
> to try KVM first and fall back to QEMU emulation. I guess it could be the
> same for XIVE.

Yes, I think that's the behaviour we want, on all counts.
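
The agreed behaviour can be written down as a small decision table.
This is only a sketch under assumptions: the DemoIrqchipPolicy struct,
the DemoIrqBackend enum and both demo_* functions are hypothetical and
exist only for this example. The two booleans mirror the
kernel_irqchip_allowed and kernel_irqchip_required properties discussed
above, and the capability and connect results are passed in as plain
flags rather than queried from KVM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of selecting an interrupt controller backend. */
typedef enum {
    DEMO_IRQ_EMULATED, /* use the QEMU emulated device */
    DEMO_IRQ_KVM,      /* use the in-kernel device */
    DEMO_IRQ_FATAL     /* abort: the user required KVM and it failed */
} DemoIrqBackend;

/* Mirrors kernel_irqchip_allowed / kernel_irqchip_required. */
typedef struct DemoIrqchipPolicy {
    bool allowed;
    bool required;
} DemoIrqchipPolicy;

/*
 * Machine init check for ic-mode=dual: when the kernel irqchip is
 * required, KVM must support both the XICS and the XIVE device up front.
 */
static bool demo_dual_init_ok(const DemoIrqchipPolicy *p,
                              bool kvm_has_xics, bool kvm_has_xive)
{
    assert(p != NULL);
    if (!p->required) {
        return true;
    }
    return kvm_has_xics && kvm_has_xive;
}

/*
 * Reset-time selection: try KVM first when allowed, fall back to
 * emulation only if the kernel irqchip was not explicitly required.
 */
static DemoIrqBackend demo_reset_select(const DemoIrqchipPolicy *p,
                                        bool kvm_connect_ok)
{
    assert(p != NULL);
    if (!p->allowed) {
        return DEMO_IRQ_EMULATED;
    }
    if (kvm_connect_ok) {
        return DEMO_IRQ_KVM;
    }
    return p->required ? DEMO_IRQ_FATAL : DEMO_IRQ_EMULATED;
}
```

The same selection function would apply to both XICS and XIVE, which
keeps the fallback rules identical across the two interrupt modes.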
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson