Re: [Qemu-devel] [PATCH v3 5/5] spapr: fix migration of ICPState objects


From: Greg Kurz
Subject: Re: [Qemu-devel] [PATCH v3 5/5] spapr: fix migration of ICPState objects from/to older QEMU
Date: Thu, 8 Jun 2017 11:54:10 +0200

On Thu, 8 Jun 2017 14:08:57 +1000
David Gibson <address@hidden> wrote:

> On Wed, Jun 07, 2017 at 07:17:26PM +0200, Greg Kurz wrote:
> > Commit 5bc8d26de20c ("spapr: allocate the ICPState object from under
> > sPAPRCPUCore") moved ICPState objects from the machine to CPU cores.
> > This is an improvement since we no longer allocate ICPState objects
> > that will never be used. But it has the side-effect of breaking
> > migration of older machine types from older QEMU versions.
> > 
> > This patch allows spapr to register dummy "icp/server" entries to vmstate.
> > These entries use a dedicated VMStateDescription that can swallow and
> > discard state of an incoming migration stream, and that don't send anything
> > on outgoing migration.
> > 
> > As for real ICPState objects, the instance_id is the cpu_index of the
> > corresponding vCPU, which happens to be equal to the generated instance_id
> > of older machine types.
> > 
> > The machine can unregister/register these entries when CPUs are dynamically
> > plugged/unplugged.
> > 
> > This is only available for pseries-2.9 and older machines, thanks to a
> > compat property.
> > 
> > Signed-off-by: Greg Kurz <address@hidden>
> > ---
> > v3: - new logic entirely implemented in hw/ppc/spapr.c
> > ---
> >  hw/ppc/spapr.c         |   88 +++++++++++++++++++++++++++++++++++++++++++++++-
> >  include/hw/ppc/spapr.h |    2 +
> >  2 files changed, 88 insertions(+), 2 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 9b7ae28939a8..c15b604978f0 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -124,9 +124,52 @@ error:
> >      return NULL;
> >  }
> >  
> > +static bool pre_2_10_vmstate_dummy_icp_needed(void *opaque)
> > +{
> > +    return false;
> > +}  
> 
> Uh.. the needed function always returns false, how can that work?
> 

The needed function is used for outgoing migration only:

bool vmstate_save_needed(const VMStateDescription *vmsd, void *opaque)
{
    if (vmsd->needed && !vmsd->needed(opaque)) {
        /* optional section not needed */
        return false;
    }
    return true;
}

The idea is that the ICPState objects that pre-2.10 machine types created
but never associated with a vCPU don't need to be migrated at all, since
their state never changes from the default.

We don't even create these unneeded ICPState objects here; we simply
register their instance ids with vmstate.
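
To make the outgoing side concrete, here is a minimal standalone sketch of how a `needed` callback that always returns false keeps a section out of the stream. MiniVMSD, mini_save_needed() and mini_count_saved() are hypothetical stand-ins invented for this illustration, not QEMU APIs; only the gating logic mirrors vmstate_save_needed() quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for QEMU's VMStateDescription. */
typedef struct MiniVMSD {
    const char *name;
    bool (*needed)(void *opaque);   /* NULL means "always save" */
} MiniVMSD;

/* Same gating logic as vmstate_save_needed() quoted above. */
static bool mini_save_needed(const MiniVMSD *vmsd, void *opaque)
{
    assert(vmsd != NULL);
    if (vmsd->needed && !vmsd->needed(opaque)) {
        return false;               /* optional section not needed */
    }
    return true;
}

/* Mirrors pre_2_10_vmstate_dummy_icp_needed(): never save. */
static bool mini_dummy_icp_needed(void *opaque)
{
    (void)opaque;
    return false;
}

/* Count how many of the registered sections would be written on save. */
static size_t mini_count_saved(const MiniVMSD *const *sections, size_t n)
{
    size_t saved = 0;

    for (size_t i = 0; i < n; i++) {
        if (mini_save_needed(sections[i], NULL)) {
            saved++;
        }
    }
    return saved;
}

/* A dummy entry and a "real" entry sharing the same section name. */
static const MiniVMSD mini_dummy_icp = { "icp/server", mini_dummy_icp_needed };
static const MiniVMSD mini_real_icp  = { "icp/server", NULL };
```

With two dummy entries and one real entry registered, only the real one would be counted as saved, which is the "don't send anything" behaviour wanted for the dummies.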

> > +
> > +static const VMStateDescription pre_2_10_vmstate_dummy_icp = {
> > +    .name = "icp/server",
> > +    .version_id = 1,
> > +    .minimum_version_id = 1,
> > +    .needed = pre_2_10_vmstate_dummy_icp_needed,

Outgoing migration:
- machines in older QEMU have unused ICPState objects (still in their
  default state)
- machines in QEMU 2.10 don't even have the extra ICPState objects

=> don't send anything

> > +    .fields = (VMStateField[]) {
> > +        VMSTATE_UNUSED(4), /* uint32_t xirr */
> > +        VMSTATE_UNUSED(1), /* uint8_t pending_priority */
> > +        VMSTATE_UNUSED(1), /* uint8_t mfrr */

Incoming migration from older QEMU: we don't have the extra ICPState objects.

=> accept the state and discard it

> > +        VMSTATE_END_OF_LIST()
> > +    },
> > +};
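
The incoming side of the dummy description above can be pictured as a reader that advances past the three legacy fields (4 + 1 + 1 = 6 bytes) without storing them. This is only a sketch under stated assumptions: MiniStream and mini_discard_legacy_icp() are hypothetical names, and a plain byte buffer stands in for the real migration stream.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-stream cursor standing in for the migration stream. */
typedef struct MiniStream {
    const uint8_t *buf;
    size_t len;
    size_t pos;
} MiniStream;

/* Sizes of the legacy ICPState fields, as listed in the dummy VMSD. */
static const size_t mini_legacy_icp_fields[] = {
    4, /* uint32_t xirr */
    1, /* uint8_t pending_priority */
    1, /* uint8_t mfrr */
};

/*
 * Consume one legacy ICP record without keeping any of it.
 * Returns 0 on success, -1 if the stream is too short.
 */
static int mini_discard_legacy_icp(MiniStream *s)
{
    size_t n = sizeof(mini_legacy_icp_fields) / sizeof(mini_legacy_icp_fields[0]);

    assert(s->pos <= s->len);
    for (size_t i = 0; i < n; i++) {
        if (s->len - s->pos < mini_legacy_icp_fields[i]) {
            return -1;
        }
        s->pos += mini_legacy_icp_fields[i];   /* skip, do not interpret */
    }
    return 0;
}
```

After a successful call the cursor sits on the first byte following the record, so whatever comes next in the stream is parsed as usual.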
> > +
> > +static void pre_2_10_vmstate_register_dummy_icp(sPAPRMachineState *spapr, int i)
> > +{
> > +    bool *flag = &spapr->pre_2_10_ignore_icp[i];
> > +
> > +    g_assert(!*flag);  
> 
> Apart from this assert(), you never seem to test the values in the
> pre_2_10_ignore_icp() array, so it seems a bit pointless.
> 

There's the opposite check in pre_2_10_vmstate_unregister_dummy_icp().
But I agree it isn't really useful; it's there more out of paranoia :)

> > +    vmstate_register(NULL, i, &pre_2_10_vmstate_dummy_icp, flag);
> > +    *flag = true;
> > +}
> > +
> > +static void pre_2_10_vmstate_unregister_dummy_icp(sPAPRMachineState *spapr,
> > +                                                  int i)
> > +{
> > +    bool *flag = &spapr->pre_2_10_ignore_icp[i];
> > +
> > +    g_assert(*flag);
> > +    vmstate_unregister(NULL, &pre_2_10_vmstate_dummy_icp, flag);
> > +    *flag = false;
> > +}
> > +
> > +static inline int xics_nr_servers(void)  
> 
> Maybe a different name to emphasise that this is only used for the
> backwards compat logic.
> 

It is also used to compute the "ibm,interrupt-server-ranges" DT prop.

    /* /interrupt controller */
    spapr_dt_xics(xics_nr_servers(), fdt, PHANDLE_XICP);
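
For a feel of the numbers, the formula in xics_nr_servers() can be tried in isolation. In this sketch the three inputs are passed as parameters instead of being read from max_cpus, kvmppc_smt_threads() and smp_threads; mini_xics_nr_servers() is a hypothetical name and the values used with it are made up for illustration.

```c
#include <assert.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Same arithmetic as xics_nr_servers(), with the globals turned into
 * parameters so the function can be exercised standalone.
 */
static int mini_xics_nr_servers(int max_cpus, int host_smt, int smp_threads)
{
    assert(smp_threads > 0);
    return DIV_ROUND_UP(max_cpus * host_smt, smp_threads);
}
```

For example, 8 max CPUs with 2 guest threads per core and a host SMT value of 8 give 8 * 8 / 2 = 32 server slots, while a host SMT value of 1 with 1 thread per core gives exactly max_cpus slots.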


> > +{
> > +    return DIV_ROUND_UP(max_cpus * kvmppc_smt_threads(), smp_threads);
> > +}
> > +
> >  static void xics_system_init(MachineState *machine, int nr_irqs, Error **errp)
> >  {
> >      sPAPRMachineState *spapr = SPAPR_MACHINE(machine);
> > +    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(machine);
> >  
> >      if (kvm_enabled()) {
> >          if (machine_kernel_irqchip_allowed(machine) &&
> > @@ -148,6 +191,15 @@ static void xics_system_init(MachineState *machine, int nr_irqs, Error **errp)
> >              return;
> >          }
> >      }
> > +
> > +    if (smc->pre_2_10_has_unused_icps) {
> > +        int i;
> > +
> > +        spapr->pre_2_10_ignore_icp = g_malloc(xics_nr_servers());
> > +        for (i = 0; i < xics_nr_servers(); i++) {
> > +            pre_2_10_vmstate_register_dummy_icp(spapr, i);  
> 
> This registers a dummy ICP for every slot, some of which will have
> real cpus / icps.  That doesn't seem right.
> 

This is initialization, before we even have actual CPUs. We start with
dummy ICPs for every slot, but they get replaced by real ICPs when we
plug CPU cores...... (see below)

> > +        }
> > +    }
> >  }
> >  
> >  static int spapr_fixup_cpu_smt_dt(void *fdt, int offset, PowerPCCPU *cpu,
> > @@ -976,7 +1028,6 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
> >      void *fdt;
> >      sPAPRPHBState *phb;
> >      char *buf;
> > -    int smt = kvmppc_smt_threads();
> >  
> >      fdt = g_malloc0(FDT_MAX_SIZE);
> >      _FDT((fdt_create_empty_tree(fdt, FDT_MAX_SIZE)));
> > @@ -1016,7 +1067,7 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
> >      _FDT(fdt_setprop_cell(fdt, 0, "#size-cells", 2));
> >  
> >      /* /interrupt controller */
> > -    spapr_dt_xics(DIV_ROUND_UP(max_cpus * smt, smp_threads), fdt, PHANDLE_XICP);
> > +    spapr_dt_xics(xics_nr_servers(), fdt, PHANDLE_XICP);
> >  
> >      ret = spapr_populate_memory(spapr, fdt);
> >      if (ret < 0) {
> > @@ -2800,9 +2851,24 @@ static void spapr_core_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >                                Error **errp)
> >  {
> >      MachineState *ms = MACHINE(qdev_get_machine());
> > +    sPAPRMachineState *spapr = SPAPR_MACHINE(ms);
> >      CPUCore *cc = CPU_CORE(dev);
> >      CPUArchId *core_slot = spapr_find_cpu_slot(ms, cc->core_id, NULL);
> >  
> > +    if (spapr->pre_2_10_ignore_icp) {
> > +        sPAPRCPUCore *sc = SPAPR_CPU_CORE(OBJECT(dev));
> > +        sPAPRCPUCoreClass *scc = SPAPR_CPU_CORE_GET_CLASS(OBJECT(cc));
> > +        const char *typename = object_class_get_name(scc->cpu_class);
> > +        size_t size = object_type_get_instance_size(typename);
> > +        int i;
> > +
> > +        for (i = 0; i < cc->nr_threads; i++) {
> > +            CPUState *cs = CPU(sc->threads + i * size);
> > +
> > +            pre_2_10_vmstate_register_dummy_icp(spapr, cs->cpu_index);
> > +        }
> > +    }
> > +
> >      assert(core_slot);
> >      core_slot->cpu = NULL;
> >      object_unparent(OBJECT(dev));
> > @@ -2912,6 +2978,21 @@ static void spapr_core_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >          }
> >      }
> >      core_slot->cpu = OBJECT(dev);
> > +
> > +    if (spapr->pre_2_10_ignore_icp) {
> > +        sPAPRCPUCoreClass *scc = SPAPR_CPU_CORE_GET_CLASS(OBJECT(cc));
> > +        const char *typename = object_class_get_name(scc->cpu_class);
> > +        size_t size = object_type_get_instance_size(typename);
> > +        int i;
> > +
> > +        for (i = 0; i < cc->nr_threads; i++) {
> > +            sPAPRCPUCore *sc = SPAPR_CPU_CORE(dev);
> > +            void *obj = sc->threads + i * size;
> > +
> > +            cs = CPU(obj);
> > +            pre_2_10_vmstate_unregister_dummy_icp(spapr, cs->cpu_index);

...... here.

The opposite happens in spapr_core_unplug().

> > +        }
> > +    }
> >  }
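
The lifecycle described in this thread (every slot starts with a dummy entry, plugging a core removes the dummies for its threads, unplugging puts them back) can be modelled with a plain flag array. This is a sketch only: MiniMachine and the mini_* helpers are hypothetical, nothing is actually registered with vmstate, and the assumption that a core's threads occupy contiguous indices is made for brevity, whereas the patch reads cs->cpu_index for each thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model: dummy[i] is true while slot i holds a dummy entry. */
typedef struct MiniMachine {
    bool *dummy;
    int nr_servers;
} MiniMachine;

/* Initialization: every slot starts out with a dummy entry. */
static void mini_machine_init(MiniMachine *m, int nr_servers)
{
    m->nr_servers = nr_servers;
    m->dummy = calloc(nr_servers, sizeof(bool));
    assert(m->dummy);
    for (int i = 0; i < nr_servers; i++) {
        m->dummy[i] = true;
    }
}

/* Plugging a core: its threads get real ICPs, so drop their dummies. */
static void mini_core_plug(MiniMachine *m, int first_index, int nr_threads)
{
    for (int i = 0; i < nr_threads; i++) {
        assert(m->dummy[first_index + i]);    /* same paranoia as g_assert(*flag) */
        m->dummy[first_index + i] = false;
    }
}

/* Unplugging a core: the real ICPs go away, so restore the dummies. */
static void mini_core_unplug(MiniMachine *m, int first_index, int nr_threads)
{
    for (int i = 0; i < nr_threads; i++) {
        assert(!m->dummy[first_index + i]);   /* same paranoia as g_assert(!*flag) */
        m->dummy[first_index + i] = true;
    }
}

/* Number of slots currently covered by a dummy entry. */
static int mini_count_dummies(const MiniMachine *m)
{
    int count = 0;

    for (int i = 0; i < m->nr_servers; i++) {
        count += m->dummy[i] ? 1 : 0;
    }
    return count;
}
```

At any point each slot is covered by exactly one of a dummy entry or a real ICP, so an incoming stream from an older QEMU always finds something to consume its "icp/server" sections.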
> >  
> >  static void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > @@ -3361,9 +3442,12 @@ static void spapr_machine_2_9_instance_options(MachineState *machine)
> >  
> >  static void spapr_machine_2_9_class_options(MachineClass *mc)
> >  {
> > +    sPAPRMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
> > +
> >      spapr_machine_2_10_class_options(mc);
> >      SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_2_9);
> >      mc->numa_auto_assign_ram = numa_legacy_auto_assign_ram;
> > +    smc->pre_2_10_has_unused_icps = true;
> >  }
> >  
> >  DEFINE_SPAPR_MACHINE(2_9, "2.9", false);
> > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > index f973b0284596..64382623199d 100644
> > --- a/include/hw/ppc/spapr.h
> > +++ b/include/hw/ppc/spapr.h
> > @@ -53,6 +53,7 @@ struct sPAPRMachineClass {
> >      bool dr_lmb_enabled;       /* enable dynamic-reconfig/hotplug of LMBs */
> >      bool use_ohci_by_default;  /* use USB-OHCI instead of XHCI */
> >      const char *tcg_default_cpu; /* which (TCG) CPU to simulate by default */
> > +    bool pre_2_10_has_unused_icps;
> >      void (*phb_placement)(sPAPRMachineState *spapr, uint32_t index,
> >                            uint64_t *buid, hwaddr *pio, 
> >                            hwaddr *mmio32, hwaddr *mmio64,
> > @@ -90,6 +91,7 @@ struct sPAPRMachineState {
> >      sPAPROptionVector *ov5_cas;     /* negotiated (via CAS) option vectors */
> >      bool cas_reboot;
> >      bool cas_legacy_guest_workaround;
> > +    bool *pre_2_10_ignore_icp;
> >  
> >      Notifier epow_notifier;
> >      QTAILQ_HEAD(, sPAPREventLogEntry) pending_events;
> >   
> 
