qemu-ppc
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-ppc] [RFC drcVI PATCH] spapr: reset DRCs on migration pre_load


From: David Gibson
Subject: Re: [Qemu-ppc] [RFC drcVI PATCH] spapr: reset DRCs on migration pre_load
Date: Mon, 10 Jul 2017 16:39:10 +1000
User-agent: Mutt/1.8.3 (2017-05-23)

On Fri, Jul 07, 2017 at 06:20:37PM -0300, Daniel Henrique Barboza wrote:
> "spapr: Remove 'awaiting_allocation' DRC flag" removed the flag that
> was originally was being used to prevent a race condition between
> hot unplug and hotplug. The DRC code base got simplified and more
> robust over time, eliminating the conditions that led to this race.
> Thus the awaiting_allocation existence wasn't justifiable anymore.
> 
> A side effect of the flag removal was seen when testing the Libvirt
> hotplug-migration-unplug scenario, where a device is hotplugged in both
> source and target using device_add prior to the migration, then the
> device is removed after migration in the target. Before that cleanup, the
> hot unplug at the target fails in both QEMU and guest kernel because
> the DRC state at the target is inconsistent. After removing that flag,
> the hot unplug works at QEMU but the guest kernel hungs on the middle
> of the unplug process.
> 
> It turns out that the awaiting_allocation logic was preventing the hot
> unplug from happening at the target because the DRC state, at this specific
> hot unplug scenario, was matching the race condition the flag was
> originally designed to avoid. Removing the flag allowed the device
> to be removed from QEMU, leading to this new behavior.
> 
> The root cause of those problems is, in fact, the inconsistent state of the
> target DRCs after migration is completed. Doing device_add in the
> INMIGRATE status leaves the DRC in a state that isn't recognized as a
> valid hotplugged device in the guest OS.
> 
> This patch fixes the problem by using the recently modified 'drc_reset'
> function, that now forces the DRC to a known state by checking its device
> status, to reset all DRCs in the pre_load hook of the migration. Resetting
> the DRCs in pre_load allows the DRCs to be in a predictable state when
> we load the migration at the target, allowing for hot unplugs to work
> as expected.
> 
> Signed-off-by: Daniel Henrique Barboza <address@hidden>

Ok, so the fact this works is pretty promising.  However, I'm still
trying to fully understand what's going on here.  I have a suspicion
that this is only necessary because something isn't quite right with
the reset / inmigrate sequencing in the generic code, which we should
fix instead of hacking around.

IIUC, in the problem case, on the source the hotplug has fully
completed, so the DRC will be in CONFIGURED state.  Since the device
is CONFIGURED and attached, no DRC info is sent in the migration
stream.  On the destination what seems to be happening is:

1. qemu is started with "-incoming defer", and cpu *not* present

    DRC is uninitialized

2. qemu_system_reset() is called in vl.c

    DRC is in UNALLOCATED / detached state

3. libvirt device_adds the cpu

    DRC is in UNALLOCATED / attached state

4. libvirt initiates incoming migration

    DRC remains in UNALLOCATED / attached state

5. Guest resumes on the destination

    DRC still in UNALLOCATED / attached state

Which mismatches what we had on the source so => bug.

BUT, AFAIK the libvirt coldplug case below *is* working.  Which
tracing through the code I'd expect:

1. qemu is started with -S and cpu not present

   DRC is uninitialized

2. qemu_system_reset() is called in vl.c

   DRC is in UNALLOCATED / detached state

3. libvirt device_adds in prelaunch phase

   DRC is in UNALLOCATED / attached state

4. Guest is started

   DRC is in UNALLOCATED / attached state

Which is also incorrect: the device was present when the guest
started, so it should be in CONFIGURED state.  IIUC this case is
working, so I think it is must actually be in CONFIGURED state.

So, I'm trying to understand why we get different results in these two
cases.

[Aside:

If we do need to manually trigger a reset, code like the below is
basically what we need.  Just putting a reset in each DRC's pre_load
looks neater, but won't work - the pre_load won't be called unless the
object actually appears in the migration stream, and IIUC the case
which isn't working is exactly the case where the object is omitted
from the migration stream.]

> ---
>  hw/ppc/spapr.c             |  7 +++++++
>  hw/ppc/spapr_drc.c         | 17 +++++++++++++++++
>  include/hw/ppc/spapr_drc.h |  1 +
>  3 files changed, 25 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 089d41d..aea85b0 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1473,6 +1473,12 @@ static bool spapr_vga_init(PCIBus *pci_bus, Error 
> **errp)
>      }
>  }
>  
> +static int spapr_pre_load(void *opaque)
> +{
> +    spapr_reset_all_drcs();
> +    return 0;
> +}
> +
>  static int spapr_post_load(void *opaque, int version_id)
>  {
>      sPAPRMachineState *spapr = (sPAPRMachineState *)opaque;
> @@ -1598,6 +1604,7 @@ static const VMStateDescription vmstate_spapr = {
>      .name = "spapr",
>      .version_id = 3,
>      .minimum_version_id = 1,
> +    .pre_load = spapr_pre_load,
>      .post_load = spapr_post_load,
>      .fields = (VMStateField[]) {
>          /* used to be @next_irq */
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 63637d8..74f3957 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -449,6 +449,23 @@ static void drc_reset(void *opaque)
>      drc->ccs_depth = -1;
>  }
>  
> +void spapr_reset_all_drcs(void)
> +{
> +    Object *drc_container, *obj;
> +    ObjectProperty *prop;
> +    ObjectPropertyIterator iter;
> +
> +    drc_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
> +    object_property_iter_init(&iter, drc_container);
> +    while ((prop = object_property_iter_next(&iter))) {
> +        if (!strstart(prop->type, "link<", NULL)) {
> +            continue;
> +        }
> +        obj = object_property_get_link(drc_container, prop->name, NULL);
> +        drc_reset(SPAPR_DR_CONNECTOR(obj));
> +    }
> +}
> +
>  static bool spapr_drc_needed(void *opaque)
>  {
>      sPAPRDRConnector *drc = (sPAPRDRConnector *)opaque;
> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> index 4c54864..c7553e6 100644
> --- a/include/hw/ppc/spapr_drc.h
> +++ b/include/hw/ppc/spapr_drc.h
> @@ -250,4 +250,5 @@ static inline bool 
> spapr_drc_unplug_requested(sPAPRDRConnector *drc)
>      return drc->unplug_requested;
>  }
>  
> +void spapr_reset_all_drcs(void);
>  #endif /* HW_SPAPR_DRC_H */

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]