qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: spapr_events: Sure we may ignore migrate_add_blocker() failure?


From: David Gibson
Subject: Re: spapr_events: Sure we may ignore migrate_add_blocker() failure?
Date: Mon, 19 Jul 2021 17:20:58 +1000

On Mon, Jul 19, 2021 at 09:18:07AM +0200, Markus Armbruster wrote:
> David Gibson <david@gibson.dropbear.id.au> writes:
> 
> > On Thu, Jul 15, 2021 at 03:32:06PM +0200, Markus Armbruster wrote:
> >> Commit 2500fb423a "migration: Include migration support for machine
> >> check handling" adds this:
> >> 
> >>     ret = migrate_add_blocker(spapr->fwnmi_migration_blocker, &local_err);
> >>     if (ret == -EBUSY) {
> >>         /*
> >>          * We don't want to abort so we let the migration to continue.
> >>          * In a rare case, the machine check handler will run on the 
> >> target.
> >>          * Though this is not preferable, it is better than aborting
> >>          * the migration or killing the VM.
> >>          */
> >>         warn_report("Received a fwnmi while migration was in progress");
> >>     }
> >> 
> >> migrate_add_blocker() can fail in two ways:
> >> 
> >> 1. -EBUSY: migration is already in progress
> >> 
> >>    Ignoring this one is clearly intentional.  The comment explains why.
> >>    I'm taking it at face value (I'm a spapr ignoramus).
> >
> > Right.  The argument isn't really about papr particularly, except
> > insofar as understanding what fwnmi is.  fwnmi (FirmWare assisted NMI)
> > is a reporting mechanism for certain low-level hardware failures
> > (think memory ECC or cpu level faults, IIRC).  If we migrate between
> > detecting and reporting the error, then the particulars we report will
> > be mostly meaningless since they relate to hardware we're no longer
> > running on.  Hence the migration blocker.
> >
> > However, migrating away from a (non-fatal) fwnmi error is a pretty
> > reasonable response, so we don't want to actually fail a migration if
> > its already in progress.
> >
> >>    Aside: I doubt
> >>    the warning is going to help users.
> >
> > You're probably right, but it's not very clear how to do better.  It
> > might possibly help someone in tech support explain why the reported
> > fwnmi doesn't seem to match the hardware the guest is (now) running
> > on.
> 
> Perhaps pointing to the actual problem could help: the FWNMI's
> information is mostly meaningless.

Sorry, I don't follow what you're suggesting.

> 
> >> 2. -EACCES: we're running with -only-migratable
> >> 
> >>    Why may we ignore -only-migratable here?
> >
> > Short answer: because I didn't think about that case.  Long answer:
> > I think we probably shoud ignore it anyway.  As above, receiving a
> > fwnmi doesn't really prevent migration, it just means that if you're
> > unlucky it can report stale information.  Since migrating away from a
> > possibly-dubious host would be a reasonable response to a non-fatal
> > fwnmi, I don't think we want to simply prohibit fwnmi entirely with
> > -only-migratable.
> 
> I think the comment text and placement could be improved to make clear
> ignoring this failure is intentional, too.  How do you like the
> following?

That's fair..

> 
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index a8f2cc6bdc..54d8e856d3 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -911,16 +911,14 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool 
> recovered)
>          }
>      }
>  
> +    /*
> +     * Try to block migration while FWNMI is being handled, so the
> +     * machine check handler runs where the information passed to it
> +     * actually makes sense.  This won't actually block migration,
> +     * only delay it slightly.  If the attempt fails, carry on.
> +     */
>      ret = migrate_add_blocker(spapr->fwnmi_migration_blocker, NULL);
>      if (ret == -EBUSY) {
> -        /*
> -         * We don't want to abort so we let the migration to continue.
> -         * In a rare case, the machine check handler will run on the target.
> -         * Though this is not preferable, it is better than aborting
> -         * the migration or killing the VM. It is okay to call
> -         * migrate_del_blocker on a blocker that was not added (which the
> -         * nmi-interlock handler would do when it's called after this).
> -         */
>          warn_report("Received a fwnmi while migration was in progress");
>      }

LGTM.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]