qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v7 6/6] migration: Block migration while handlin


From: David Gibson
Subject: Re: [Qemu-devel] [PATCH v7 6/6] migration: Block migration while handling machine check
Date: Thu, 28 Mar 2019 11:48:28 +1100
User-agent: Mutt/1.11.3 (2019-02-01)

On Tue, Mar 26, 2019 at 12:22:50PM +0530, Aravinda Prasad wrote:
> 
> 
> On Tuesday 26 March 2019 09:49 AM, David Gibson wrote:
> > On Mon, Mar 25, 2019 at 02:31:10PM +0530, Aravinda Prasad wrote:
> >>
> >>
> >> On Monday 25 March 2019 12:03 PM, David Gibson wrote:
> >>> On Fri, Mar 22, 2019 at 12:04:25PM +0530, Aravinda Prasad wrote:
> >>>> Block VM migration requests until the machine check
> >>>> error handling is complete as (i) these errors are
> >>>> specific to the source hardware and is irrelevant on
> >>>> the target hardware, (ii) these errors cause data
> >>>> corruption and should be handled before migration.
> >>>>
> >>>> Signed-off-by: Aravinda Prasad <address@hidden>
> >>>> ---
> >>>>  hw/ppc/spapr_events.c  |   17 +++++++++++++++++
> >>>>  hw/ppc/spapr_rtas.c    |    4 ++++
> >>>>  include/hw/ppc/spapr.h |    3 +++
> >>>>  3 files changed, 24 insertions(+)
> >>>>
> >>>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> >>>> index d7cc0a4..6356a38 100644
> >>>> --- a/hw/ppc/spapr_events.c
> >>>> +++ b/hw/ppc/spapr_events.c
> >>>> @@ -41,6 +41,7 @@
> >>>>  #include "qemu/bcd.h"
> >>>>  #include "hw/ppc/spapr_ovec.h"
> >>>>  #include <libfdt.h>
> >>>> +#include "migration/blocker.h"
> >>>>  
> >>>>  #define RTAS_LOG_VERSION_MASK                   0xff000000
> >>>>  #define   RTAS_LOG_VERSION_6                    0x06000000
> >>>> @@ -866,6 +867,22 @@ static void spapr_mce_build_elog(PowerPCCPU *cpu, 
> >>>> bool recovered)
> >>>>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> >>>>  {
> >>>>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> >>>> +    int ret;
> >>>> +    Error *local_err = NULL;
> >>>> +
> >>>> +    error_setg(&spapr->migration_blocker,
> >>>> +            "Live migration not supported during machine check 
> >>>> handling");
> >>>> +    ret = migrate_add_blocker(spapr->migration_blocker, &local_err);
> >>>> +    if (ret < 0) {
> >>>> +        /*
> >>>> +         * We don't want to abort and let the migration to continue. In 
> >>>> a
> >>>> +         * rare case, the machine check handler will run on the target
> >>>> +         * hardware. Though this is not preferable, it is better than 
> >>>> aborting
> >>>> +         * the migration or killing the VM.
> >>>
> >>> Can't you just discard the error in that case?
> >>
> >> I am actually discarding the error. Do you mean I don't have to print
> >> the below warning or am I missing something?
> > 
> > Um.. I guess we should keep the warning.  The tricky thing here is
> > that the information on the machine check is no longer relevant after
> > the migration, however presumably the event which triggered it might
> > have already corrupted guest memory, which *would* be relevant to the
> > guest after migration.  Not sure what to do about that.
> 
> I think the information is still relevant after the migration, because
> it includes the effective address in error and the context. If the
> corruption is in the guest kernel context then the machine check handler
> inside the guest kernel will call panic(). If the corruption is in user
> space, then the application is sent a SIGBUS.

Good point.

> But the problem is when the machine check handler hardware poisons the
> page frame backing the corrupt memory to avoid reuse of that page frame.
> After migration, the page frame backing the corrupt memory is clean, but
> we poison the clean page frame.

This is the guest side handler?  It sounds like we we need two
variants of the notification that say whether or not the area is still
bad.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]