qemu-ppc
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-ppc] [PATCH v7 6/6] migration: Block migration while handling


From: David Gibson
Subject: Re: [Qemu-ppc] [PATCH v7 6/6] migration: Block migration while handling machine check
Date: Thu, 28 Mar 2019 20:56:26 +1100
User-agent: Mutt/1.11.3 (2019-02-01)

On Thu, Mar 28, 2019 at 12:09:05PM +0530, Aravinda Prasad wrote:
> 
> 
> On Thursday 28 March 2019 06:18 AM, David Gibson wrote:
> > On Tue, Mar 26, 2019 at 12:22:50PM +0530, Aravinda Prasad wrote:
> >>
> >>
> >> On Tuesday 26 March 2019 09:49 AM, David Gibson wrote:
> >>> On Mon, Mar 25, 2019 at 02:31:10PM +0530, Aravinda Prasad wrote:
> >>>>
> >>>>
> >>>> On Monday 25 March 2019 12:03 PM, David Gibson wrote:
> >>>>> On Fri, Mar 22, 2019 at 12:04:25PM +0530, Aravinda Prasad wrote:
> >>>>>> Block VM migration requests until the machine check
> >>>>>> error handling is complete as (i) these errors are
> >>>>>> specific to the source hardware and is irrelevant on
> >>>>>> the target hardware, (ii) these errors cause data
> >>>>>> corruption and should be handled before migration.
> >>>>>>
> >>>>>> Signed-off-by: Aravinda Prasad <address@hidden>
> >>>>>> ---
> >>>>>>  hw/ppc/spapr_events.c  |   17 +++++++++++++++++
> >>>>>>  hw/ppc/spapr_rtas.c    |    4 ++++
> >>>>>>  include/hw/ppc/spapr.h |    3 +++
> >>>>>>  3 files changed, 24 insertions(+)
> >>>>>>
> >>>>>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> >>>>>> index d7cc0a4..6356a38 100644
> >>>>>> --- a/hw/ppc/spapr_events.c
> >>>>>> +++ b/hw/ppc/spapr_events.c
> >>>>>> @@ -41,6 +41,7 @@
> >>>>>>  #include "qemu/bcd.h"
> >>>>>>  #include "hw/ppc/spapr_ovec.h"
> >>>>>>  #include <libfdt.h>
> >>>>>> +#include "migration/blocker.h"
> >>>>>>  
> >>>>>>  #define RTAS_LOG_VERSION_MASK                   0xff000000
> >>>>>>  #define   RTAS_LOG_VERSION_6                    0x06000000
> >>>>>> @@ -866,6 +867,22 @@ static void spapr_mce_build_elog(PowerPCCPU *cpu, 
> >>>>>> bool recovered)
> >>>>>>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> >>>>>>  {
> >>>>>>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> >>>>>> +    int ret;
> >>>>>> +    Error *local_err = NULL;
> >>>>>> +
> >>>>>> +    error_setg(&spapr->migration_blocker,
> >>>>>> +            "Live migration not supported during machine check 
> >>>>>> handling");
> >>>>>> +    ret = migrate_add_blocker(spapr->migration_blocker, &local_err);
> >>>>>> +    if (ret < 0) {
> >>>>>> +        /*
> >>>>>> +         * We don't want to abort and let the migration to continue. 
> >>>>>> In a
> >>>>>> +         * rare case, the machine check handler will run on the target
> >>>>>> +         * hardware. Though this is not preferable, it is better than 
> >>>>>> aborting
> >>>>>> +         * the migration or killing the VM.
> >>>>>
> >>>>> Can't you just discard the error in that case?
> >>>>
> >>>> I am actually discarding the error. Do you mean I don't have to print
> >>>> the below warning or am I missing something?
> >>>
> >>> Um.. I guess we should keep the warning.  The tricky thing here is
> >>> that the information on the machine check is no longer relevant after
> >>> the migration, however presumably the event which triggered it might
> >>> have already corrupted guest memory, which *would* be relevant to the
> >>> guest after migration.  Not sure what to do about that.
> >>
> >> I think the information is still relevant after the migration, because
> >> it includes the effective address in error and the context. If the
> >> corruption is in the guest kernel context then the machine check handler
> >> inside the guest kernel will call panic(). If the corruption is in user
> >> space, then the application is sent a SIGBUS.
> > 
> > Good point.
> > 
> >> But the problem is when the machine check handler hardware poisons the
> >> page frame backing the corrupt memory to avoid reuse of that page frame.
> >> After migration, the page frame backing the corrupt memory is clean, but
> >> we poison the clean page frame.
> > 
> > This is the guest side handler?  It sounds like we we need two
> > variants of the notification that say whether or not the area is still
> > bad.
> 
> Yes this is guest side handler.
> 
> Can we just let the guest machine check handler poison a clean page as a
> migration during a machine check is a rare occurrence. We may at most
> lose a clean page which is a trade-off for additional complexity in the
> machine check handler.

Yeah, that seems reasonable.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]