qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v7 6/6] migration: Block migration while handlin


From: Aravinda Prasad
Subject: Re: [Qemu-devel] [PATCH v7 6/6] migration: Block migration while handling machine check
Date: Tue, 26 Mar 2019 12:22:50 +0530
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0


On Tuesday 26 March 2019 09:49 AM, David Gibson wrote:
> On Mon, Mar 25, 2019 at 02:31:10PM +0530, Aravinda Prasad wrote:
>>
>>
>> On Monday 25 March 2019 12:03 PM, David Gibson wrote:
>>> On Fri, Mar 22, 2019 at 12:04:25PM +0530, Aravinda Prasad wrote:
>>>> Block VM migration requests until the machine check
>>>> error handling is complete as (i) these errors are
>>>> specific to the source hardware and is irrelevant on
>>>> the target hardware, (ii) these errors cause data
>>>> corruption and should be handled before migration.
>>>>
>>>> Signed-off-by: Aravinda Prasad <address@hidden>
>>>> ---
>>>>  hw/ppc/spapr_events.c  |   17 +++++++++++++++++
>>>>  hw/ppc/spapr_rtas.c    |    4 ++++
>>>>  include/hw/ppc/spapr.h |    3 +++
>>>>  3 files changed, 24 insertions(+)
>>>>
>>>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>>>> index d7cc0a4..6356a38 100644
>>>> --- a/hw/ppc/spapr_events.c
>>>> +++ b/hw/ppc/spapr_events.c
>>>> @@ -41,6 +41,7 @@
>>>>  #include "qemu/bcd.h"
>>>>  #include "hw/ppc/spapr_ovec.h"
>>>>  #include <libfdt.h>
>>>> +#include "migration/blocker.h"
>>>>  
>>>>  #define RTAS_LOG_VERSION_MASK                   0xff000000
>>>>  #define   RTAS_LOG_VERSION_6                    0x06000000
>>>> @@ -866,6 +867,22 @@ static void spapr_mce_build_elog(PowerPCCPU *cpu, 
>>>> bool recovered)
>>>>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>>>>  {
>>>>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>>>> +    int ret;
>>>> +    Error *local_err = NULL;
>>>> +
>>>> +    error_setg(&spapr->migration_blocker,
>>>> +            "Live migration not supported during machine check handling");
>>>> +    ret = migrate_add_blocker(spapr->migration_blocker, &local_err);
>>>> +    if (ret < 0) {
>>>> +        /*
>>>> +         * We don't want to abort and let the migration to continue. In a
>>>> +         * rare case, the machine check handler will run on the target
>>>> +         * hardware. Though this is not preferable, it is better than 
>>>> aborting
>>>> +         * the migration or killing the VM.
>>>
>>> Can't you just discard the error in that case?
>>
>> I am actually discarding the error. Do you mean I don't have to print
>> the below warning or am I missing something?
> 
> Um.. I guess we should keep the warning.  The tricky thing here is
> that the information on the machine check is no longer relevant after
> the migration, however presumably the event which triggered it might
> have already corrupted guest memory, which *would* be relevant to the
> guest after migration.  Not sure what to do about that.

I think the information is still relevant after the migration, because
it includes the effective address in error and the context. If the
corruption is in the guest kernel context then the machine check handler
inside the guest kernel will call panic(). If the corruption is in user
space, then the application is sent a SIGBUS.

But the problem is when the machine check handler hardware poisons the
page frame backing the corrupt memory to avoid reuse of that page frame.
After migration, the page frame backing the corrupt memory is clean, but
we poison the clean page frame.

> 

-- 
Regards,
Aravinda




reply via email to

[Prev in Thread] Current Thread [Next in Thread]