[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: spapr_events: Sure we may ignore migrate_add_blocker() failure?
From: |
Markus Armbruster |
Subject: |
Re: spapr_events: Sure we may ignore migrate_add_blocker() failure? |
Date: |
Mon, 19 Jul 2021 09:18:07 +0200 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) |
David Gibson <david@gibson.dropbear.id.au> writes:
> On Thu, Jul 15, 2021 at 03:32:06PM +0200, Markus Armbruster wrote:
>> Commit 2500fb423a "migration: Include migration support for machine
>> check handling" adds this:
>>
>> ret = migrate_add_blocker(spapr->fwnmi_migration_blocker, &local_err);
>> if (ret == -EBUSY) {
>> /*
>> * We don't want to abort so we let the migration to continue.
>> * In a rare case, the machine check handler will run on the target.
>> * Though this is not preferable, it is better than aborting
>> * the migration or killing the VM.
>> */
>> warn_report("Received a fwnmi while migration was in progress");
>> }
>>
>> migrate_add_blocker() can fail in two ways:
>>
>> 1. -EBUSY: migration is already in progress
>>
>> Ignoring this one is clearly intentional. The comment explains why.
>> I'm taking it at face value (I'm a spapr ignoramus).
>
> Right. The argument isn't really about papr particularly, except
> insofar as understanding what fwnmi is. fwnmi (FirmWare assisted NMI)
> is a reporting mechanism for certain low-level hardware failures
> (think memory ECC or cpu level faults, IIRC). If we migrate between
> detecting and reporting the error, then the particulars we report will
> be mostly meaningless since they relate to hardware we're no longer
> running on. Hence the migration blocker.
>
> However, migrating away from a (non-fatal) fwnmi error is a pretty
> reasonable response, so we don't want to actually fail a migration if
> its already in progress.
>
>> Aside: I doubt
>> the warning is going to help users.
>
> You're probably right, but it's not very clear how to do better. It
> might possibly help someone in tech support explain why the reported
> fwnmi doesn't seem to match the hardware the guest is (now) running
> on.
Perhaps pointing to the actual problem could help: the FWNMI's
information is mostly meaningless.
>> 2. -EACCES: we're running with -only-migratable
>>
>> Why may we ignore -only-migratable here?
>
> Short answer: because I didn't think about that case. Long answer:
> I think we probably shoud ignore it anyway. As above, receiving a
> fwnmi doesn't really prevent migration, it just means that if you're
> unlucky it can report stale information. Since migrating away from a
> possibly-dubious host would be a reasonable response to a non-fatal
> fwnmi, I don't think we want to simply prohibit fwnmi entirely with
> -only-migratable.
I think the comment text and placement could be improved to make clear
ignoring this failure is intentional, too. How do you like the
following?
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index a8f2cc6bdc..54d8e856d3 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -911,16 +911,14 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
}
}
+ /*
+ * Try to block migration while FWNMI is being handled, so the
+ * machine check handler runs where the information passed to it
+ * actually makes sense. This won't actually block migration,
+ * only delay it slightly. If the attempt fails, carry on.
+ */
ret = migrate_add_blocker(spapr->fwnmi_migration_blocker, NULL);
if (ret == -EBUSY) {
- /*
- * We don't want to abort so we let the migration to continue.
- * In a rare case, the machine check handler will run on the target.
- * Though this is not preferable, it is better than aborting
- * the migration or killing the VM. It is okay to call
- * migrate_del_blocker on a blocker that was not added (which the
- * nmi-interlock handler would do when it's called after this).
- */
warn_report("Received a fwnmi while migration was in progress");
}
>> By the way, we leak @local_err on failure. I'll post a patch, but I'd
>> like my question answered first.
- spapr_events: Sure we may ignore migrate_add_blocker() failure?, Markus Armbruster, 2021/07/15
- Re: spapr_events: Sure we may ignore migrate_add_blocker() failure?, David Gibson, 2021/07/18
- Re: spapr_events: Sure we may ignore migrate_add_blocker() failure?,
Markus Armbruster <=
- Re: spapr_events: Sure we may ignore migrate_add_blocker() failure?, David Gibson, 2021/07/19
- Re: spapr_events: Sure we may ignore migrate_add_blocker() failure?, Markus Armbruster, 2021/07/19
- -only-migrate and the two different uses of migration blockers (was: spapr_events: Sure we may ignore migrate_add_blocker() failure?), Markus Armbruster, 2021/07/19
- Re: -only-migrate and the two different uses of migration blockers (was: spapr_events: Sure we may ignore migrate_add_blocker() failure?), Dr. David Alan Gilbert, 2021/07/19
- Re: -only-migrate and the two different uses of migration blockers, Markus Armbruster, 2021/07/20
- Re: -only-migrate and the two different uses of migration blockers, David Gibson, 2021/07/21
- Re: -only-migrate and the two different uses of migration blockers, Dr. David Alan Gilbert, 2021/07/22
- Re: -only-migrate and the two different uses of migration blockers, David Gibson, 2021/07/25
- Re: spapr_events: Sure we may ignore migrate_add_blocker() failure?, David Gibson, 2021/07/21