From: Eric Blake
Subject: Re: [RFC PATCH] migration: Handle block device inactivation failures better
Date: Fri, 14 Apr 2023 08:26:51 -0500
User-agent: NeoMutt/20230407

On Fri, Apr 14, 2023 at 02:15:45PM +0200, Juan Quintela wrote:
> Eric Blake <eblake@redhat.com> wrote:
> > Consider what happens when performing a migration between two host
> > machines connected to an NFS server serving multiple block devices to
> > the guest, when the NFS server becomes unavailable.  The migration
> > attempts to inactivate all block devices on the source (a necessary
> > step before the destination can take over); but if the NFS server is
> > non-responsive, the attempt to inactivate can itself fail.  When that
> > happens, the destination fails to get the migrated guest (good,
> > because the source wasn't able to flush everything properly):
> >
> >   (qemu) qemu-kvm: load of migration failed: Input/output error
> >
> > at which point, our only hope for the guest is for the source to take
> > back control.  With the current code base, the host outputs a message, but 
> > then appears to resume:
> >
> >   (qemu) qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable: 
> > bdrv_inactivate_all() failed (-1)
> >
> >   (src qemu) info status
> >    VM status: running
> >
> > but a second migration attempt now asserts:
> >
> >   (src qemu) qemu-kvm: ../block.c:6738: int 
> > bdrv_inactivate_recurse(BlockDriverState *): Assertion `!(bs->open_flags & 
> > BDRV_O_INACTIVE)' failed.
> >
> > Whether the guest is recoverable on the source after the first failure
> > is debatable, but what we do not want is to have qemu itself fail due
> > to an assertion.  It looks like the problem is as follows:
> >
> > In migration.c:migration_completion(), the source sets 'inactivate' to
> > true (since COLO is not enabled), then tries
> > savevm.c:qemu_savevm_state_complete_precopy() with a request to
> > inactivate block devices.  In turn, this calls
> > block.c:bdrv_inactivate_all(), which fails when flushing runs up
> > against the non-responsive NFS server.  With savevm failing, we are
> > now left in a state where some, but not all, of the block devices have
> > been inactivated; the 'fail_invalidate' label of
> > migration_completion() then wants to reclaim those disks by calling
> > bdrv_activate_all() - but this too can fail, yet nothing takes note of
> > that failure.
> >
> > Thus, we have reached a state where the migration engine has forgotten
> > all state about whether a block device is inactive, because we did not
> > set s->block_inactive; so migration allows the source to reach
> > vm_start() and resume execution, violating the block layer invariant
> > that the guest CPUs should not be restarted while a device is
> > inactive.  Note that the code in migration.c:migrate_fd_cancel() will
> > also try to reactivate all block devices if s->block_inactive was set,
> > but because we failed to set that flag after the first failure, the
> > source assumes it has reclaimed all devices, even though it still has
> > remaining inactivated devices and does not try again.  Normally,
> > qmp_cont() will also try to reactivate all disks (or correctly fail if
> > the disks are not reclaimable because NFS is not yet back up), but the
> > auto-resumption of the source after a migration failure does not go
> > through qmp_cont().  And because we have left the block layer in an
> > inconsistent state with devices still inactivated, the later migration
> > attempt is hitting the assertion failure.
> >
> > Since it is important to not resume the source with inactive disks,
> > this patch tries harder at tracking whether migration attempted to
> > inactivate any devices, in order to prevent any vm_start() until it
> > has successfully reactivated all devices.
> >
> > See also https://bugzilla.redhat.com/show_bug.cgi?id=2058982
> >
> > Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> Wow
> 
> Such a big comment for such a small patch.
> And then people think that there is not "nuance" anymore :-)

It took me a while to figure out the control paths - it is not obvious
that not all migration paths attempt to deactivate images, and
therefore not all paths need to try to reactivate them.  Yes, this
commit message is proof that our code base is complex!
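
Purely as an illustration of that hole, here is a tiny self-contained
model of the flag-tracking problem (the toy_* names and flags are
invented for this sketch; this is not the real migration.c code, just
the shape of the bug):

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static bool disk_inactive;        /* stands in for BDRV_O_INACTIVE */
static bool block_inactive_flag;  /* stands in for s->block_inactive */
static bool nfs_responsive;       /* simulates the stuck NFS server */

static int toy_inactivate_all(void)
{
    /* mirrors the bdrv_inactivate_recurse() assertion */
    assert(!disk_inactive);
    disk_inactive = true;
    return nfs_responsive ? 0 : -1;   /* disks left inactive on failure */
}

static int toy_activate_all(void)
{
    if (!nfs_responsive) {
        return -1;                    /* reactivation fails too */
    }
    disk_inactive = false;
    return 0;
}

static void toy_migration_completion(void)
{
    if (toy_inactivate_all() < 0) {
        /* the fail_invalidate path: try to reclaim the disks */
        if (toy_activate_all() < 0) {
            /* the bug: nothing records the failure; the equivalent of
             * block_inactive_flag = true; is missing here */
            fprintf(stderr, "reactivation failed, flag never set\n");
        }
    }
}

int main(void)
{
    nfs_responsive = false;
    toy_migration_completion();       /* first migration attempt fails */
    if (!block_inactive_flag) {
        printf("source resumes the guest with inactive disks\n");
    }
    toy_migration_completion();       /* second attempt trips the assertion */
    return 0;
}

Run it and the second "migration" aborts on the same shape of
assertion as the real one, because the first failure was never
recorded anywhere the auto-resume path could see.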

> > @@ -3518,6 +3519,7 @@ fail_invalidate:
> >          bdrv_activate_all(&local_err);
> >          if (local_err) {
> >              error_report_err(local_err);
> > +            s->block_inactive = inactivate;
> 
> Why not just put here:
> 
> s->block_inactive = true;
> 
> And call it a day?

Interesting idea.  Note that it is possible to reach the
fail_invalidate label WITHOUT having first tried to invalidate disk
images (for example, if s->state != MIGRATION_STATUS_ACTIVE on entry,
but qemu_file_get_error() fails) - but then again, in the
fail_invalidate label, we only attempt to reactivate if s->state is
MIGRATION_STATUS_ACTIVE or MIGRATION_STATUS_DEVICE (the latter
possible right before we attempt to inactivate devices above).
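
To make that concrete, the tail of migration_completion() looks
roughly like this (paraphrased and trimmed for the email, not the
literal code; the s->block_inactive assignment is the fix under
discussion, in Juan's '= true' form):

fail_invalidate:
    /* Only these two states can have gotten far enough to inactivate
     * disks, so only they try to reclaim them before any vm_start(). */
    if (s->state == MIGRATION_STATUS_ACTIVE ||
        s->state == MIGRATION_STATUS_DEVICE) {
        Error *local_err = NULL;

        bdrv_activate_all(&local_err);
        if (local_err) {
            error_report_err(local_err);
            /* remember that disks are still inactive */
            s->block_inactive = true;
        }
    }

fail:
    migrate_set_state(&s->state, current_active_state,
                      MIGRATION_STATUS_FAILED);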

> 
> if bdrv_activate_all() fails, we can't continue anyways.

Good point.  And I think it points out yet another problem: if
qemu_savevm_state_complete_precopy() fails, we 'goto fail' instead of
'goto fail_invalidate', and so never get a chance to mark
s->block_inactive.
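
That is, roughly (again paraphrased, not the literal code):

        ret = qemu_savevm_state_complete_precopy(s->to_dst_file,
                                                 false, inactivate);
        ...
        if (ret < 0) {
            /* jumps past fail_invalidate: no reactivation attempt and
             * no s->block_inactive update, even though some disks may
             * already have been inactivated */
            goto fail;
        }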

> 
> Or I am missing something?

I'll post a v2 along those lines.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



