From: Eric Blake
Subject: [PULL 18/24] migration/savevm: don't worry if bitmap migration postcopy failed
Date: Mon, 27 Jul 2020 15:55:37 -0500

From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
First, if only bitmaps postcopy is enabled (and not RAM postcopy),
postcopy_pause_incoming() crashes on the assertion
assert(mis->to_src_file).

Second, bitmaps postcopy is not prepared to be recovered anyway.
The original idea was that if bitmaps postcopy fails, we just lose
some bitmaps, which is not critical. So on failure we only need to
remove the unfinished bitmaps, and the guest should continue
executing on the destination.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200727194236.19551-18-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
---
migration/savevm.c | 37 ++++++++++++++++++++++++++++++++-----
1 file changed, 32 insertions(+), 5 deletions(-)
diff --git a/migration/savevm.c b/migration/savevm.c
index 45c9dd9d8a6d..a843d202b5b4 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1813,6 +1813,9 @@ static void *postcopy_ram_listen_thread(void *opaque)
MigrationIncomingState *mis = migration_incoming_get_current();
QEMUFile *f = mis->from_src_file;
int load_res;
+ MigrationState *migr = migrate_get_current();
+
+ object_ref(OBJECT(migr));
migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
MIGRATION_STATUS_POSTCOPY_ACTIVE);
@@ -1839,11 +1842,24 @@ static void *postcopy_ram_listen_thread(void *opaque)
trace_postcopy_ram_listen_thread_exit();
if (load_res < 0) {
- error_report("%s: loadvm failed: %d", __func__, load_res);
qemu_file_set_error(f, load_res);
- migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
- MIGRATION_STATUS_FAILED);
- } else {
+ dirty_bitmap_mig_cancel_incoming();
+ if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
+ !migrate_postcopy_ram() && migrate_dirty_bitmaps())
+ {
+ error_report("%s: loadvm failed during postcopy: %d. All states "
+ "are migrated except dirty bitmaps. Some dirty "
+ "bitmaps may be lost, and present migrated dirty "
+ "bitmaps are correctly migrated and valid.",
+ __func__, load_res);
+ load_res = 0; /* prevent further exit() */
+ } else {
+ error_report("%s: loadvm failed: %d", __func__, load_res);
+ migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+ MIGRATION_STATUS_FAILED);
+ }
+ }
+ if (load_res >= 0) {
/*
* This looks good, but it's possible that the device loading in the
* main thread hasn't finished yet, and so we might not be in 'RUN'
@@ -1879,6 +1895,8 @@ static void *postcopy_ram_listen_thread(void *opaque)
mis->have_listen_thread = false;
postcopy_state_set(POSTCOPY_INCOMING_END);
+ object_unref(OBJECT(migr));
+
return NULL;
}
@@ -2437,6 +2455,8 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
{
trace_postcopy_pause_incoming();
+ assert(migrate_postcopy_ram());
+
/* Clear the triggered bit to allow one recovery */
mis->postcopy_recover_triggered = false;
@@ -2521,15 +2541,22 @@ out:
if (ret < 0) {
qemu_file_set_error(f, ret);
+ /* Cancel bitmaps incoming regardless of recovery */
+ dirty_bitmap_mig_cancel_incoming();
+
/*
* If we are during an active postcopy, then we pause instead
* of bail out to at least keep the VM's dirty data. Note
* that POSTCOPY_INCOMING_LISTENING stage is still not enough,
* during which we're still receiving device states and we
* still haven't yet started the VM on destination.
+ *
+ * Only RAM postcopy supports recovery. Still, if RAM postcopy is
+ * enabled, canceled bitmaps postcopy will not affect RAM postcopy
+ * recovering.
*/
if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
- postcopy_pause_incoming(mis)) {
+ migrate_postcopy_ram() && postcopy_pause_incoming(mis)) {
/* Reset f to point to the newly created channel */
f = mis->from_src_file;
goto retry;
--
2.27.0
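
For readers skimming the hunks above, the failure handling can be condensed
into two small predicates. This is only a sketch of the decision logic, not
code from QEMU: the enum, both function names, and the boolean parameters are
hypothetical stand-ins for postcopy_state_get() == POSTCOPY_INCOMING_RUNNING,
migrate_postcopy_ram() and migrate_dirty_bitmaps().

```c
#include <stdbool.h>

/* Hypothetical names for this sketch; none of these are QEMU identifiers. */
enum listen_outcome {
    LISTEN_OK,            /* load succeeded */
    LISTEN_BITMAPS_LOST,  /* load failed, only bitmaps were in flight */
    LISTEN_FAILED         /* load failed, migration is marked as failed */
};

/*
 * Models the branch added to postcopy_ram_listen_thread(): a load failure
 * is tolerated only when postcopy is already running, RAM postcopy is off
 * and dirty-bitmap migration is on. In that case the patch resets load_res
 * to 0 so the guest keeps running on the destination.
 */
static enum listen_outcome
classify_listen_result(int load_res, bool postcopy_running,
                       bool ram_postcopy, bool dirty_bitmaps)
{
    if (load_res >= 0) {
        return LISTEN_OK;
    }
    if (postcopy_running && !ram_postcopy && dirty_bitmaps) {
        return LISTEN_BITMAPS_LOST;
    }
    return LISTEN_FAILED;
}

/*
 * Models the condition in the last hunk: pausing for recovery is only
 * attempted when RAM postcopy is enabled. The real code additionally
 * requires postcopy_pause_incoming() itself to return true.
 */
static bool
may_pause_for_recovery(int ret, bool postcopy_running, bool ram_postcopy)
{
    return ret < 0 && postcopy_running && ram_postcopy;
}
```

With bitmaps-only postcopy, classify_listen_result(-5, true, false, true)
yields LISTEN_BITMAPS_LOST, while enabling RAM postcopy in the same call
yields LISTEN_FAILED, which matches the else branch that sets
MIGRATION_STATUS_FAILED.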