Re: [PATCH 1/2] migration: Unify reset of last_rb on destination node wh
From: Dr. David Alan Gilbert
Subject: Re: [PATCH 1/2] migration: Unify reset of last_rb on destination node when recover
Date: Mon, 2 Nov 2020 18:23:39 +0000
User-agent: Mutt/1.14.6 (2020-07-11)
* Peter Xu (peterx@redhat.com) wrote:
> When postcopy recovery happens, we need to reset last_rb after each return of
> postcopy_pause_fault_thread(), because that return means the postcopy
> migration has just been resumed.
>
> Unify this reset to the place right before we want to kick the fault thread
> again, when we get the command MIG_CMD_POSTCOPY_RESUME from source.
>
> This is actually more than a cleanup: the main thread on the destination is
> now able to call migrate_send_rp_req_pages_pending() too, so the fault
> thread is no longer the only user of last_rb. Moving the reset earlier allows
> the first call to migrate_send_rp_req_pages_pending() to use the reset value
> even if it is made from the main thread.
>
> (NOTE: this is not a real fix to 0c26781c09 mentioned below, however it is just
> a mark that when picking up 0c26781c09 we'd better have this one too; the real
> fix will come later)
>
> Fixes: 0c26781c09 ("migration: Sync requested pages after postcopy recovery")
> Tested-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
> migration/postcopy-ram.c | 2 --
> migration/savevm.c | 6 ++++++
> 2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index d3bb3a744b..d99842eb1b 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -903,7 +903,6 @@ static void *postcopy_ram_fault_thread(void *opaque)
> * the channel is rebuilt.
> */
> if (postcopy_pause_fault_thread(mis)) {
> - mis->last_rb = NULL;
> /* Continue to read the userfaultfd */
> } else {
> error_report("%s: paused but don't allow to continue",
> @@ -985,7 +984,6 @@ retry:
> /* May be network failure, try to wait for recovery */
> if (ret == -EIO && postcopy_pause_fault_thread(mis)) {
> /* We got reconnected somehow, try to continue */
> - mis->last_rb = NULL;
> goto retry;
> } else {
> /* This is a unavoidable fault */
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 21ccba9fb3..e8834991ec 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2061,6 +2061,12 @@ static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
> return 0;
> }
>
> + /*
> + * Reset the last_rb before we resend any page req to source again, since
> + * the source should have it reset already.
> + */
> + mis->last_rb = NULL;
> +
> /*
> * This means source VM is ready to resume the postcopy migration.
> * It's time to switch state and release the fault thread to
> --
> 2.26.2
>
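
For readers following the ordering argument in the commit message, here is a
minimal sketch of why the reset has to happen before any page request is
resent. It is not QEMU code. It assumes that last_rb caches the RAM block
named in the most recent page request, so that a request for the same block
can omit the name. The types IncomingModel and RAMBlockModel and the
functions send_page_request(), handle_resume() and run_resume_demo() are
hypothetical names made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration, not the real QEMU types. */
typedef struct RAMBlockModel {
    const char *name;
} RAMBlockModel;

typedef struct IncomingModel {
    RAMBlockModel *last_rb;  /* assumed: block named in the last request */
    int full_names_sent;     /* requests that carried the block name */
    int short_reqs_sent;     /* requests that relied on the cached block */
} IncomingModel;

/* Model of one page request: name the block only when it changes. */
static void send_page_request(IncomingModel *mis, RAMBlockModel *rb)
{
    if (rb != mis->last_rb) {
        mis->last_rb = rb;
        mis->full_names_sent++;
    } else {
        mis->short_reqs_sent++;
    }
}

/*
 * Model of handling the resume command: drop the cache first, so that
 * whichever thread sends the next request starts from a clean state.
 */
static void handle_resume(IncomingModel *mis)
{
    mis->last_rb = NULL;
}

/* Returns how many requests carried a block name across one resume. */
int run_resume_demo(void)
{
    RAMBlockModel rb = { "pc.ram" };
    IncomingModel mis = { NULL, 0, 0 };

    send_page_request(&mis, &rb);  /* first request names the block */
    send_page_request(&mis, &rb);  /* same block, short form */
    assert(mis.full_names_sent == 1 && mis.short_reqs_sent == 1);

    handle_resume(&mis);           /* channel rebuilt, cache dropped */
    send_page_request(&mis, &rb);  /* has to name the block again */
    return mis.full_names_sent;
}
```

In this model the reset lives in a single place that runs before any sender
is released, so it does not matter whether the first request after recovery
comes from the fault thread or from the main thread.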
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK