Re: [PATCH 1/3] migration: Release return path early for paused postcopy
From: Dr. David Alan Gilbert
Subject: Re: [PATCH 1/3] migration: Release return path early for paused postcopy
Date: Mon, 12 Jul 2021 18:44:42 +0100
User-agent: Mutt/2.0.7 (2021-05-04)
* Peter Xu (peterx@redhat.com) wrote:
> When a postcopy pause is triggered, we rely on the migration thread to clean
> up the to_dst_file handle, and on the return path thread to clean up the
> from_dst_file handle (which is stored in the local variable "rp").
>
> In that process, the from_dst_file cleanup (qemu_fclose) is postponed until
> the handle is set up again by a postcopy recovery.
>
> That used to work before yank was born; now that yank has been introduced, we
> rely on the IOC refcount to correctly unregister the yank function in
> channel_close(). Without an early, timely release of the from_dst_file
> handle, the yank function is left over during paused postcopy.
>
> Without this patch, the steps below (quoted from Xiaohui) can crash qemu on
> the src host:
>
> 1. Boot the VM on the src host.
> 2. Boot the VM on the dst host.
> 3. Enable postcopy on the src and dst hosts.
> 4. Load stressapptest in the VM and set the postcopy speed to 50M.
> 5. Start migration from the src to the dst host; switch to postcopy mode
>    while migration is active.
> 6. While postcopy is active, bring down the network card used for the
>    migration on the dst host.
> 7. Wait until postcopy is paused on the src and dst hosts.
> 8. Before bringing the network card back up, recover migration on the dst
>    host; this reports an error.
> 9. Ignore the error from step 8 and go on to recover migration on the src
>    host.
>
> After step 9, qemu on the src host dumps core after a few seconds:
> qemu-kvm: ../util/yank.c:107: yank_unregister_instance: Assertion
> `QLIST_EMPTY(&entry->yankfns)' failed.
> 1.sh: line 38: 44662 Aborted (core dumped)
>
> Reported-by: Li Xiaohui <xiaohuixiaohli@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(and I can clean up the email address problem)
> ---
> migration/migration.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 5ff7ba9d5c..8786104c9a 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2818,12 +2818,12 @@ out:
> * Maybe there is something we can do: it looks like a
> * network down issue, and we pause for a recovery.
> */
> + qemu_fclose(rp);
> + ms->rp_state.from_dst_file = NULL;
> + rp = NULL;
> if (postcopy_pause_return_path_thread(ms)) {
> /* Reload rp, reset the rest */
> - if (rp != ms->rp_state.from_dst_file) {
> - qemu_fclose(rp);
> - rp = ms->rp_state.from_dst_file;
> - }
> + rp = ms->rp_state.from_dst_file;
> ms->rp_state.error = false;
> goto retry;
> }
> --
> 2.31.1
>
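For anyone following the thread, the ordering problem described in the commit message can be illustrated with a small standalone C sketch. This is not QEMU code: Channel, YankInstance, and every function below are hypothetical stand-ins, and the sketch assumes, as a simplification, that the migration state and the local "rp" each hold one reference to the same refcounted channel. It only models the idea that the yank function goes away with the last reference, so a reference kept across the pause leaves the yank list non-empty.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted I/O channel (not QEMU's QIOChannel). */
typedef struct {
    int refcount;
    bool yank_registered;   /* models one entry in the instance's yank list */
} Channel;

/* Hypothetical stand-in for a yank instance: counts registered functions. */
typedef struct {
    int nr_yankfns;
} YankInstance;

static YankInstance instance;

/* Open a channel with one reference and register its yank function. */
static void channel_open(Channel *c)
{
    c->refcount = 1;
    c->yank_registered = true;
    instance.nr_yankfns++;
}

static void channel_ref(Channel *c)
{
    c->refcount++;
}

/* Drop one reference; the yank function is removed only with the last one. */
static void channel_close(Channel *c)
{
    assert(c->refcount > 0);
    if (--c->refcount == 0 && c->yank_registered) {
        c->yank_registered = false;
        instance.nr_yankfns--;
    }
}

/* True when the instance's yank list is empty, i.e. unregistering is safe. */
static bool instance_can_unregister(void)
{
    return instance.nr_yankfns == 0;
}

/* Late release: "rp" keeps its reference for the whole pause. */
static bool paused_with_late_release(void)
{
    Channel ch;
    instance.nr_yankfns = 0;
    channel_open(&ch);      /* reference held by the migration state */
    channel_ref(&ch);       /* reference held by the local "rp" */
    channel_close(&ch);     /* migration state drops its reference */
    bool ok = instance_can_unregister();  /* "rp" still pins the yank function */
    channel_close(&ch);     /* tidy up so the sketch leaks nothing */
    return ok;
}

/* Early release: "rp" is closed before the thread pauses. */
static bool paused_with_early_release(void)
{
    Channel ch;
    instance.nr_yankfns = 0;
    channel_open(&ch);
    channel_ref(&ch);
    channel_close(&ch);     /* migration state drops its reference */
    channel_close(&ch);     /* "rp" is released before pausing */
    return instance_can_unregister();
}
```

In this model, paused_with_late_release() reports that the yank list is still populated during the pause, which is the condition the QLIST_EMPTY assertion in the quoted crash checks for, while paused_with_early_release() reports an empty list.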
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK