qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH for-8.0 2/3] migration: Fix potential race on postcopy_qemufi


From: Juan Quintela
Subject: Re: [PATCH for-8.0 2/3] migration: Fix potential race on postcopy_qemufile_src
Date: Wed, 12 Apr 2023 21:18:34 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)

Peter Xu <peterx@redhat.com> wrote:
> postcopy_qemufile_src object should be owned by one thread, either the main
> thread (e.g. when at the beginning, or at the end of migration), or by the
> return path thread (when during a preempt enabled postcopy migration).  If
> that's not the case the access to the object might be racy.
>
> postcopy_preempt_shutdown_file() can be potentially racy, because it's
> called at the end phase of migration on the main thread, however during
> which the return path thread hasn't yet been recycled; the recycle happens
> in await_return_path_close_on_source() which is after this point.
>
> It means, logically it's posslbe the main thread and the return path thread
> are both operating on the same qemufile.  While I don't think qemufile is
> thread safe at all.
>
> postcopy_preempt_shutdown_file() used to be needed because that's where we
> send EOS to dest so that dest can safely shutdown the preempt thread.
>
> To avoid the possible race, remove this only place that a race can happen.
> Instead we figure out another way to safely close the preempt thread on
> dest.
>
> The core idea during postcopy on deciding "when to stop" is that dest will
> send a postcopy SHUT message to src, telling src that all data is there.
> Hence to shut the dest preempt thread maybe better to do it directly on
> dest node.
>
> This patch proposed such a way that we change postcopy_prio_thread_created
> into PreemptThreadStatus, so that we kick the preempt thread on dest qemu
> by a sequence of:
>
>   mis->preempt_thread_status = PREEMPT_THREAD_QUIT;
>   qemu_file_shutdown(mis->postcopy_qemufile_dst);
>
> While here shutdown() is probably so far the easiest way to kick preempt
> thread from a blocked qemu_get_be64().  Then it reads preempt_thread_status
> to make sure it's not a network failure but a willingness to quit the
> thread.
>
> We could have avoided that extra status but just rely on migration status.
> The problem is postcopy_ram_incoming_cleanup() is just called early enough
> so we're still during POSTCOPY_ACTIVE no matter what.. So just make it
> simple to have the status introduced.
>
> One flag x-preempt-pre-7-2 is added to keep old pre-7.2 behaviors of
> postcopy preempt.
>
> Fixes: 9358982744 ("migration: Send requested page directly in rp-return 
> thread")
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/core/machine.c        |  1 +
>  migration/migration.c    | 10 ++++++++--
>  migration/migration.h    | 34 +++++++++++++++++++++++++++++++++-
>  migration/postcopy-ram.c | 20 +++++++++++++++-----
>  4 files changed, 57 insertions(+), 8 deletions(-)
>

As preempt is pretty new I will ....

Reviewed-by: Juan Quintela <quintela@redhat.com>

But code is quite subtle.
queued.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]