qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 1/1] migration: Terminate multifd threads on yank


From: Dr. David Alan Gilbert
Subject: Re: [PATCH 1/1] migration: Terminate multifd threads on yank
Date: Mon, 2 Aug 2021 16:35:20 +0100
User-agent: Mutt/2.0.7 (2021-05-04)

* Leonardo Bras (leobras@redhat.com) wrote:
> From source host viewpoint, losing a connection during migration will
> cause the sockets to get stuck in sendmsg() syscall, waiting for
> the receiving side to reply.
> 
> In migration, yank works by shutting-down the migration QIOChannel fd.
> This causes a failure in the next sendmsg() for that fd, and the whole
> migration gets cancelled.
> 
> In multifd, due to having multiple sockets in multiple threads,
> on a connection loss there will be extra sockets stuck in sendmsg(),
> and because they will be holding their own mutex, there is good chance
> the main migration thread can get stuck in multifd_send_pages()
> waiting for one of those mutexes.
> 
> While it's waiting, the main migration thread can't run sendmsg() on
> it's fd, and therefore can't cause the migration to be cancelled, thus
> causing yank not to work.
> 
> Fixes this by shutting down all migration fds (including multifd ones),
> so no thread get's stuck in sendmsg() while holding a lock, and thus
> allowing the main migration thread to properly cancel migration when
> yank is used.
> 
> There is no need to do the same procedure to yank to work in the
> receiving host since ops->recv_pages() is kept outside the mutex protected
> code in multifd_recv_thread().
> 
> Buglink:https://bugzilla.redhat.com/show_bug.cgi?id=1970337
> Reported-by: Li Xiaohui <xiaohli@redhat.com>
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
>  migration/multifd.c        | 11 +++++++++++
>  migration/multifd.h        |  1 +
>  migration/yank_functions.c |  2 ++
>  3 files changed, 14 insertions(+)
> 
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 377da78f5b..744a180dfe 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -1040,6 +1040,17 @@ void multifd_recv_sync_main(void)
>      trace_multifd_recv_sync_main(multifd_recv_state->packet_num);
>  }
>  
> +void multifd_shutdown(void)
> +{
> +    if (!migrate_use_multifd()) {
> +        return;
> +    }
> +
> +    if (multifd_send_state) {
> +        multifd_send_terminate_threads(NULL);
> +    }

That calls :
    for (i = 0; i < migrate_multifd_channels(); i++) {
        MultiFDSendParams *p = &multifd_send_state->params[i];

        qemu_mutex_lock(&p->mutex);
        p->quit = true;
        qemu_sem_post(&p->sem);
        qemu_mutex_unlock(&p->mutex);
    }

so why doesn't this also get stuck in the same mutex you're trying to
fix?

Does the qio_channel_shutdown actually cause a shutdown on all fd's
for the multifd?

(I've just seen the multifd/cancel test fail stuck in multifd_send_sync_main
waiting on one of the locks).

Dave

> +}
> +
>  static void *multifd_recv_thread(void *opaque)
>  {
>      MultiFDRecvParams *p = opaque;
> diff --git a/migration/multifd.h b/migration/multifd.h
> index 8d6751f5ed..0517213bdf 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -22,6 +22,7 @@ bool multifd_recv_new_channel(QIOChannel *ioc, Error 
> **errp);
>  void multifd_recv_sync_main(void);
>  void multifd_send_sync_main(QEMUFile *f);
>  int multifd_queue_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset);
> +void multifd_shutdown(void);
>  
>  /* Multifd Compression flags */
>  #define MULTIFD_FLAG_SYNC (1 << 0)
> diff --git a/migration/yank_functions.c b/migration/yank_functions.c
> index 8c08aef14a..9335a64f00 100644
> --- a/migration/yank_functions.c
> +++ b/migration/yank_functions.c
> @@ -15,12 +15,14 @@
>  #include "io/channel-socket.h"
>  #include "io/channel-tls.h"
>  #include "qemu-file.h"
> +#include "multifd.h"
>  
>  void migration_yank_iochannel(void *opaque)
>  {
>      QIOChannel *ioc = QIO_CHANNEL(opaque);
>  
>      qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
> +    multifd_shutdown();
>  }
>  
>  /* Return whether yank is supported on this ioc */
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




reply via email to

[Prev in Thread] Current Thread [Next in Thread]