From: Juan Quintela
Subject: Re: [PATCH v8 5/5] multifd: Implement zero copy write in multifd migration (multifd-zero-copy)
Date: Fri, 18 Feb 2022 17:57:13 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Leonardo Bras <leobras@redhat.com> wrote:
> Implement zero copy send on nocomp_send_write(), by making use of QIOChannel
> writev + flags & flush interface.
>
> Change multifd_send_sync_main() so flush_zero_copy() can be called
> after each iteration in order to make sure all dirty pages are sent before
> a new iteration is started. It will also flush at the beginning and at the
> end of migration.
>
> Also make it return -1 if flush_zero_copy() fails, in order to cancel
> the migration process, and avoid resuming the guest in the target host
> without receiving all current RAM.
>
> This will work fine on RAM migration because the RAM pages are not usually
> freed, and there is no problem on changing the pages content between
> writev_zero_copy() and the actual sending of the buffer, because this change
> will dirty the page and cause it to be re-sent on a next iteration anyway.
>
> A lot of locked memory may be needed in order to use multid migration
                                                       ^^^^^^
multifd.

I can fix it on the commit.


> @@ -1479,7 +1479,16 @@ static bool migrate_params_check(MigrationParameters *params, Error **errp)
>          error_prepend(errp, "Invalid mapping given for block-bitmap-mapping: ");
>          return false;
>      }
> -
> +#ifdef CONFIG_LINUX
> +    if (params->zero_copy_send &&
> +        (!migrate_use_multifd() ||
> +         params->multifd_compression != MULTIFD_COMPRESSION_NONE ||
> +         (params->tls_creds && *params->tls_creds))) {
> +        error_setg(errp,
> +                   "Zero copy only available for non-compressed non-TLS 
> multifd migration");
> +        return false;
> +    }
> +#endif
>      return true;
>  }

The test is long, but it is exactly what we need.  Good.


>  
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 43998ad117..2d68b9cf4f 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -568,19 +568,28 @@ void multifd_save_cleanup(void)
>      multifd_send_state = NULL;
>  }
>  
> -void multifd_send_sync_main(QEMUFile *f)
> +int multifd_send_sync_main(QEMUFile *f)
>  {
>      int i;
> +    bool flush_zero_copy;
>  
>      if (!migrate_use_multifd()) {
> -        return;
> +        return 0;
>      }
>      if (multifd_send_state->pages->num) {
>          if (multifd_send_pages(f) < 0) {
>              error_report("%s: multifd_send_pages fail", __func__);
> -            return;
> +            return 0;
>          }
>      }
> +
> +    /*
> +     * When using zero-copy, it's necessary to flush after each iteration to
> +     * make sure pages from earlier iterations don't end up replacing newer
> +     * pages.
> +     */
> +    flush_zero_copy = migrate_use_zero_copy_send();
> +
>      for (i = 0; i < migrate_multifd_channels(); i++) {
>          MultiFDSendParams *p = &multifd_send_state->params[i];
>  
> @@ -591,7 +600,7 @@ void multifd_send_sync_main(QEMUFile *f)
>          if (p->quit) {
>              error_report("%s: channel %d has already quit", __func__, i);
>              qemu_mutex_unlock(&p->mutex);
> -            return;
> +            return 0;
>          }
>  
>          p->packet_num = multifd_send_state->packet_num++;
> @@ -602,6 +611,17 @@ void multifd_send_sync_main(QEMUFile *f)
>          ram_counters.transferred += p->packet_len;
>          qemu_mutex_unlock(&p->mutex);
>          qemu_sem_post(&p->sem);
> +
> +        if (flush_zero_copy) {
> +            int ret;
> +            Error *err = NULL;
> +
> +            ret = qio_channel_flush(p->c, &err);
> +            if (ret < 0) {
> +                error_report_err(err);
> +                return -1;
> +            }
> +        }
>      }
>      for (i = 0; i < migrate_multifd_channels(); i++) {
>          MultiFDSendParams *p = &multifd_send_state->params[i];
> @@ -610,6 +630,8 @@ void multifd_send_sync_main(QEMUFile *f)
>          qemu_sem_wait(&p->sem_sync);
>      }
>      trace_multifd_send_sync_main(multifd_send_state->packet_num);
> +
> +    return 0;
>  }

We are leaving pages in flight for potentially a long time.  I *think*
that we can sync more often than that.

>  static void *multifd_send_thread(void *opaque)
> @@ -668,8 +690,8 @@ static void *multifd_send_thread(void *opaque)
>              p->iov[0].iov_len = p->packet_len;
>              p->iov[0].iov_base = p->packet;
>  
> -            ret = qio_channel_writev_all(p->c, p->iov, p->iovs_num,
> -                                         &local_err);
> +            ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL,
> +                                              0, p->write_flags, &local_err);
>              if (ret != 0) {
>                  break;
>              }

I still think that it would be better to have a sync here in each
thread.  I don't know the right size, but once every couple of megabytes
of RAM or so.
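
Something along these lines, maybe (rough sketch, untested; the helper
name, the "pending" counter kept as a local in multifd_send_thread(),
and the 2MB threshold are all invented for illustration):

/* Rough sketch, untested.  The threshold is a made-up number. */
#define ZERO_COPY_FLUSH_THRESHOLD (2 * 1024 * 1024)

/*
 * Called from multifd_send_thread() after each
 * qio_channel_writev_full_all(): flush once roughly 2MB of zero-copy
 * data has been queued on this channel.
 */
static int zero_copy_maybe_flush(MultiFDSendParams *p, size_t *pending,
                                 Error **errp)
{
    if (!migrate_use_zero_copy_send()) {
        return 0;
    }
    *pending += p->packet_len + p->next_packet_size;
    if (*pending < ZERO_COPY_FLUSH_THRESHOLD) {
        return 0;
    }
    if (qio_channel_flush(p->c, errp) < 0) {
        return -1;
    }
    *pending = 0;
    return 0;
}

That would bound the amount of memory the kernel keeps pinned per
channel, instead of letting it grow until the next
multifd_send_sync_main().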

I did a change on:

commit d48c3a044537689866fe44e65d24c7d39a68868a
Author: Juan Quintela <quintela@redhat.com>
Date:   Fri Nov 19 15:35:58 2021 +0100

    multifd: Use a single writev on the send side
    
    Until now, we wrote the packet header with write(), and the rest of the
    pages with writev().  Just increase the size of the iovec and do a
    single writev().
    
    Signed-off-by: Juan Quintela <quintela@redhat.com>
    Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

And now we need to "preserve" this header until we do the sync,
otherwise we are overwriting it with other things.

What testing have you done after this commit?

Notice that it is not _complicated_ to fix; I will try to come up with
some idea on Monday, but basically it means having an array of buffers
for each thread, and storing them until we call a sync().
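
Something like this, perhaps (very rough sketch, untested; the pool
type, its size and the helper name are all invented for illustration):

#define HEADER_POOL_SIZE 64                 /* made-up value */

typedef struct {
    /* each entry allocated with g_malloc0(p->packet_len), like p->packet */
    MultiFDPacket_t *packet[HEADER_POOL_SIZE];
    int next;                               /* first header not yet handed out */
} MultiFDHeaderPool;

/*
 * Return a header buffer for the next send.  If every buffer may still
 * be referenced by the kernel, flush first: once qio_channel_flush()
 * returns, all previous zero-copy sends have completed and the whole
 * pool can be reused.
 */
static MultiFDPacket_t *header_pool_get(MultiFDHeaderPool *pool,
                                        QIOChannel *c, Error **errp)
{
    if (pool->next == HEADER_POOL_SIZE) {
        if (qio_channel_flush(c, errp) < 0) {
            return NULL;
        }
        pool->next = 0;
    }
    return pool->packet[pool->next++];
}

multifd_send_thread() would point p->packet at header_pool_get() before
filling in each packet header, and multifd_send_sync_main() can reset
pool->next once its own flush has returned.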

Later, Juan.



