From: Leonardo Bras Soares Passos
Subject: Re: [PATCH v8 5/5] multifd: Implement zero copy write in multifd migration (multifd-zero-copy)
Date: Tue, 22 Feb 2022 01:09:30 -0300

On Mon, Feb 21, 2022 at 4:41 PM Leonardo Bras Soares Passos
<leobras@redhat.com> wrote:
>
> Hello Juan, thanks for the feedback!
>
> On Fri, Feb 18, 2022 at 1:57 PM Juan Quintela <quintela@redhat.com> wrote:
> >
> > Leonardo Bras <leobras@redhat.com> wrote:
> > > Implement zero copy send on nocomp_send_write(), by making use of the
> > > QIOChannel writev + flags & flush interface.
> > >
> > > Change multifd_send_sync_main() so flush_zero_copy() can be called
> > > after each iteration in order to make sure all dirty pages are sent before
> > > a new iteration is started. It will also flush at the beginning and at the
> > > end of migration.
> > >
> > > Also make it return -1 if flush_zero_copy() fails, in order to cancel
> > > the migration process, and avoid resuming the guest in the target host
> > > without receiving all current RAM.
> > >
> > > This will work fine on RAM migration because the RAM pages are not
> > > usually freed, and there is no problem with changing the page contents
> > > between writev_zero_copy() and the actual sending of the buffer, because
> > > such a change will dirty the page and cause it to be re-sent in a later
> > > iteration anyway.
> > >
> > > A lot of locked memory may be needed in order to use multid migration
> >                                                        ^^^^^^
> > multifd.
> >
> > I can fix it on the commit.
>
> No worries, fixed for v9.
>
> >
> >
> > > @@ -1479,7 +1479,16 @@ static bool migrate_params_check(MigrationParameters *params, Error **errp)
> > >          error_prepend(errp, "Invalid mapping given for block-bitmap-mapping: ");
> > >          return false;
> > >      }
> > > -
> > > +#ifdef CONFIG_LINUX
> > > +    if (params->zero_copy_send &&
> > > +        (!migrate_use_multifd() ||
> > > +         params->multifd_compression != MULTIFD_COMPRESSION_NONE ||
> > > +         (params->tls_creds && *params->tls_creds))) {
> > > +        error_setg(errp,
> > > +                   "Zero copy only available for non-compressed non-TLS 
> > > multifd migration");
> > > +        return false;
> > > +    }
> > > +#endif
> > >      return true;
> > >  }
> >
> > Test is long, but it is exactly what we need.  Good.
>
> Thanks!
>
>
> >
> >
> > >
> > > diff --git a/migration/multifd.c b/migration/multifd.c
> > > index 43998ad117..2d68b9cf4f 100644
> > > --- a/migration/multifd.c
> > > +++ b/migration/multifd.c
> > > @@ -568,19 +568,28 @@ void multifd_save_cleanup(void)
> > >      multifd_send_state = NULL;
> > >  }
> > >
> > > -void multifd_send_sync_main(QEMUFile *f)
> > > +int multifd_send_sync_main(QEMUFile *f)
> > >  {
> > >      int i;
> > > +    bool flush_zero_copy;
> > >
> > >      if (!migrate_use_multifd()) {
> > > -        return;
> > > +        return 0;
> > >      }
> > >      if (multifd_send_state->pages->num) {
> > >          if (multifd_send_pages(f) < 0) {
> > >              error_report("%s: multifd_send_pages fail", __func__);
> > > -            return;
> > > +            return 0;
> > >          }
> > >      }
> > > +
> > > +    /*
> > > +     * When using zero-copy, it's necessary to flush after each iteration to
> > > +     * make sure pages from earlier iterations don't end up replacing newer
> > > +     * pages.
> > > +     */
> > > +    flush_zero_copy = migrate_use_zero_copy_send();
> > > +
> > >      for (i = 0; i < migrate_multifd_channels(); i++) {
> > >          MultiFDSendParams *p = &multifd_send_state->params[i];
> > >
> > > @@ -591,7 +600,7 @@ void multifd_send_sync_main(QEMUFile *f)
> > >          if (p->quit) {
> > >              error_report("%s: channel %d has already quit", __func__, i);
> > >              qemu_mutex_unlock(&p->mutex);
> > > -            return;
> > > +            return 0;
> > >          }
> > >
> > >          p->packet_num = multifd_send_state->packet_num++;
> > > @@ -602,6 +611,17 @@ void multifd_send_sync_main(QEMUFile *f)
> > >          ram_counters.transferred += p->packet_len;
> > >          qemu_mutex_unlock(&p->mutex);
> > >          qemu_sem_post(&p->sem);
> > > +
> > > +        if (flush_zero_copy) {
> > > +            int ret;
> > > +            Error *err = NULL;
> > > +
> > > +            ret = qio_channel_flush(p->c, &err);
> > > +            if (ret < 0) {
> > > +                error_report_err(err);
> > > +                return -1;
> > > +            }
> > > +        }
> > >      }
> > >      for (i = 0; i < migrate_multifd_channels(); i++) {
> > >          MultiFDSendParams *p = &multifd_send_state->params[i];
> > > @@ -610,6 +630,8 @@ void multifd_send_sync_main(QEMUFile *f)
> > >          qemu_sem_wait(&p->sem_sync);
> > >      }
> > >      trace_multifd_send_sync_main(multifd_send_state->packet_num);
> > > +
> > > +    return 0;
> > >  }
> >
> > We are leaving pages in flight for potentially a long time. I *think*
> > that we can sync more often than that.
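
Just to put a number behind "in flight": assuming the usual Linux
MSG_ZEROCOPY mechanics, the kernel keeps the pages pinned until it posts a
completion on the socket's error queue, and that error queue is what a
flush ends up draining. A standalone sketch of that completion handling
(not the QIOChannel code, just the raw socket API pattern, with the range
bookkeeping and error handling simplified):

/*
 * Reap MSG_ZEROCOPY completion notifications from the socket error queue.
 * 'expected' is the number of zero-copy sends still outstanding.
 */
#include <errno.h>
#include <poll.h>
#include <stdio.h>
#include <sys/socket.h>
#include <linux/errqueue.h>

static int wait_zero_copy_completions(int sockfd, unsigned int expected)
{
    while (expected > 0) {
        char control[CMSG_SPACE(sizeof(struct sock_extended_err))];
        struct msghdr msg = {
            .msg_control = control,
            .msg_controllen = sizeof(control),
        };

        if (recvmsg(sockfd, &msg, MSG_ERRQUEUE) < 0) {
            if (errno == EAGAIN) {
                /* Nothing completed yet: wait for error-queue activity. */
                struct pollfd pfd = { .fd = sockfd, .events = 0 };
                poll(&pfd, 1, -1);
                continue;
            }
            if (errno == EINTR) {
                continue;
            }
            return -1;
        }

        for (struct cmsghdr *cm = CMSG_FIRSTHDR(&msg); cm;
             cm = CMSG_NXTHDR(&msg, cm)) {
            struct sock_extended_err *serr = (void *)CMSG_DATA(cm);

            if (serr->ee_errno != 0 ||
                serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY) {
                continue;
            }
            /* ee_info..ee_data is the range of sends that just completed. */
            expected -= serr->ee_data - serr->ee_info + 1;

            if (serr->ee_code == SO_EE_CODE_ZEROCOPY_COPIED) {
                /* Kernel fell back to copying: no zero-copy benefit here. */
                fprintf(stderr, "zero copy send fell back to copying\n");
            }
        }
    }
    return 0;
}

So until those notifications come back, the pages (including the packet
headers) cannot be reused.
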
> >
> > >  static void *multifd_send_thread(void *opaque)
> > > @@ -668,8 +690,8 @@ static void *multifd_send_thread(void *opaque)
> > >              p->iov[0].iov_len = p->packet_len;
> > >              p->iov[0].iov_base = p->packet;
> > >
> > > -            ret = qio_channel_writev_all(p->c, p->iov, p->iovs_num,
> > > -                                         &local_err);
> > > +            ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL,
> > > +                                              0, p->write_flags, &local_err);
> > >              if (ret != 0) {
> > >                  break;
> > >              }
> >
> > I still think that it would be better to have a sync here in each
> > thread.  I don't know the right size, but once every couple of megabytes
> > of RAM or so.
>
> This seems like a good idea, since the first iteration may take a while,
> and it could take a long time to fail if something goes wrong with
> zerocopy at the start of iteration 1.
>
> On the other hand, flushing takes some time, and doing it a lot may
> take away some of the performance improvements.
>
> Maybe we could move the flushing to a thread of its own, if it
> becomes a problem.
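
If we do go with a per-thread periodic flush, I imagine something like the
helper below, called from multifd_send_thread() after each write. This is
only a sketch: ZERO_COPY_FLUSH_BYTES, the sent_bytes counter and the helper
name are made up here; qio_channel_flush() and migrate_use_zero_copy_send()
are the ones already used in this series.

#include "qemu/osdep.h"
#include "qemu/units.h"
#include "io/channel.h"
#include "multifd.h"

#define ZERO_COPY_FLUSH_BYTES (4 * MiB)   /* made-up threshold, needs tuning */

/*
 * Flush pending zero-copy sends on one channel once enough data has been
 * queued since the last flush, instead of only once per iteration.
 */
static int multifd_maybe_flush_zero_copy(MultiFDSendParams *p,
                                         uint64_t *sent_bytes,
                                         uint64_t just_sent,
                                         Error **errp)
{
    if (!migrate_use_zero_copy_send()) {
        return 0;
    }

    *sent_bytes += just_sent;
    if (*sent_bytes < ZERO_COPY_FLUSH_BYTES) {
        return 0;
    }

    *sent_bytes = 0;
    /* Blocks until every zero-copy write queued on this channel completed. */
    return qio_channel_flush(p->c, errp);
}

The threshold would need tuning, since every flush waits for all
completions queued on that channel and we would be giving back part of the
gain if it is too small.
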
>
>
> >
> > I did a change on:
> >
> > commit d48c3a044537689866fe44e65d24c7d39a68868a
> > Author: Juan Quintela <quintela@redhat.com>
> > Date:   Fri Nov 19 15:35:58 2021 +0100
> >
> >     multifd: Use a single writev on the send side
> >
> >     Until now, we wrote the packet header with write(), and the rest of the
> >     pages with writev().  Just increase the size of the iovec and do a
> >     single writev().
> >
> >     Signed-off-by: Juan Quintela <quintela@redhat.com>
> >     Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >
> > And now we need to preserve this header until we do the sync,
> > otherwise we are overwriting it with other things.
>
> Yeah, that is a problem I faced with non-multifd migration, and a reason
> why I chose to implement this directly in multifd.
>
> >
> > What testing have you done after this commit?
>
> Not much, but this will probably become an issue with bigger guests.
>
> >
> > Notice that it is not _complicated_ to fix: I will try to come up with
> > some idea on Monday, but basically it is having an array of buffers for
> > each thread, and storing them until we call a sync().
>
> That will probably work: we can use MultiFDRecvParams->packet as this
> array, right?
> We can start with a somewhat small buffer size and grow it if it ever
> gets full. (Full means a sync + g_try_malloc0 + g_free.)

Or maybe use the array size as a size limiter before flushing, so it
will always flush after BUFFSIZE headers are sent.
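
Something like this is what I am picturing, just as a sketch;
HEADER_RING_SIZE, MultiFDHeaderRing and multifd_get_header() are invented
names for illustration, only MultiFDPacket_t, p->c and qio_channel_flush()
come from the existing code:

#define HEADER_RING_SIZE 16   /* made-up; would need tuning */

typedef struct {
    MultiFDPacket_t *slots[HEADER_RING_SIZE];  /* pre-allocated headers */
    unsigned int next;                         /* next free slot */
} MultiFDHeaderRing;

/*
 * Return a header buffer that is guaranteed not to be overwritten while
 * the kernel may still be reading it.  Once every slot has been handed
 * out since the last flush, flush first so all of them become reusable.
 */
static MultiFDPacket_t *multifd_get_header(MultiFDSendParams *p,
                                           MultiFDHeaderRing *ring,
                                           Error **errp)
{
    if (ring->next == HEADER_RING_SIZE) {
        /* All headers are potentially still in flight: wait for them. */
        if (qio_channel_flush(p->c, errp) < 0) {
            return NULL;
        }
        ring->next = 0;
    }
    return ring->slots[ring->next++];
}

Then p->packet could just point at the slot returned here instead of a
single preallocated buffer.
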

>
> By my calculations, it should take about 1.8 kB per element with a 4k PAGESIZE.
>
> What do you think?
>
> Best regards,
> Leo
