[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v8 5/5] multifd: Implement zero copy write in multifd migrati
From: |
Leonardo Bras Soares Passos |
Subject: |
Re: [PATCH v8 5/5] multifd: Implement zero copy write in multifd migration (multifd-zero-copy) |
Date: |
Mon, 21 Feb 2022 16:47:56 -0300 |
On Fri, Feb 18, 2022 at 2:36 PM Juan Quintela <quintela@redhat.com> wrote:
>
> Leonardo Bras Soares Passos <leobras@redhat.com> wrote:
> > Hello Peter, thanks for reviewing!
> >
> > On Mon, Feb 7, 2022 at 11:22 PM Peter Xu <peterx@redhat.com> wrote:
> >>
> >> On Tue, Feb 01, 2022 at 03:29:03AM -0300, Leonardo Bras wrote:
> >> > -void multifd_send_sync_main(QEMUFile *f)
> >> > +int multifd_send_sync_main(QEMUFile *f)
> >> > {
> >> > int i;
> >> > + bool flush_zero_copy;
> >> >
> >> > if (!migrate_use_multifd()) {
> >> > - return;
> >> > + return 0;
> >> > }
> >> > if (multifd_send_state->pages->num) {
> >> > if (multifd_send_pages(f) < 0) {
> >> > error_report("%s: multifd_send_pages fail", __func__);
> >> > - return;
> >> > + return 0;
> >>
> >> I've not checked how it used to do if multifd_send_pages() failed, but..
> >> should
> >> it returns -1 rather than 0 when there will be a return code?
> >
> > Yeah, that makes sense.
> > The point here is that I was trying not to modify much of the current
> > behavior.
>
> if (qatomic_read(&multifd_send_state->exiting)) {
> return -1;
> }
>
> if (p->quit) {
> error_report("%s: channel %d has already quit!", __func__, i);
> qemu_mutex_unlock(&p->mutex);
> return -1;
> }
>
> This are the only two cases where the current code can return one
> error. In the 1st case we are exiting, we are already in the middle of
> finishing, so we don't really care.
> In the second one, we have already quit, and error as already quite big.
>
> But I agree with both comments:
> - we need to improve the error paths
> - leonardo changes don't affect what is already there.
>
>
> > I mean, multifd_send_sync_main() would previously return void, so any
> > other errors would not matter to the caller of this function, which
> > will continue to run as if nothing happened.
> >
> > Now, if it fails with flush_zero_copy, the operation needs to be aborted.
> >
> > Maybe, I should make it different:
> > - In any error, return -1.
> > - Create/use a specific error code in the case of a failing
> > flush_zero_copy, so I can test the return value for it on the caller
> > function and return early.
>
> We need to add the check. It don't matter if the problem is zero_copy
> or the existing one, we are under a minor catastrophe and migration has
> to be aborted.
Ok, I will fix that so we can abort in case of any error.
Maybe it's better to do that on a separated patch, before 5/5, right?
>
> > Or alternatively, the other errors could also return early, but since
> > this will change how the code currently works, I would probably need
> > another patch for that change. (so it can be easily reverted if
> > needed)
> >
> > What do you think is better?
> >
> >
> >> > }
> >> > }
> >> > +
> >> > + /*
> >> > + * When using zero-copy, it's necessary to flush after each
> >> > iteration to
> >> > + * make sure pages from earlier iterations don't end up replacing
> >> > newer
> >> > + * pages.
> >> > + */
> >> > + flush_zero_copy = migrate_use_zero_copy_send();
> >> > +
> >> > for (i = 0; i < migrate_multifd_channels(); i++) {
> >> > MultiFDSendParams *p = &multifd_send_state->params[i];
> >> >
> >> > @@ -591,7 +600,7 @@ void multifd_send_sync_main(QEMUFile *f)
> >> > if (p->quit) {
> >> > error_report("%s: channel %d has already quit", __func__,
> >> > i);
> >> > qemu_mutex_unlock(&p->mutex);
> >> > - return;
> >> > + return 0;
> >>
> >> Same question here.
> >
> > Please see above,
> >
> >>
> >> > }
> >>
> >> The rest looks good. Thanks,
>
> Later, Juan.
>
Thanks for the feedback!
Best regards,
Leo
[PATCH v8 3/5] migration: Add zero-copy-send parameter for QMP/HMP for Linux, Leonardo Bras, 2022/02/01