qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v1 1/1] migration: Fix yank on postcopy multifd crashing gues


From: Leonardo Bras Soares Passos
Subject: Re: [PATCH v1 1/1] migration: Fix yank on postcopy multifd crashing guest after migration
Date: Mon, 14 Nov 2022 23:32:26 -0300

On Thu, Nov 10, 2022 at 10:48 AM Juan Quintela <quintela@redhat.com> wrote:
>
> Leonardo Bras <leobras@redhat.com> wrote:
> D> When multifd and postcopy-ram capabilities are enabled, if a
> > migrate-start-postcopy is attempted, the migration will finish sending the
> > memory pages and then crash with the following error:
> >
> > qemu-system-x86_64: ../util/yank.c:107: yank_unregister_instance: Assertion
> > `QLIST_EMPTY(&entry->yankfns)' failed.
> >
> > This happens because even though all multifd channels could
> > yank_register_function(), none of them could unregister it before
> > unregistering the MIGRATION_YANK_INSTANCE, causing the assert to fail.
> >
> > Fix that by calling multifd_load_cleanup() on postcopy_ram_listen_thread()
> > before MIGRATION_YANK_INSTANCE is unregistered.
>
> Hi
>
> One question,
> What warantees that migration_load_cleanup() is not called twice?
>
> I can't see anything that provides that here?  Or does postcopy have
> never done the cleanup of multifd channels before?

IIUC, postcopy is not doing multifd cleanup for a while, at least
since 6.0.0-rc2.
That is as far as I went back testing, and by fixing other (build)
bugs, I could get the yank to abort the target qemu after the
migration finished on multifd + postcopy scenario.


>
> Later, Juan.
>
>
> > Fixes: b5eea99ec2 ("migration: Add yank feature")
> > Reported-by: Li Xiaohui <xiaohli@redhat.com>
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > ---
> >  migration/migration.h |  1 +
> >  migration/migration.c | 18 +++++++++++++-----
> >  migration/savevm.c    |  2 ++
> >  3 files changed, 16 insertions(+), 5 deletions(-)
> >
> > diff --git a/migration/migration.h b/migration/migration.h
> > index cdad8aceaa..240f64efb0 100644
> > --- a/migration/migration.h
> > +++ b/migration/migration.h
> > @@ -473,6 +473,7 @@ void migration_make_urgent_request(void);
> >  void migration_consume_urgent_request(void);
> >  bool migration_rate_limit(void);
> >  void migration_cancel(const Error *error);
> > +bool migration_load_cleanup(void);
> >
> >  void populate_vfio_info(MigrationInfo *info);
> >  void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page);
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 739bb683f3..4f363b2a95 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -486,6 +486,17 @@ void migrate_add_address(SocketAddress *address)
> >                        QAPI_CLONE(SocketAddress, address));
> >  }
> >
> > +bool migration_load_cleanup(void)
> > +{
> > +    Error *local_err = NULL;
> > +
> > +    if (multifd_load_cleanup(&local_err)) {
> > +        error_report_err(local_err);
> > +        return true;
> > +    }
> > +    return false;
> > +}
> > +
> >  static void qemu_start_incoming_migration(const char *uri, Error **errp)
> >  {
> >      const char *p = NULL;
> > @@ -540,8 +551,7 @@ static void process_incoming_migration_bh(void *opaque)
> >       */
> >      qemu_announce_self(&mis->announce_timer, migrate_announce_params());
> >
> > -    if (multifd_load_cleanup(&local_err) != 0) {
> > -        error_report_err(local_err);
> > +    if (migration_load_cleanup()) {
> >          autostart = false;
> >      }
> >      /* If global state section was not received or we are in running
> > @@ -646,9 +656,7 @@ fail:
> >      migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> >                        MIGRATION_STATUS_FAILED);
> >      qemu_fclose(mis->from_src_file);
> > -    if (multifd_load_cleanup(&local_err) != 0) {
> > -        error_report_err(local_err);
> > -    }
> > +    migration_load_cleanup();
> >      exit(EXIT_FAILURE);
> >  }
> >
> > diff --git a/migration/savevm.c b/migration/savevm.c
> > index a0cdb714f7..250caff7f4 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -1889,6 +1889,8 @@ static void *postcopy_ram_listen_thread(void *opaque)
> >          exit(EXIT_FAILURE);
> >      }
> >
> > +    migration_load_cleanup();
> > +
>
> This addition is the one that I don't understand why it was not
> needed/done before.

Please see the above comment, but tl;dr, it was not done before.


Thanks you for reviewing,
Leo

>
> >      migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> >                                     MIGRATION_STATUS_COMPLETED);
> >      /*
>
> Later, Juan.
>




reply via email to

[Prev in Thread] Current Thread [Next in Thread]