[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH 1/2] migration: unbreak postcopy recovery
From: |
Peter Xu |
Subject: |
Re: [Qemu-devel] [PATCH 1/2] migration: unbreak postcopy recovery |
Date: |
Wed, 27 Jun 2018 18:23:54 +0800 |
User-agent: |
Mutt/1.10.0 (2018-05-17) |
On Wed, Jun 27, 2018 at 11:57:55AM +0200, Juan Quintela wrote:
> Peter Xu <address@hidden> wrote:
> > It was broken due to recent changes in two parts:
> >
> > - migration_incoming_process() will be called now even for postcopy
> > recovery sessions (while we shouldn't)
> >
> > - Now we don't call migrate_fd_process_incoming() any more (unless in
> > RDMA code), so actually the postcopy recovery logic is fully bypassed
> >
> > Fix this up to make sure we only call migration_incoming_process() when
> > necessary, and move the postcopy recovery logic far earlier to the entry
> > point of incoming migration. Renaming migration_fd_process_incoming()
> > into postcopy_try_recover() since it's mostly for the recovery process,
> > then touch up RDMA code to suite it.
> >
> > Since at it, refactor the imcoming port handling to only have one singe
> > entry point for incoming migration. Then we can avoid calling
> > migration_ioc_process_incoming() everywhere, which is really error
> > prone.
>
> Perhaps splitting some of this bits?
Actually I tried a bit but failed, but of course I can try harder. :)
>
> > diff --git a/migration/ram.h b/migration/ram.h
> > index d386f4d641..457bf54b8c 100644
> > --- a/migration/ram.h
> > +++ b/migration/ram.h
> > @@ -46,7 +46,7 @@ int multifd_save_cleanup(Error **errp);
> > int multifd_load_setup(void);
> > int multifd_load_cleanup(Error **errp);
> > bool multifd_recv_all_channels_created(void);
> > -void multifd_recv_new_channel(QIOChannel *ioc);
> > +bool multifd_recv_new_channel(QIOChannel *ioc);
>
> I am ok with this type change.
>
> > void migration_ioc_process_incoming(QIOChannel *ioc)
> > {
> > MigrationIncomingState *mis = migration_incoming_get_current();
> > + QEMUFile *f = qemu_fopen_channel_input(ioc);
>
> We are creating a QEMUFile here.
>
> > + bool start_migration;
> > +
> > + /* If it's a recovery attempt, we're done */
> > + if (postcopy_try_recover(f)) {
>
> Here we use it.
>
> > + return;
> > + }
> >
> > if (!mis->from_src_file) {
> > - QEMUFile *f = qemu_fopen_channel_input(ioc);
> > + /* The first connection (multifd may have multiple) */
> > migration_incoming_setup(f);
>
> Here we use it.
>
> > - return;
> > + /*
> > + * Common migration only needs one channel, so we can start
> > + * right now. Multifd needs more than one channel, we wait.
> > + */
> > + start_migration = !migrate_use_multifd();
> > + } else {
> > + /* Multiple connections */
> > + assert(migrate_use_multifd());
> > + start_migration = multifd_recv_new_channel(ioc);
>
> But here we don't use it. We are lecking it.
>
> So, we need to fix the leak. No clue about how to convince
> postcopy_try_recover() without the QEMUFile
Indeed! Thanks for spotting that! I'll prepare a new version.
--
Peter Xu