qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 1/2] migration: unbreak postcopy recovery


From: Peter Xu
Subject: Re: [Qemu-devel] [PATCH 1/2] migration: unbreak postcopy recovery
Date: Wed, 27 Jun 2018 18:23:54 +0800
User-agent: Mutt/1.10.0 (2018-05-17)

On Wed, Jun 27, 2018 at 11:57:55AM +0200, Juan Quintela wrote:
> Peter Xu <address@hidden> wrote:
> > It was broken due to recent changes in two parts:
> >
> > - migration_incoming_process() will be called now even for postcopy
> >   recovery sessions (while we shouldn't)
> >
> > - Now we don't call migrate_fd_process_incoming() any more (unless in
> >   RDMA code), so actually the postcopy recovery logic is fully bypassed
> >
> > Fix this up to make sure we only call migration_incoming_process() when
> > necessary, and move the postcopy recovery logic far earlier to the entry
> > point of incoming migration.  Renaming migration_fd_process_incoming()
> > into postcopy_try_recover() since it's mostly for the recovery process,
> > then touch up RDMA code to suite it.
> >
> > Since at it, refactor the imcoming port handling to only have one singe
> > entry point for incoming migration.  Then we can avoid calling
> > migration_ioc_process_incoming() everywhere, which is really error
> > prone.
> 
> Perhaps splitting some of this bits?

Actually I tried a bit but failed, but of course I can try harder. :)

> 
> > diff --git a/migration/ram.h b/migration/ram.h
> > index d386f4d641..457bf54b8c 100644
> > --- a/migration/ram.h
> > +++ b/migration/ram.h
> > @@ -46,7 +46,7 @@ int multifd_save_cleanup(Error **errp);
> >  int multifd_load_setup(void);
> >  int multifd_load_cleanup(Error **errp);
> >  bool multifd_recv_all_channels_created(void);
> > -void multifd_recv_new_channel(QIOChannel *ioc);
> > +bool multifd_recv_new_channel(QIOChannel *ioc);
> 
> I am ok with this type change.
> 
> >  void migration_ioc_process_incoming(QIOChannel *ioc)
> >  {
> >      MigrationIncomingState *mis = migration_incoming_get_current();
> > +    QEMUFile *f = qemu_fopen_channel_input(ioc);
> 
> We are creating a QEMUFile here.
> 
> > +    bool start_migration;
> > +
> > +    /* If it's a recovery attempt, we're done */
> > +    if (postcopy_try_recover(f)) {
> 
> Here we use it.
> 
> > +        return;
> > +    }
> >  
> >      if (!mis->from_src_file) {
> > -        QEMUFile *f = qemu_fopen_channel_input(ioc);
> > +        /* The first connection (multifd may have multiple) */
> >          migration_incoming_setup(f);
> 
>             Here we use it.
> 
> > -        return;
> > +        /*
> > +         * Common migration only needs one channel, so we can start
> > +         * right now.  Multifd needs more than one channel, we wait.
> > +         */
> > +        start_migration = !migrate_use_multifd();
> > +    } else {
> > +        /* Multiple connections */
> > +        assert(migrate_use_multifd());
> > +        start_migration = multifd_recv_new_channel(ioc);
> 
>        But here we don't use it.  We are lecking it.
> 
>        So, we need to fix the leak.  No clue about how to convince
>        postcopy_try_recover() without the QEMUFile

Indeed!  Thanks for spotting that!  I'll prepare a new version.

-- 
Peter Xu



reply via email to

[Prev in Thread] Current Thread [Next in Thread]