From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [PATCH v5 17/45] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages.
Date: Thu, 26 Mar 2015 16:33:28 +0000
User-agent: Mutt/1.5.23 (2014-03-12)

(Only replying to some of the items in this mail; I'll get to the others
another time.)

* David Gibson (address@hidden) wrote:
> On Wed, Feb 25, 2015 at 04:51:40PM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <address@hidden>
> > 
> > The state of the postcopy process is managed via a series of messages;
> >    * Add wrappers and handlers for sending/receiving these messages
> >    * Add state variable that track the current state of postcopy
> > 
> > Signed-off-by: Dr. David Alan Gilbert <address@hidden>
> > ---
> >  include/migration/migration.h |  15 ++
> >  include/sysemu/sysemu.h       |  23 +++
> >  migration/migration.c         |  13 ++
> >  savevm.c                      | 325 ++++++++++++++++++++++++++++++++++++++++++
> >  trace-events                  |  11 ++
> >  5 files changed, 387 insertions(+)
> > 
> > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > index f94af5b..81cd1f2 100644
> > --- a/include/migration/migration.h
> > +++ b/include/migration/migration.h
> > @@ -52,6 +52,14 @@ typedef struct MigrationState MigrationState;
> >  
> >  typedef QLIST_HEAD(, LoadStateEntry) LoadStateEntry_Head;
> >  
> > +typedef enum {
> > +    POSTCOPY_INCOMING_NONE = 0,  /* Initial state - no postcopy */
> > +    POSTCOPY_INCOMING_ADVISE,
> > +    POSTCOPY_INCOMING_LISTENING,
> > +    POSTCOPY_INCOMING_RUNNING,
> > +    POSTCOPY_INCOMING_END
> > +} PostcopyState;
> > +
> >  /* State for the incoming migration */
> >  struct MigrationIncomingState {
> >      QEMUFile *file;
> > @@ -59,6 +67,8 @@ struct MigrationIncomingState {
> >      /* See savevm.c */
> >      LoadStateEntry_Head loadvm_handlers;
> >  
> > +    PostcopyState postcopy_state;
> > +
> >      QEMUFile *return_path;
> >      QemuMutex      rp_mutex;    /* We send replies from multiple threads */
> >  };
> > @@ -219,4 +229,9 @@ size_t ram_control_save_page(QEMUFile *f, ram_addr_t block_offset,
> >                               ram_addr_t offset, size_t size,
> >                               int *bytes_sent);
> >  
> > +PostcopyState postcopy_state_get(MigrationIncomingState *mis);
> > +
> > +/* Set the state and return the old state */
> > +PostcopyState postcopy_state_set(MigrationIncomingState *mis,
> > +                                 PostcopyState new_state);
> >  #endif
> > diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> > index 8da879f..d6a6d51 100644
> > --- a/include/sysemu/sysemu.h
> > +++ b/include/sysemu/sysemu.h
> > @@ -87,6 +87,18 @@ enum qemu_vm_cmd {
> >      MIG_CMD_INVALID = 0,       /* Must be 0 */
> >      MIG_CMD_OPEN_RETURN_PATH,  /* Tell the dest to open the Return path */
> >      MIG_CMD_PING,              /* Request a PONG on the RP */
> > +
> > +    MIG_CMD_POSTCOPY_ADVISE = 20,  /* Prior to any page transfers, just
> > +                                      warn we might want to do PC */
> > +    MIG_CMD_POSTCOPY_LISTEN,       /* Start listening for incoming
> > +                                      pages as it's running. */
> > +    MIG_CMD_POSTCOPY_RUN,          /* Start execution */
> > +    MIG_CMD_POSTCOPY_END,          /* Postcopy is finished. */
> > +
> > +    MIG_CMD_POSTCOPY_RAM_DISCARD,  /* A list of pages to discard that
> > +                                      were previously sent during
> > +                                      precopy but are dirty. */
> > +
> >  };
> >  
> >  bool qemu_savevm_state_blocked(Error **errp);
> > @@ -101,6 +113,17 @@ void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
> >                                uint16_t len, uint8_t *data);
> >  void qemu_savevm_send_ping(QEMUFile *f, uint32_t value);
> >  void qemu_savevm_send_open_return_path(QEMUFile *f);
> > +void qemu_savevm_send_postcopy_advise(QEMUFile *f);
> > +void qemu_savevm_send_postcopy_listen(QEMUFile *f);
> > +void qemu_savevm_send_postcopy_run(QEMUFile *f);
> > +void qemu_savevm_send_postcopy_end(QEMUFile *f, uint8_t status);
> > +
> > +void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
> > +                                           uint16_t len, uint8_t offset,
> > +                                           uint64_t *addrlist,
> > +                                           uint32_t *masklist);
> > +
> > +
> >  int qemu_loadvm_state(QEMUFile *f);
> >  
> >  /* SLIRP */
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 434864a..957115a 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -971,3 +971,16 @@ void migrate_fd_connect(MigrationState *s)
> >      qemu_thread_create(&s->thread, "migration", migration_thread, s,
> >                         QEMU_THREAD_JOINABLE);
> >  }
> > +
> > +PostcopyState  postcopy_state_get(MigrationIncomingState *mis)
> > +{
> > +    return atomic_fetch_add(&mis->postcopy_state, 0);
> > +}
> > +
> > +/* Set the state and return the old state */
> > +PostcopyState postcopy_state_set(MigrationIncomingState *mis,
> > +                                 PostcopyState new_state)
> > +{
> > +    return atomic_xchg(&mis->postcopy_state, new_state);
> 
> Is there anything explaining what the overall atomicity requirements
> are for this state variable?  It's a bit hard to tell if an atomic
> xchg is necessary or sufficient without a description of what the
> overall concurrency scheme is with regards to this variable.

Can you tell me how to define the requirements?
It's a state variable that's tested and changed by at least two threads, and
it has to go through a correct sequence of states.
So generally you're doing an 'I expect to be in .... now change to ....'
transition, and the exchange works well for that.
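
To make that concrete, the pattern looks something like this (just a
sketch; postcopy_transition() is a made-up helper, not something in the
patch, and the includes are what I'd expect it to need):

    #include "qemu/error-report.h"      /* error_report() */
    #include "migration/migration.h"    /* MigrationIncomingState, PostcopyState */

    /* Hypothetical "expect old state, move to new state" helper built on
     * the patch's postcopy_state_set(), which is an atomic exchange that
     * returns the previous value, so nothing can sneak in between the
     * test and the set.
     */
    static int postcopy_transition(MigrationIncomingState *mis,
                                   PostcopyState expected,
                                   PostcopyState new_state)
    {
        PostcopyState old = postcopy_state_set(mis, new_state);

        if (old != expected) {
            error_report("postcopy: bad transition, was %d expected %d",
                         old, expected);
            return -1;
        }
        return 0;
    }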

> > + *  n x
> > + *      be64   Page addresses for start of an invalidation range
> > + *      be32   mask of 32 pages, '1' to discard'
> 
> Is the extra compactness from this semi-sparse bitmap encoding
> actually worth it?  A simple list of page addresses, or address ranges
> to discard would be substantially simpler to get one's head around,
> and also seems like it might be more robust against future
> implementation changes as a wire format.

As previously discussed I really think it is; what I'm tending to
see when I've been looking at these in debug is something that's
sparse but tends to be blobby, with sets of pages discarded near
each other.  However you do this, you're going to have to walk the
bitmap and format out some sort of set of messages.
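
For illustration, receiving one of those (address, mask) entries would
decode roughly like this (a sketch only; the bit ordering and the
discard_page() callback are assumptions, not the patch's actual code):

    #include <stddef.h>
    #include <stdint.h>

    /* Walk one discard entry: bit N of the 32-bit mask (LSB first, by
     * assumption) marks the page at addr + N * page_size for discard.
     * discard_page() is a placeholder, not a real QEMU function.
     */
    static void walk_discard_entry(uint64_t addr, uint32_t mask,
                                   size_t page_size,
                                   void (*discard_page)(uint64_t))
    {
        int bit;

        for (bit = 0; bit < 32; bit++) {
            if (mask & (1u << bit)) {
                discard_page(addr + (uint64_t)bit * page_size);
            }
        }
    }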

> > + *  Hopefully this is pretty sparse so we don't get too many entries,
> > + *  and using the mask should deal with most pagesize differences
> > + *  just ending up as a single full mask
> > + *
> > + * The mask is always 32bits irrespective of the long size
> > + *
> > + *  name:  RAMBlock name that these entries are part of
> > + *  len: Number of page entries
> > + *  addrlist: 'len' addresses
> > + *  masklist: 'len' masks (corresponding to the addresses)
> > + */
> > +void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
> > +                                           uint16_t len, uint8_t offset,
> > +                                           uint64_t *addrlist,
> > +                                           uint32_t *masklist)
> > +{
> > +    uint8_t *buf;
> > +    uint16_t tmplen;
> > +    uint16_t t;
> > +
> > +    trace_qemu_savevm_send_postcopy_ram_discard();
> > +    buf = g_malloc0(len*12 + strlen(name) + 3);
> > +    buf[0] = 0; /* Version */
> > +    buf[1] = offset;
> > +    assert(strlen(name) < 256);
> > +    buf[2] = strlen(name);
> > +    memcpy(buf+3, name, strlen(name));
> > +    tmplen = 3+strlen(name);
> 
> Repeated calls to strlen() always seem icky to me, although I guess
> it's all gcc builtins here, so they are probably optimized out by
> CSE.

Yeh, those are now gone.  Thanks.
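
For reference, one way to drop the repeated calls is just to hoist the
length into a local, along these lines (a sketch of the idea, not the
actual updated hunk):

    size_t name_len = strlen(name);

    assert(name_len < 256);
    buf = g_malloc0(len * 12 + name_len + 3);
    buf[0] = 0;           /* Version */
    buf[1] = offset;
    buf[2] = name_len;
    memcpy(buf + 3, name, name_len);
    tmplen = 3 + name_len;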

> > +    for (t = 0; t < len; t++) {
> > +        cpu_to_be64w((uint64_t *)(buf + tmplen), addrlist[t]);
> > +        tmplen += 8;
> > +        cpu_to_be32w((uint32_t *)(buf + tmplen), masklist[t]);
> > +        tmplen += 4;
> > +    }
> > +    qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_RAM_DISCARD, tmplen, buf);
> > +    g_free(buf);
> > +}
> > +
> > +/* Get the destination into a state where it can receive postcopy data. */
> > +void qemu_savevm_send_postcopy_listen(QEMUFile *f)
> > +{
> > +    trace_savevm_send_postcopy_listen();
> > +    qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_LISTEN, 0, NULL);
> > +}
> > +
> > +/* Kick the destination into running */
> > +void qemu_savevm_send_postcopy_run(QEMUFile *f)
> > +{
> > +    trace_savevm_send_postcopy_run();
> > +    qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_RUN, 0, NULL);
> > +}
> 
> DISCARD will typically immediately precede LISTEN, won't it?  Is there
> a reason not to put the discard data into the LISTEN command?

Discard data can be quite large, so I potentially send multiple discard
commands.
(Also, as you can tell, I generally prefer one message doing one thing,
and so I've tried to keep them separate.)

> > +
> > +/* End of postcopy - with a status byte; 0 is good, anything else is a fail */
> > +void qemu_savevm_send_postcopy_end(QEMUFile *f, uint8_t status)
> > +{
> > +    trace_savevm_send_postcopy_end();
> > +    qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_END, 1, &status);
> > +}
> 
> What's the distinction between the postcopy END command and the normal
> end of the migration stream?  Is there already a way to detect the end
> of stream normally?

OK, thanks, I've killed that off.
The short answer is that the 'end of migration stream' is just a terminating
byte, and I was hoping to put something better on there with a status
from the source side; but that can be a general fix and doesn't
need to be postcopy-specific.

> 
> >  bool qemu_savevm_state_blocked(Error **errp)
> >  {
> >      SaveStateEntry *se;
> > @@ -961,6 +1046,212 @@ enum LoadVMExitCodes {
> >  
> >  static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
> >  
> > +/* ------ incoming postcopy messages ------ */
> > +/* 'advise' arrives before any transfers just to tell us that a postcopy
> > + * *might* happen - it might be skipped if precopy transferred everything
> > + * quickly.
> > + */
> > +static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis,
> > +                                         uint64_t remote_hps,
> > +                                         uint64_t remote_tps)
> > +{
> > +    PostcopyState ps = postcopy_state_get(mis);
> > +    trace_loadvm_postcopy_handle_advise();
> > +    if (ps != POSTCOPY_INCOMING_NONE) {
> > +        error_report("CMD_POSTCOPY_ADVISE in wrong postcopy state (%d)", 
> > ps);
> > +        return -1;
> > +    }
> > +
> > +    if (remote_hps != getpagesize())  {
> > +        /*
> > +         * Some combinations of mismatch are probably possible but it gets
> > +         * a bit more complicated.  In particular we need to place whole
> > +         * host pages on the dest at once, and we need to ensure that we
> > +         * handle dirtying to make sure we never end up sending part of
> > +         * a hostpage on it's own.
> > +         */
> > +        error_report("Postcopy needs matching host page sizes (s=%d d=%d)",
> > +                     (int)remote_hps, getpagesize());
> > +        return -1;
> > +    }
> > +
> > +    if (remote_tps != (1ul << qemu_target_page_bits())) {
> > +        /*
> > +         * Again, some differences could be dealt with, but for now keep it
> > +         * simple.
> > +         */
> > +        error_report("Postcopy needs matching target page sizes (s=%d 
> > d=%d)",
> > +                     (int)remote_tps, 1 << qemu_target_page_bits());
> > +        return -1;
> > +    }
> > +
> > +    postcopy_state_set(mis, POSTCOPY_INCOMING_ADVISE);
> 
> Should you be checking the return value here to make sure it's still
> POSTCOPY_INCOMING_NONE?  Atomic xchgs seem overkill if you still have
> a race between the fetch at the top and the set here.
> 
> Or, in fact, should you be just doing an atomic exchange-then-check at
> the top, rather than checking at the top, then changing at the bottom.

There's no race at this point yet; going from NONE->ADVISE we still have
only one thread.  The check at the top is a guard against protocol
violations (e.g. getting two ADVISE messages or something like that).

> > +    return 0;
> > +}
> > +
> > +/* After postcopy we will be told to throw some pages away since they're
> > + * dirty and will have to be demand fetched.  Must happen before CPU is
> > + * started.
> > + * There can be 0..many of these messages, each encoding multiple pages.
> > + * Bits set in the message represent a page in the source VMs bitmap, but
> > + * since the guest/target page sizes can be different on s/d then we have
> > + * to convert.
> 
> Uh.. I thought the checks in the ADVISE processing eliminated that 
> possibility.

That's a stale comment; I was originally trying to keep it more general, but
ripped the conversions out when it became just too messy for now.
Gone.


Dave
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK


