qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH COLO-Frame v10 09/38] COLO: Implement colo check


From: zhanghailiang
Subject: Re: [Qemu-devel] [PATCH COLO-Frame v10 09/38] COLO: Implement colo checkpoint protocol
Date: Mon, 9 Nov 2015 15:33:13 +0800
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0

On 2015/11/9 14:51, zhanghailiang wrote:
On 2015/11/7 2:26, Dr. David Alan Gilbert wrote:
* zhanghailiang (address@hidden) wrote:
We need communications protocol of user-defined to control the checkpoint
process.

The new checkpoint request is started by Primary VM, and the interactive process
like below:
Checkpoint synchronizing points,

                        Primary                         Secondary
'checkpoint-request'   @ ----------------------------->
                                                        Suspend (In hybrid mode)
'checkpoint-reply'     <------------------------------ @
                        Suspend&Save state

Why is this initial pair necessary?  Can't you just start with the vmstate-send
and save the extra request pair/round trip?  On the 2nd checkpoint we know
the SVM already received the previous checkpoint because we got it's first
vmstate-load.



Er, i have made some mistake, before 'checkpoint-request' command, we have a 
'checkpoint-ready'
communication, which is sent by SVM to PVM, to tell PVM that SVM is ready for 
checkpoint.
We do the initial work before send 'checkpoint-ready' in Secondary.

So, yes, you are right, this 'checkpoint-reply' is unnecessary for simple 
checkpoint mode.
I will remove it or maybe just add more comment about this, and add 
'checkpoint-ready' in the comment.

Yes, we can certainly drop this handshaking in simple checkpoint mode.
But we still need to do some initial work (preparing work) in simple checkpoint 
mode.
And i'm not sure if this initial work is time-wasting or not. We choose to do 
this preparing work
before send the 'checkpoint-reply' to reducing VM's STOP time as possible as we 
can.

I guess in full-COLO (rather than simple checkpoint) you can get the secondary 
to
do some of it's stopping/cleanup after it sends the checkpoint-reply

Actually, we do it before it sends 'checkpoint-reply' :)

but before vmstate-send, so you can hide some of the time.


(Perhaps add a comment to explain)


OK, I will add more comment about this.

'vmstate-send'         @ ----------------------------->
                        Send state                      Receive state
'vmstate-received'     <------------------------------ @
                        Release packets                 Load state
'vmstate-load'         <------------------------------ @
                        Resume                          Resume (In hybrid mode)

                        Start Comparing (In hybrid mode)
NOTE:
  1) '@' who sends the message
  2) Every sync-point is synchronized by two sides with only
     one handshake(single direction) for low-latency.
     If more strict synchronization is required, a opposite direction
     sync-point should be added.
  3) Since sync-points are single direction, the remote side may
     go forward a lot when this side just receives the sync-point.
  4) For now, we only support 'periodic' checkpoint, for which
    the Secondary VM is not running, later we will support 'hybrid' mode.

Signed-off-by: zhanghailiang <address@hidden>
Signed-off-by: Li Zhijian <address@hidden>
Signed-off-by: Gonglei <address@hidden>
Cc: Eric Blake <address@hidden>
---
v10:
- Rename enum COLOCmd to COLOCommand (Eric's suggestion).
- Remove unused 'ram-steal'
---
  migration/colo.c | 192 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
  qapi-schema.json |  27 ++++++++
  trace-events     |   2 +
  3 files changed, 219 insertions(+), 2 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 4fdf3a9..2510762 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -10,10 +10,12 @@
   * later.  See the COPYING file in the top-level directory.
   */

+#include <unistd.h>
  #include "sysemu/sysemu.h"
  #include "migration/colo.h"
  #include "trace.h"
  #include "qemu/error-report.h"
+#include "qemu/sockets.h"

  bool colo_supported(void)
  {
@@ -34,6 +36,103 @@ bool migration_incoming_in_colo_state(void)
      return mis && (mis->state == MIGRATION_STATUS_COLO);
  }

+/* colo checkpoint control helper */
+static int colo_ctl_put(QEMUFile *f, uint32_t cmd, uint64_t value)
+{
+    int ret = 0;
+
+    qemu_put_be32(f, cmd);
+    qemu_put_be64(f, value);
+    qemu_fflush(f);
+
+    ret = qemu_file_get_error(f);
+    trace_colo_ctl_put(COLOCommand_lookup[cmd], value);
+
+    return ret;
+}
+
+static int colo_ctl_get_cmd(QEMUFile *f, uint32_t *cmd)
+{
+    int ret = 0;
+
+    *cmd = qemu_get_be32(f);
+    ret = qemu_file_get_error(f);
+    if (ret < 0) {
+        return ret;
+    }
+    if (*cmd >= COLO_COMMAND_MAX) {
+        error_report("Invalid colo command, get cmd:%d", *cmd);
+        return -EINVAL;
+    }
+    trace_colo_ctl_get(COLOCommand_lookup[*cmd]);
+
+    return 0;
+}
+
+static int colo_ctl_get(QEMUFile *f, uint32_t require)
+{
+    int ret;
+    uint32_t cmd;
+    uint64_t value;
+
+    ret = colo_ctl_get_cmd(f, &cmd);
+    if (ret < 0) {
+        return ret;
+    }
+    if (cmd != require) {
+        error_report("Unexpect colo command, expect:%d, but get cmd:%d",
+                     require, cmd);
+        return -EINVAL;
+    }
+
+    value = qemu_get_be64(f);
+    ret = qemu_file_get_error(f);
+    if (ret < 0) {
+        return ret;
+    }
+
+    return value;
+}

Should the return type be uint64_t since you're returning value?
But then you're also using it to return an error code; so perhaps
it might be better to have a     uint64_t *value    parameter to
return the value separately; or define the range that the value
can actually take.


Good catch. Use parameter to return 'value' is a good idea,
Will fix it in next version.

Thanks,
zhanghailiang

(Also very minor typo: 'got' not 'get' in a few errors)


+static int colo_do_checkpoint_transaction(MigrationState *s)
+{
+    int ret;
+
+    ret = colo_ctl_put(s->to_dst_file, COLO_COMMAND_CHECKPOINT_REQUEST, 0);
+    if (ret < 0) {
+        goto out;
+    }
+
+    ret = colo_ctl_get(s->from_dst_file, COLO_COMMAND_CHECKPOINT_REPLY);
+    if (ret < 0) {
+        goto out;
+    }
+
+    /* TODO: suspend and save vm state to colo buffer */
+
+    ret = colo_ctl_put(s->to_dst_file, COLO_COMMAND_VMSTATE_SEND, 0);
+    if (ret < 0) {
+        goto out;
+    }
+
+    /* TODO: send vmstate to Secondary */
+
+    ret = colo_ctl_get(s->from_dst_file, COLO_COMMAND_VMSTATE_RECEIVED);
+    if (ret < 0) {
+        goto out;
+    }> +
+    ret = colo_ctl_get(s->from_dst_file, COLO_COMMAND_VMSTATE_LOADED);
+    if (ret < 0) {
+        goto out;
+    }
+
+    /* TODO: resume Primary */
+
+out:
+    return ret;
+}
+
  static void colo_process_checkpoint(MigrationState *s)
  {
      int fd, ret = 0;
@@ -51,12 +150,27 @@ static void colo_process_checkpoint(MigrationState *s)
          goto out;
      }

+    /*
+     * Wait for Secondary finish loading vm states and enter COLO
+     * restore.
+     */
+    ret = colo_ctl_get(s->from_dst_file, COLO_COMMAND_CHECKPOINT_READY);
+    if (ret < 0) {
+        goto out;
+    }
+
      qemu_mutex_lock_iothread();
      vm_start();
      qemu_mutex_unlock_iothread();
      trace_colo_vm_state_change("stop", "run");

-    /*TODO: COLO checkpoint savevm loop*/
+    while (s->state == MIGRATION_STATUS_COLO) {
+        /* start a colo checkpoint */
+        ret = colo_do_checkpoint_transaction(s);
+        if (ret < 0) {
+            goto out;
+        }
+    }

  out:
      if (ret < 0) {
@@ -79,6 +193,39 @@ void migrate_start_colo_process(MigrationState *s)
      qemu_mutex_lock_iothread();
  }

+/*
+ * return:
+ * 0: start a checkpoint
+ * -1: some error happened, exit colo restore
+ */
+static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
+{
+    int ret;
+    uint32_t cmd;
+    uint64_t value;
+
+    ret = colo_ctl_get_cmd(f, &cmd);
+    if (ret < 0) {
+        /* do failover ? */
+        return ret;
+    }
+    /* Fix me: this value should be 0, which is not so good,
+     * should be used for checking ?
+     */
+    value = qemu_get_be64(f);
+    if (value != 0) {

should output error message as well?

+        return -EINVAL;
+    }
+
+    switch (cmd) {
+    case COLO_COMMAND_CHECKPOINT_REQUEST:
+        *checkpoint_request = 1;
+        return 0;
+    default:
+        return -EINVAL;
+    }
+}
+
  void *colo_process_incoming_thread(void *opaque)
  {
      MigrationIncomingState *mis = opaque;
@@ -98,7 +245,48 @@ void *colo_process_incoming_thread(void *opaque)
          error_report("colo incoming thread: Open QEMUFile to_src_file 
failed");
          goto out;
      }
-    /* TODO: COLO checkpoint restore loop */
+
+    ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_CHECKPOINT_READY, 0);
+    if (ret < 0) {
+        goto out;
+    }
+
+    while (mis->state == MIGRATION_STATUS_COLO) {
+        int request = 0;
+        int ret = colo_wait_handle_cmd(mis->from_src_file, &request);
+
+        if (ret < 0) {
+            break;
+        } else {
+            if (!request) {
+                continue;
+            }
+        }
+
+        ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_CHECKPOINT_REPLY, 0);
+        if (ret < 0) {
+            goto out;
+        }
+
+        ret = colo_ctl_get(mis->from_src_file, COLO_COMMAND_VMSTATE_SEND);
+        if (ret < 0) {
+            goto out;
+        }
+
+        /* TODO: read migration data into colo buffer */
+
+        ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_VMSTATE_RECEIVED, 0);
+        if (ret < 0) {
+            goto out;
+        }
+
+        /* TODO: load vm state */
+
+        ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_VMSTATE_LOADED, 0);
+        if (ret < 0) {
+            goto out;
+        }
+    }

  out:
      if (ret < 0) {
diff --git a/qapi-schema.json b/qapi-schema.json
index 22251ec..5c4fe6d 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -702,6 +702,33 @@
              '*tls-port': 'int', '*cert-subject': 'str' } }

  ##
+# @COLOCommand
+#
+# The colo command
+#
+# @invalid: unknown command
+#
+# @checkpoint-ready: SVM is ready for checkpointing
+#
+# @checkpoint-request: PVM tells SVM to prepare for new checkpointing
+#
+# @checkpoint-reply: SVM gets PVM's checkpoint request
+#
+# @vmstate-send: VM's state will be sent by PVM.
+#
+# @vmstate-size: The total size of VMstate.
+#
+# @vmstate-received: VM's state has been received by SVM
+#
+# @vmstate-loaded: VM's state has been loaded by SVM
+#
+# Since: 2.5
+##
+{ 'enum': 'COLOCommand',
+  'data': [ 'invalid', 'checkpoint-ready', 'checkpoint-request',
+            'checkpoint-reply', 'vmstate-send', 'vmstate-size',
+            'vmstate-received', 'vmstate-loaded' ] }
+
  # @MouseInfo:
  #
  # Information about a mouse device.
diff --git a/trace-events b/trace-events
index 9cd6391..ee4679c 100644
--- a/trace-events
+++ b/trace-events
@@ -1499,6 +1499,8 @@ rdma_start_outgoing_migration_after_rdma_source_init(void) 
""

  # migration/colo.c
  colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
+colo_ctl_put(const char *msg, uint64_t value) "Send '%s' cmd, value: %" 
PRIu64""
+colo_ctl_get(const char *msg) "Receive '%s' cmd"

  # kvm-all.c
  kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
--
1.8.3.1


--
Dr. David Alan Gilbert / address@hidden / Manchester, UK

.







reply via email to

[Prev in Thread] Current Thread [Next in Thread]