
From: Hailiang Zhang
Subject: Re: [Qemu-devel] [PATCH COLO-Frame (Base) v20 16/17] docs: Add documentation for COLO feature
Date: Sat, 8 Oct 2016 17:32:21 +0800
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1

On 2016/10/5 21:37, Eric Blake wrote:
On 09/29/2016 03:46 AM, zhanghailiang wrote:
Introduce the design of COLO, and how to test it.

Signed-off-by: zhanghailiang <address@hidden>
---
  docs/COLO-FT.txt | 190 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
  1 file changed, 190 insertions(+)
  create mode 100644 docs/COLO-FT.txt


+
+== Background ==
+Virtual machine (VM) replication is a well known technique for providing
+application-agnostic software-implemented hardware fault tolerance
+"non-stop service".

Do you want s/tolerance/tolerance, also known as/ ?


Yes, that is more appropriate.


+== Architecture ==
+
+The architecture of COLO is shown in the bellow diagram.

s/bellow diagram/diagram below/


+It consists of a pair of networked physical nodes:
+The primary node running the PVM, and the secondary node running the SVM
+to maintain a valid replica of the PVM.
+PVM and SVM execute in parallel and generate output of response packets for
+client requests according to the application semantics.
+
+The incoming packets from the client or external network are received by the
+primary node, and then forwarded to the secondary node, so that Both the PVM

s/Both/both/


+and the SVM are stimulated with the same requests.
+
+COLO receives the outbound packets from both the PVM and SVM and compares them
+before allowing the output to be sent to clients.
+
+The SVM is qualified as a valid replica of the PVM, as long as it generates
+identical responses to all client requests. Once the differences in the outputs
+are detected between the PVM and SVM, COLO withholds transmission of the
+outbound packets until it has successfully synchronized the PVM state to the SVM.
+
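For readers of the thread, the compare-and-withhold behaviour described in the quoted text can be modelled roughly as follows. This is only a toy sketch in Python, not the COLO proxy code: the class and method names are hypothetical, the checkpoint is reduced to a counter, and releasing the PVM's packet after the checkpoint is an assumption about what happens once the states are synchronized.

```python
from collections import deque


class OutputComparator:
    """Toy model of COLO output comparison (all names are hypothetical)."""

    def __init__(self):
        self.held = deque()    # PVM packets withheld from the client
        self.released = []     # packets actually sent to the client
        self.checkpoints = 0   # number of PVM -> SVM state synchronizations

    def on_outputs(self, pvm_packet: bytes, svm_packet: bytes) -> None:
        """Compare one pair of outbound packets from the PVM and the SVM."""
        if pvm_packet == svm_packet:
            # Identical responses: the SVM is still a valid replica.
            self.released.append(pvm_packet)
            return
        # Outputs differ: withhold the packet until the state is synchronized.
        self.held.append(pvm_packet)
        self.checkpoint()

    def checkpoint(self) -> None:
        """Stand-in for synchronizing the PVM state to the SVM."""
        self.checkpoints += 1
        # Assumption: after the sync, the withheld PVM output may be released.
        while self.held:
            self.released.append(self.held.popleft())


if __name__ == "__main__":
    cmp_ = OutputComparator()
    cmp_.on_outputs(b"a", b"a")   # identical, released immediately
    cmp_.on_outputs(b"b", b"x")   # different, released after a checkpoint
    print(cmp_.released, cmp_.checkpoints)
```

The point of the model is that a checkpoint is only forced when the two outputs diverge; matching outputs pass through without any synchronization cost.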

+== Components introduction ==
+
+You can see there are several components in COLO's diagram of architecture.
+Their functions are described as bellow.

s/as bellow/below/


+
+HeartBeat:
+Runs on both the primary and secondary nodes, to periodically check platform
+availability. When the primary node suffers a hardware fail-stop failure,
+the heartbeat stops responding, the secondary node will trigger a failover
+as soon as it determines the absence.
+
+COLO disk Manager:
+When primary VM writes data into image, the colo disk manger captures this data
+and send it to secondary VM’s which makes sure the context of secondary VM's

s/send/sends/


+image is consentient with the context of primary VM 's image.

s/consentient/consistent/
s/VM 's/VM's/


+For more details, please refer to docs/block-replication.txt.
+
+Checkpoint/Failover Controller:
+Modifications of save/restore flow to realize continuous migration,
+to make sure the state of VM in Secondary side always be consistent with VM in

s/always be/is always/


+Primary side.
+
+COLO Proxy:
+Delivers packets to Primary and Seconday, and then compare the responses from
+both side. Then decide whether to start a checkpoint according to some rules.
+
+Note:
+ a. HeartBeat is not been realized, so you need to trigger failover process

s/is/has/
s/realized/implemented yet/

Is this note going to be stale once heartbeat is implemented?


Yes, but we're not sure whether it is suitable to implement it in QEMU.

+    by using 'x-colo-lost-heartbeat' command.
+ b. COLO proxy compents is work-in-process, it only support periodic checkpoint

s/compents is/components are a/


+    mode now, just as Micro-checkpointing.
+

+3. On Primary VM's QEMU monitor, issue command:
+{'execute':'qmp_capabilities'}
+{ 'execute': 'human-monitor-command',
+  'arguments': {'command-line': 'drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=xx.xx.xx.xx,file.port=8889,file.export=colo-disk0,node-name=node0'}}

It would be really nice if we could get this done through QMP
blockdev-add instead of HMP drive_add.


You are right, but this command doesn't support NBD drives upstream yet.
I saw that Max has sent a patch set to support it. I will update this after his
patches have been merged.

+
+Before issuing '{ "execute": "x-colo-lost-heartbeat" }' command, we have to
+issue block related command to stop block replication.
+Primary:
+  Remove the nbd child from the quorum:
+  { 'execute': 'x-blockdev-change', 'arguments': {'parent': 'colo-disk0', 'child': 'children.1'}}
+  { 'execute': 'human-monitor-command','arguments': {'command-line': 'drive_del blk-buddy0'}}
+  Note: there is no qmp command to remove the blockdev now

Don't we have x-blockdev-del?


Yes, we can use this command; I'll fix it in the next version.

+
+Secondary:
+  The primary host is down, so we should do the following thing:
+  { 'execute': 'nbd-server-stop' }
+
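As a rough illustration of the ordering in the quoted text (block-related commands first, then 'x-colo-lost-heartbeat'), the sequences could be assembled like this in Python. The function names are hypothetical helpers, the command names and arguments are copied from the quoted document, and sending 'x-colo-lost-heartbeat' last on each side is my reading of the text rather than something verified against a running setup. The sketch only builds the JSON strings; it does not open a QMP connection.

```python
import json


def primary_failover_commands():
    """Commands for the primary side, in the order given in the document."""
    return [
        # Remove the nbd child from the quorum.
        {"execute": "x-blockdev-change",
         "arguments": {"parent": "colo-disk0", "child": "children.1"}},
        # Delete the drive through HMP, as the document currently does.
        {"execute": "human-monitor-command",
         "arguments": {"command-line": "drive_del blk-buddy0"}},
        # Only then trigger the failover.
        {"execute": "x-colo-lost-heartbeat"},
    ]


def secondary_failover_commands():
    """Commands for the secondary side when the primary host is down."""
    return [
        {"execute": "nbd-server-stop"},
        {"execute": "x-colo-lost-heartbeat"},
    ]


def to_wire(commands):
    """Serialize each command as one JSON line, as a QMP client would send it."""
    return [json.dumps(cmd) for cmd in commands]


if __name__ == "__main__":
    for line in to_wire(primary_failover_commands()):
        print(line)
```

Keeping the sequences as plain data makes it easy to swap 'drive_del' for another removal command later without touching the ordering.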
+== TODO ==
+1. Support continuously VM replication.

s/continuously/continuous/

+2. Support shared storage.
+3. Develop the heartbeat part.
+4. Reduce checkpoint VM’s downtime while do checkpoint.

s/do/doing/



All of the above typos and grammatical mistakes will be fixed in the next version,
thanks!

Hailiang




