Re: [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping


From: Hailiang Zhang
Subject: Re: [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
Date: Tue, 15 Dec 2015 20:41:08 +0800
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0

On 2015/12/15 20:14, Dr. David Alan Gilbert wrote:
* zhanghailiang (address@hidden) wrote:
This is the 12th version of COLO.

As usual, this version of COLO only supports periodic checkpoints,
just as MicroCheckpointing and Remus do.

This series contains only the COLO frame part; you can get the complete code from GitHub:
https://github.com/coloft/qemu/commits/colo-v2.3-periodic-mode

Hi,
   Have you tried wiring in Zhang Chen's new userland colo proxy yet?
I'd like to start trying it out.


Not yet. Actually, for the frame part, we can reuse most of the previous code
that was based on the kernel proxy. And yes, please, you are welcome to join us. ;)

Dave

Test procedure:
1. Start up QEMU
Primary side:
#x86_64-softmmu/qemu-system-x86_64 -enable-kvm -boot c -m 2048 -smp 2 \
  -qmp stdio -vnc :7 -name primary -cpu qemu64,+kvmclock \
  -device piix3-usb-uhci -device usb-tablet \
  -netdev tap,id=hn0,vhost=off \
  -device virtio-net-pci,id=net-pci0,netdev=hn0 \
  -drive if=virtio,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0.file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,children.0.driver=raw
Secondary side:
#x86_64-softmmu/qemu-system-x86_64 -boot c -m 2048 -smp 2 -qmp stdio -vnc :7 \
  -name secondary -enable-kvm -cpu qemu64,+kvmclock \
  -device piix3-usb-uhci -device usb-tablet \
  -netdev tap,id=hn0,vhost=off \
  -device virtio-net-pci,id=net-pci0,netdev=hn0 \
  -drive if=none,id=colo-disk0,file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,driver=raw,node-name=node0 \
  -drive if=virtio,id=active-disk0,throttling.bps-total=70000000,driver=replication,mode=secondary,file.driver=qcow2,file.file.filename=/mnt/ramfs/active_disk.img,file.backing.driver=qcow2,file.backing.file.filename=/mnt/ramfs/hidden_disk.img,file.backing.backing=colo-disk0 \
  -incoming tcp:0:8888
2. On the Secondary VM's QEMU monitor, issue the following commands:
{'execute':'qmp_capabilities'}
{'execute': 'nbd-server-start', 'arguments': {'addr': {'type': 'inet', 'data': {'host': '192.168.2.88', 'port': '8889'} } } }
{'execute': 'nbd-server-add', 'arguments': {'device': 'colo-disk0', 'writable': true } }
{'execute': 'trace-event-set-state', 'arguments': {'name': 'colo*', 'enable': true} }
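
The step 2 sequence can also be generated as strict (double-quoted) JSON rather than typed by hand. The following is only a minimal sketch: the helper names `qmp` and `secondary_setup_commands` are hypothetical and not part of QEMU or of this series, and the host and port are the example values used above. It only builds the command text and does not connect to a monitor.

```python
import json


def qmp(execute, **arguments):
    """Build one QMP command as a dict; 'arguments' is omitted when empty."""
    cmd = {"execute": execute}
    if arguments:
        cmd["arguments"] = arguments
    return cmd


def secondary_setup_commands(host, port, device="colo-disk0"):
    """Return the step 2 command sequence for the Secondary VM's monitor."""
    return [
        qmp("qmp_capabilities"),
        # The port is passed as a string, as in the example above.
        qmp("nbd-server-start",
            addr={"type": "inet", "data": {"host": host, "port": str(port)}}),
        qmp("nbd-server-add", device=device, writable=True),
        qmp("trace-event-set-state", name="colo*", enable=True),
    ]


if __name__ == "__main__":
    # One command per line, ready to paste into a '-qmp stdio' session.
    for cmd in secondary_setup_commands("192.168.2.88", 8889):
        print(json.dumps(cmd))
```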

3. On the Primary VM's QEMU monitor, issue the following commands:
{'execute':'qmp_capabilities'}
{'execute': 'human-monitor-command', 'arguments': {'command-line': 'drive_add buddy driver=replication,mode=primary,file.driver=nbd,file.host=9.61.1.7,file.port=8889,file.export=colo-disk0,node-name=node0,if=none'}}
{'execute':'x-blockdev-change', 'arguments':{'parent': 'colo-disk0', 'node': 'node0' } }
{'execute': 'migrate-set-capabilities', 'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } ] } }
{'execute': 'migrate', 'arguments': {'uri': 'tcp:192.168.2.88:8888' } }
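
The long 'drive_add' option string in step 3 is easy to mistype, so it may help to assemble it from its parts. This is a sketch under assumptions: the function name `primary_setup_commands` and its parameters are hypothetical, and the addresses are simply the example values from the commands above. It builds the commands only and does not talk to QEMU.

```python
import json


def primary_setup_commands(nbd_host, nbd_port, migrate_uri,
                           parent="colo-disk0", export="colo-disk0",
                           node="node0"):
    """Return the step 3 command sequence for the Primary VM's monitor."""
    # Options for the replication drive that is attached via HMP drive_add.
    opts = ",".join([
        "driver=replication", "mode=primary", "file.driver=nbd",
        f"file.host={nbd_host}", f"file.port={nbd_port}",
        f"file.export={export}", f"node-name={node}", "if=none",
    ])
    return [
        {"execute": "qmp_capabilities"},
        {"execute": "human-monitor-command",
         "arguments": {"command-line": f"drive_add buddy {opts}"}},
        # Add the new node as a child of the quorum drive.
        {"execute": "x-blockdev-change",
         "arguments": {"parent": parent, "node": node}},
        {"execute": "migrate-set-capabilities",
         "arguments": {"capabilities": [
             {"capability": "x-colo", "state": True}]}},
        {"execute": "migrate", "arguments": {"uri": migrate_uri}},
    ]


if __name__ == "__main__":
    for cmd in primary_setup_commands("9.61.1.7", 8889,
                                      "tcp:192.168.2.88:8888"):
        print(json.dumps(cmd))
```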

4. After the above steps, whenever you make changes to the PVM, the SVM will be
synced.
You can issue the command '{ "execute": "migrate-set-parameters" , "arguments":{ "x-checkpoint-delay": 2000 } }'
to change the checkpoint period.
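
If the checkpoint period is adjusted from a script, the command can be built with a basic sanity check on the value. A minimal sketch: `checkpoint_delay_command` is a hypothetical helper, and the unit of the value is not stated here (presumably milliseconds), so the function passes the number through unchanged.

```python
import json


def checkpoint_delay_command(delay):
    """Build the migrate-set-parameters command for x-checkpoint-delay."""
    # Reject booleans explicitly, since bool is a subclass of int in Python.
    if isinstance(delay, bool) or not isinstance(delay, int) or delay < 0:
        raise ValueError("delay must be a non-negative integer")
    return {"execute": "migrate-set-parameters",
            "arguments": {"x-checkpoint-delay": delay}}


if __name__ == "__main__":
    print(json.dumps(checkpoint_delay_command(2000)))
```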

5. Failover test
You can kill the Primary VM and run 'x_colo_lost_heartbeat' in the Secondary VM's
monitor at the same time; the SVM will then fail over and the client will not
notice the change.

Before issuing the '{ "execute": "x-colo-lost-heartbeat" }' command, we have to
issue block-related commands to stop block replication.
Primary:
   Remove the nbd child from the quorum:
   { 'execute': 'x-blockdev-change', 'arguments': {'parent': 'colo-disk0', 'child': 'children.1'}}
   Note: there is currently no QMP command to remove the blockdev.

Secondary:
   The primary host is down, so we should do the following:
   { 'execute': 'nbd-server-stop' }
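
The ordering described above (block-related command first, then 'x-colo-lost-heartbeat') can be captured in a small helper. This is a sketch of my reading of the procedure, not code from the series: `failover_commands` is a hypothetical name, and it assumes the commands are issued on whichever side survives.

```python
import json


def failover_commands(surviving_side, parent="colo-disk0",
                      child="children.1"):
    """Return the ordered QMP commands to run on the surviving side.

    Assumption: the block-related command goes first and
    x-colo-lost-heartbeat goes last, as described in the text above.
    """
    if surviving_side == "primary":
        # Remove the nbd child from the quorum.
        block_cmd = {"execute": "x-blockdev-change",
                     "arguments": {"parent": parent, "child": child}}
    elif surviving_side == "secondary":
        # The primary host is down, so stop the NBD server.
        block_cmd = {"execute": "nbd-server-stop"}
    else:
        raise ValueError("surviving_side must be 'primary' or 'secondary'")
    return [block_cmd, {"execute": "x-colo-lost-heartbeat"}]


if __name__ == "__main__":
    for cmd in failover_commands("secondary"):
        print(json.dumps(cmd))
```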

Please review, thanks.

TODO:
1. Implement the packet comparison module (proxy) in QEMU (in progress)
2. Checkpoint based on the proxy in QEMU
3. The capability of continuous FT

v12:
  - Fix the bug where the default buffer filter broke vhost-net.
  - Add a flag in struct NetFilterState to help skip the default
   filter for packets travelling through the filter layer.
  - Remove the default failover treatment, which may cause split-brain.
  - Rename checkpoint-delay to x-checkpoint-delay.
  - Check that all netdevs support the default filter before going into COLO.
  - Reconstruct send/receive helper functions in patch 10.
  - Address several other comments from Dave.

v11:
  - Re-implement buffer/release packets based on filter-buffer according
    to Jason Wang's suggestion. (patch 34, patch 36 ~ patch 38)
  - Rebase onto master to reuse some stuff introduced by post-copy.
  - Address several comments from Eric and Dave; the record of fixes can
    be found in each patch.

v10:
  - Rename 'colo_lost_heartbeat' command to experimental 'x_colo_lost_heartbeat'
  - Rename migration capability 'colo' to 'x-colo' (Eric's suggestion)
  - Simplify the primary-side process by dropping the colo thread and reusing
    the migration thread. (Dave's suggestion)
  - Add several netfilter related APIs to support buffer/release packets
    for COLO (patch 32 ~ patch 36)

zhanghailiang (38):
   configure: Add parameter for configure to enable/disable COLO support
   migration: Introduce capability 'x-colo' to migration
   COLO: migrate colo related info to secondary node
   migration: Export migrate_set_state()
   migration: Add state records for migration incoming
   migration: Integrate COLO checkpoint process into migration
   migration: Integrate COLO checkpoint process into loadvm
   migration: Rename the'file' member of MigrationState
   COLO/migration: Create a new communication path from destination to
     source
   COLO: Implement colo checkpoint protocol
   COLO: Add a new RunState RUN_STATE_COLO
   QEMUSizedBuffer: Introduce two help functions for qsb
   COLO: Save PVM state to secondary side when do checkpoint
   ram: Split host_from_stream_offset() into two helper functions
   COLO: Load PVM's dirty pages into SVM's RAM cache temporarily
   ram/COLO: Record the dirty pages that SVM received
   COLO: Load VMState into qsb before restore it
   COLO: Flush PVM's cached RAM into SVM's memory
   COLO: Add checkpoint-delay parameter for migrate-set-parameters
   COLO: synchronize PVM's state to SVM periodically
   COLO failover: Introduce a new command to trigger a failover
   COLO failover: Introduce state to record failover process
   COLO: Implement failover work for Primary VM
   COLO: Implement failover work for Secondary VM
   qmp event: Add event notification for COLO error
   COLO failover: Shutdown related socket fd when do failover
   COLO failover: Don't do failover during loading VM's state
   COLO: Process shutdown command for VM in COLO state
   COLO: Update the global runstate after going into colo state
   savevm: Split load vm state function qemu_loadvm_state
   COLO: Separate the process of saving/loading ram and device state
   COLO: Split qemu_savevm_state_begin out of checkpoint process
   net/filter-buffer: Add default filter-buffer for each netdev
   filter-buffer: Accept zero interval
   filter-buffer: Introduce a helper function to enable/disable default
     filter
   filter-buffer: Introduce a helper function to release packets
   colo: Use default buffer-filter to buffer and release packets
   COLO: Add block replication into colo process

  configure                     |  11 +
  docs/qmp-events.txt           |  17 +
  hmp-commands.hx               |  15 +
  hmp.c                         |  15 +
  hmp.h                         |   1 +
  include/exec/ram_addr.h       |   9 +-
  include/migration/colo.h      |  38 +++
  include/migration/failover.h  |  33 ++
  include/migration/migration.h |  18 +-
  include/migration/qemu-file.h |   3 +-
  include/net/filter.h          |  12 +
  include/net/net.h             |   5 +
  include/sysemu/sysemu.h       |   9 +
  migration/Makefile.objs       |   2 +
  migration/colo-comm.c         |  71 ++++
  migration/colo-failover.c     |  83 +++++
  migration/colo.c              | 765 ++++++++++++++++++++++++++++++++++++++++++
  migration/exec.c              |   4 +-
  migration/fd.c                |   4 +-
  migration/migration.c         | 216 ++++++++----
  migration/postcopy-ram.c      |   6 +-
  migration/qemu-file-buf.c     |  61 ++++
  migration/ram.c               | 213 ++++++++++--
  migration/rdma.c              |   2 +-
  migration/savevm.c            | 295 ++++++++++++----
  migration/tcp.c               |   4 +-
  migration/unix.c              |   4 +-
  net/filter-buffer.c           | 127 ++++++-
  net/filter.c                  |   6 +-
  net/net.c                     |  58 ++++
  qapi-schema.json              | 106 +++++-
  qapi/event.json               |  17 +
  qmp-commands.hx               |  24 +-
  stubs/Makefile.objs           |   1 +
  stubs/migration-colo.c        |  45 +++
  trace-events                  |  10 +
  vl.c                          |  37 +-
  37 files changed, 2152 insertions(+), 195 deletions(-)
  create mode 100644 include/migration/colo.h
  create mode 100644 include/migration/failover.h
  create mode 100644 migration/colo-comm.c
  create mode 100644 migration/colo-failover.c
  create mode 100644 migration/colo.c
  create mode 100644 stubs/migration-colo.c

--
1.8.3.1


--
Dr. David Alan Gilbert / address@hidden / Manchester, UK
