From: zhanghailiang
Subject: [Qemu-devel] [PATCH COLO-Frame v14 00/40] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
Date: Sat, 6 Feb 2016 17:28:12 +0800
This is the 14th version of COLO (it still supports only periodic checkpoints).
This series contains only the COLO frame part; the complete code is available on GitHub:
https://github.com/coloft/qemu/commits/colo-v2.5-periodic-mode
There are few changes in this series apart from the network-related part,
which we have re-implemented according to Jason's suggestion. Most of the other
parts have been reviewed by Dave.
QEMU is approaching soft freeze for 2.6. We hope the COLO prototype can be merged
in 2.6, but we are not sure whether we have enough time to catch this train.
So please help us, thanks very much.
Test procedure:
1. Start QEMU
Primary side:
#x86_64-softmmu/qemu-system-x86_64 -enable-kvm -boot c -m 2048 -smp 2 -qmp
stdio -vnc :7 -name primary -cpu qemu64,+kvmclock -device piix3-usb-uhci
-device usb-tablet -netdev tap,id=hn0,vhost=off -device
virtio-net-pci,id=net-pci0,netdev=hn0 -drive
if=virtio,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0.file.filename=/mnt/sdd/rhel_6.5_64_2U_ide,children.0.driver=raw
Secondary side:
#x86_64-softmmu/qemu-system-x86_64 -boot c -m 2048 -smp 2 -qmp stdio -vnc :7
-name secondary -enable-kvm -cpu qemu64,+kvmclock -device piix3-usb-uhci
-device usb-tablet -netdev tap,id=hn0,vhost=off -device
virtio-net-pci,id=net-pci0,netdev=hn0 -drive
if=none,id=colo-disk0,file.filename=/mnt/sdd/rhel_6.5_64_2U_ide,driver=raw,node-name=node0
-drive
if=virtio,id=active-disk0,throttling.bps-total=70000000,driver=replication,mode=secondary,file.driver=qcow2,file.file.filename=/mnt/ramfs/active_disk.img,file.backing.driver=qcow2,file.backing.file.filename=/mnt/ramfs/hidden_disk.img,file.backing.backing=colo-disk0
-incoming tcp:0:8888
2. On the Secondary VM's QEMU monitor, issue the following commands:
{'execute':'qmp_capabilities'}
{'execute': 'nbd-server-start', 'arguments': {'addr': {'type': 'inet', 'data':
{'host': '192.168.2.88', 'port': '8889'} } } }
{'execute': 'nbd-server-add', 'arguments': {'device': 'colo-disk0', 'writable':
true } }
{'execute': 'trace-event-set-state', 'arguments': {'name': 'colo*', 'enable':
true} }
3. On the Primary VM's QEMU monitor, issue the following commands:
{'execute':'qmp_capabilities'}
{'execute': 'human-monitor-command', 'arguments': {'command-line': 'drive_add buddy driver=replication,mode=primary,file.driver=nbd,file.host=9.61.1.7,file.port=8889,file.export=colo-disk0,node-name=node0,if=none,id=blk-buddy0'}}
{'execute':'x-blockdev-change', 'arguments':{'parent': 'colo-disk0', 'node':
'node0' } }
{'execute': 'migrate-set-capabilities', 'arguments': {'capabilities': [
{'capability': 'x-colo', 'state': true } ] } }
{'execute': 'migrate', 'arguments': {'uri': 'tcp:192.168.2.88:8888' } }
4. After the above steps, whenever you make changes to the PVM, the SVM
will be synced.
You can change the checkpoint period by issuing the command
'{ "execute": "migrate-set-parameters" , "arguments":{ "x-checkpoint-delay": 2000 } }'.
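
As an illustration of the checkpoint period command, here is a minimal Python
sketch that builds the same QMP request for an arbitrary delay. The helper name
checkpoint_delay_command is hypothetical, and the value is assumed to be in
milliseconds; only the command name and the 'x-checkpoint-delay' parameter come
from this series.

```python
import json


def checkpoint_delay_command(delay_ms):
    """Return the QMP request (as a JSON string) that sets the COLO
    checkpoint period. delay_ms is assumed to be in milliseconds."""
    # Reject bools explicitly, since bool is a subclass of int in Python.
    if isinstance(delay_ms, bool) or not isinstance(delay_ms, int) or delay_ms < 0:
        raise ValueError("delay_ms must be a non-negative integer")
    return json.dumps({
        "execute": "migrate-set-parameters",
        "arguments": {"x-checkpoint-delay": delay_ms},
    })


if __name__ == "__main__":
    # Same request as in step 4, ready to paste into the Primary's QMP monitor.
    print(checkpoint_delay_command(2000))
```

The output is strict JSON with double quotes, which the QMP monitor accepts in
the same way as the single-quoted form used in the steps above.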
5. Failover test
You can kill the Primary VM and run 'x_colo_lost_heartbeat' in the Secondary VM's
monitor at the same time; the SVM will then fail over and the client will not
notice the change.
Before issuing the '{ "execute": "x-colo-lost-heartbeat" }' command, we have to
issue the block-related commands to stop block replication.
Primary:
Remove the nbd child from the quorum:
{ 'execute': 'x-blockdev-change', 'arguments': {'parent': 'colo-disk0',
'child': 'children.1'}}
{ 'execute': 'human-monitor-command','arguments': {'command-line': 'drive_del blk-buddy0'}}
Note: there is currently no QMP command to remove the blockdev.
Secondary:
The primary host is down, so we should do the following:
{ 'execute': 'nbd-server-stop' }
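
The ordering described in step 5 (block-related commands first, then
x-colo-lost-heartbeat) can be sketched in Python as follows. This is only an
illustration: the function name failover_commands and its parameters are
hypothetical, and the default IDs are the ones used in this test procedure.

```python
import json


def failover_commands(role, parent="colo-disk0", child="children.1",
                      drive_id="blk-buddy0"):
    """Return the ordered list of QMP requests to issue on the surviving
    side: stop block replication first, then trigger the failover."""
    if role == "primary":
        # Remove the nbd child from the quorum, then delete the drive via HMP.
        block_cmds = [
            {"execute": "x-blockdev-change",
             "arguments": {"parent": parent, "child": child}},
            {"execute": "human-monitor-command",
             "arguments": {"command-line": "drive_del " + drive_id}},
        ]
    elif role == "secondary":
        # The primary host is down, so only the NBD server has to be stopped.
        block_cmds = [{"execute": "nbd-server-stop"}]
    else:
        raise ValueError("role must be 'primary' or 'secondary'")
    return block_cmds + [{"execute": "x-colo-lost-heartbeat"}]


if __name__ == "__main__":
    for cmd in failover_commands("secondary"):
        print(json.dumps(cmd))
```

Keeping the sequence in one place makes it harder to send
x-colo-lost-heartbeat before block replication has been stopped.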
TODO:
1. Checkpoint based on proxy in qemu
2. The capability of continuous FT
3. Optimize the VM's downtime during checkpoint
v14:
- Re-implement the network processing based on netfilter (Jason Wang)
- Rename 'COLOCommand' to 'COLOMessage'. (Markus's suggestion)
- Split two new patches (patch 27/28) from patch 29
- Fix some other comments from Dave and Markus.
v13:
- Refactor colo_*_cmd helper functions to use 'Error **errp' parameter
instead of return value to indicate success or failure. (patch 10)
- Remove the optional error message for COLO_EXIT event. (patch 25)
- Use semaphore to notify colo/colo incoming loop that failover work is
finished. (patch 26)
- Move COLO shutdown related codes to colo.c file. (patch 28)
- Fix memory leak bug for colo incoming loop. (new patch 31)
- Re-use some existing helper functions to implement the process of
saving/loading ram and device. (patch 32)
- Fix some other comments from Dave and Markus.
zhanghailiang (40):
configure: Add parameter for configure to enable/disable COLO support
migration: Introduce capability 'x-colo' to migration
COLO: migrate colo related info to secondary node
migration: Integrate COLO checkpoint process into migration
migration: Integrate COLO checkpoint process into loadvm
COLO/migration: Create a new communication path from destination to
source
COLO: Implement colo checkpoint protocol
COLO: Add a new RunState RUN_STATE_COLO
QEMUSizedBuffer: Introduce two help functions for qsb
COLO: Save PVM state to secondary side when do checkpoint
COLO: Load PVM's dirty pages into SVM's RAM cache temporarily
ram/COLO: Record the dirty pages that SVM received
COLO: Load VMState into qsb before restore it
COLO: Flush PVM's cached RAM into SVM's memory
COLO: Add checkpoint-delay parameter for migrate-set-parameters
COLO: synchronize PVM's state to SVM periodically
COLO failover: Introduce a new command to trigger a failover
COLO failover: Introduce state to record failover process
COLO: Implement failover work for Primary VM
COLO: Implement failover work for Secondary VM
qmp event: Add COLO_EXIT event to notify users while exited from COLO
COLO failover: Shutdown related socket fd when do failover
COLO failover: Don't do failover during loading VM's state
COLO: Process shutdown command for VM in COLO state
COLO: Update the global runstate after going into colo state
savevm: Introduce two helper functions for save/find loadvm_handlers
entry
migration/savevm: Add new helpers to process the different stages of
loadvm
migration/savevm: Export two helper functions for savevm process
COLO: Separate the process of saving/loading ram and device state
COLO: Split qemu_savevm_state_begin out of checkpoint process
net/filter: Add a 'status' property for filter object
net/filter: Introduce a helper to add a filter to the netdev
filter-buffer: Accept zero interval
net: Add notifier/callback for netdev init
COLO/filter: add each netdev a buffer filter
net/filter: Add a helper to traverse all the filters
COLO: enable buffer filters for PVM
filter-buffer: make filter_buffer_flush() public
COLO: flush buffered packets in checkpoint process or exit COLO
COLO: Add block replication into colo process
configure | 11 +
docs/qmp-events.txt | 16 +
hmp-commands.hx | 15 +
hmp.c | 15 +
hmp.h | 1 +
include/exec/ram_addr.h | 1 +
include/migration/colo.h | 42 +++
include/migration/failover.h | 33 ++
include/migration/migration.h | 16 +
include/migration/qemu-file.h | 3 +-
include/net/filter.h | 12 +
include/net/net.h | 8 +
include/sysemu/sysemu.h | 9 +
migration/Makefile.objs | 2 +
migration/colo-comm.c | 76 ++++
migration/colo-failover.c | 83 +++++
migration/colo.c | 846 ++++++++++++++++++++++++++++++++++++++++++
migration/migration.c | 109 +++++-
migration/qemu-file-buf.c | 61 +++
migration/ram.c | 175 ++++++++-
migration/savevm.c | 114 ++++--
net/filter-buffer.c | 14 +-
net/filter.c | 79 ++++
net/net.c | 57 +++
qapi-schema.json | 104 +++++-
qapi/event.json | 15 +
qmp-commands.hx | 23 +-
stubs/Makefile.objs | 1 +
stubs/migration-colo.c | 54 +++
trace-events | 8 +
vl.c | 31 +-
31 files changed, 1959 insertions(+), 75 deletions(-)
create mode 100644 include/migration/colo.h
create mode 100644 include/migration/failover.h
create mode 100644 migration/colo-comm.c
create mode 100644 migration/colo-failover.c
create mode 100644 migration/colo.c
create mode 100644 stubs/migration-colo.c
--
1.8.3.1