From: zhanghailiang
Subject: [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
Date: Tue, 15 Dec 2015 16:22:21 +0800
This is the 12th version of COLO.
As usual, this version of COLO only supports periodic checkpointing,
just as MicroCheckpointing and Remus do.
This series contains only the COLO frame part; you can get the whole code from GitHub:
https://github.com/coloft/qemu/commits/colo-v2.3-periodic-mode
Test procedure:
1. Start up QEMU
Primary side:
#x86_64-softmmu/qemu-system-x86_64 -enable-kvm -boot c -m 2048 -smp 2 -qmp stdio \
  -vnc :7 -name primary -cpu qemu64,+kvmclock -device piix3-usb-uhci \
  -device usb-tablet -netdev tap,id=hn0,vhost=off \
  -device virtio-net-pci,id=net-pci0,netdev=hn0 \
  -drive if=virtio,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0.file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,children.0.driver=raw
Secondary side:
#x86_64-softmmu/qemu-system-x86_64 -boot c -m 2048 -smp 2 -qmp stdio -vnc :7 \
  -name secondary -enable-kvm -cpu qemu64,+kvmclock -device piix3-usb-uhci \
  -device usb-tablet -netdev tap,id=hn0,vhost=off \
  -device virtio-net-pci,id=net-pci0,netdev=hn0 \
  -drive if=none,id=colo-disk0,file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,driver=raw,node-name=node0 \
  -drive if=virtio,id=active-disk0,throttling.bps-total=70000000,driver=replication,mode=secondary,file.driver=qcow2,file.file.filename=/mnt/ramfs/active_disk.img,file.backing.driver=qcow2,file.backing.file.filename=/mnt/ramfs/hidden_disk.img,file.backing.backing=colo-disk0 \
  -incoming tcp:0:8888
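The Secondary's `-drive` options use QEMU's dotted-key syntax, where a prefix such as `file.backing.` addresses an option of a nested block node. As a reading aid, the sketch below parses such an option string into a nested dictionary, which makes the Secondary's disk chain visible: the replication driver on top of the active qcow2 disk, whose backing file is the hidden qcow2 disk, whose backing node is `colo-disk0`. `parse_dotted_opts` is a hypothetical helper, not part of QEMU, and it assumes every element is `key=value` with no commas inside values; real QEMU option parsing also handles `,,` escaping and implied keys.

```python
from typing import Any, Dict


def parse_dotted_opts(opts: str) -> Dict[str, Any]:
    """Parse 'a=1,b.c=2' into {'a': '1', 'b': {'c': '2'}}.

    Simplified sketch: every element must be key=value, and values must not
    contain commas (QEMU's ',,' escaping and implied keys are not handled).
    """
    root: Dict[str, Any] = {}
    for item in opts.split(","):
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        *parents, leaf = key.split(".")
        node = root
        for part in parents:
            # Descend into (or create) the nested node named by this prefix.
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                raise ValueError(f"{key!r} conflicts with scalar option {part!r}")
        if isinstance(node.get(leaf), dict):
            raise ValueError(f"{key!r} conflicts with nested options")
        node[leaf] = value
    return root


# The Secondary's active-disk -drive option from step 1.
active_disk = (
    "if=virtio,id=active-disk0,throttling.bps-total=70000000,"
    "driver=replication,mode=secondary,file.driver=qcow2,"
    "file.file.filename=/mnt/ramfs/active_disk.img,"
    "file.backing.driver=qcow2,"
    "file.backing.file.filename=/mnt/ramfs/hidden_disk.img,"
    "file.backing.backing=colo-disk0"
)

tree = parse_dotted_opts(active_disk)
# replication -> active qcow2 -> hidden qcow2 -> colo-disk0
print(tree["driver"])                               # replication
print(tree["file"]["file"]["filename"])             # /mnt/ramfs/active_disk.img
print(tree["file"]["backing"]["file"]["filename"])  # /mnt/ramfs/hidden_disk.img
print(tree["file"]["backing"]["backing"])           # colo-disk0
```

The same helper applied to the Primary's quorum option shows the single initial child under `children` -> `0`, which is why the replication node added later in step 3 ends up as `children.1`.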
2. On the Secondary VM's QEMU monitor, issue the following commands:
{'execute':'qmp_capabilities'}
{'execute': 'nbd-server-start', 'arguments': {'addr': {'type': 'inet', 'data': {'host': '192.168.2.88', 'port': '8889'} } } }
{'execute': 'nbd-server-add', 'arguments': {'device': 'colo-disk0', 'writable': true } }
{'execute': 'trace-event-set-state', 'arguments': {'name': 'colo*', 'enable': true} }
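The commands above are typed by hand into a `-qmp stdio` monitor. To script them instead, QEMU can expose QMP on a socket, for example with `-qmp unix:/tmp/qmp-secondary.sock,server,nowait` (the path is an assumption). The sketch below is a minimal, hypothetical QMP client (`MiniQMP` is not QEMU's bundled QMP library): it reads the greeting, sends `qmp_capabilities` to leave negotiation mode, then sends one command per call and skips asynchronous `event` messages while waiting for the `return` or `error` reply.

```python
import json
import socket
from typing import Any, Dict, List, Optional


class QMPError(RuntimeError):
    """Raised when QEMU answers a command with an 'error' object."""


class MiniQMP:
    """Minimal line-based QMP client (illustrative sketch, not QEMU's library)."""

    def __init__(self, sock: socket.socket) -> None:
        self._sock = sock
        self._rfile = sock.makefile("r", encoding="utf-8")
        self.events: List[Dict[str, Any]] = []  # async events seen meanwhile

    def _read_message(self) -> Dict[str, Any]:
        line = self._rfile.readline()
        if not line:
            raise ConnectionError("QMP connection closed")
        return json.loads(line)

    def connect(self) -> Dict[str, Any]:
        """Read the greeting, then leave capabilities-negotiation mode."""
        greeting = self._read_message()
        if "QMP" not in greeting:
            raise QMPError(f"unexpected greeting: {greeting}")
        self.execute("qmp_capabilities")
        return greeting

    def execute(self, command: str,
                arguments: Optional[Dict[str, Any]] = None) -> Any:
        """Send one command and return its 'return' value."""
        request: Dict[str, Any] = {"execute": command}
        if arguments is not None:
            request["arguments"] = arguments
        self._sock.sendall((json.dumps(request) + "\n").encode("utf-8"))
        while True:
            reply = self._read_message()
            if "event" in reply:
                # Events may arrive before the reply; keep them and keep reading.
                self.events.append(reply)
                continue
            if "error" in reply:
                raise QMPError(reply["error"].get("desc", str(reply["error"])))
            return reply["return"]


# Example against a running QEMU (socket path is an assumption):
#   s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
#   s.connect("/tmp/qmp-secondary.sock")
#   qmp = MiniQMP(s)
#   qmp.connect()
#   qmp.execute("nbd-server-add", {"device": "colo-disk0", "writable": True})
```

Collecting events instead of discarding them keeps trace or migration notifications available for inspection after each command.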
3. On the Primary VM's QEMU monitor, issue the following commands:
{'execute':'qmp_capabilities'}
{'execute': 'human-monitor-command', 'arguments': {'command-line': 'drive_add buddy driver=replication,mode=primary,file.driver=nbd,file.host=9.61.1.7,file.port=8889,file.export=colo-disk0,node-name=node0,if=none'}}
{'execute':'x-blockdev-change', 'arguments':{'parent': 'colo-disk0', 'node': 'node0' } }
{'execute': 'migrate-set-capabilities', 'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } ] } }
{'execute': 'migrate', 'arguments': {'uri': 'tcp:192.168.2.88:8888' } }
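The Primary-side sequence can also be generated instead of typed. The sketch below is a hypothetical helper (`primary_colo_commands` is not a QEMU API) that returns the step 3 commands as Python dictionaries. The NBD host and port are parameters because they must match the address the Secondary's `nbd-server-start` listens on; the example call simply reuses the Secondary address from step 2 for both NBD and migration, which is an assumption about the test network.

```python
import json
from typing import Any, Dict, List


def primary_colo_commands(nbd_host: str, nbd_port: int, export: str,
                          migrate_host: str, migrate_port: int) -> List[Dict[str, Any]]:
    """Return the Primary-side QMP commands from step 3, in order."""
    drive_add = (
        "drive_add buddy driver=replication,mode=primary,file.driver=nbd,"
        f"file.host={nbd_host},file.port={nbd_port},file.export={export},"
        "node-name=node0,if=none"
    )
    return [
        {"execute": "qmp_capabilities"},
        # HMP drive_add wrapped in QMP: creates the replication node 'node0'.
        {"execute": "human-monitor-command",
         "arguments": {"command-line": drive_add}},
        # Insert node0 as a new child of the quorum device colo-disk0.
        {"execute": "x-blockdev-change",
         "arguments": {"parent": "colo-disk0", "node": "node0"}},
        {"execute": "migrate-set-capabilities",
         "arguments": {"capabilities": [{"capability": "x-colo", "state": True}]}},
        {"execute": "migrate",
         "arguments": {"uri": f"tcp:{migrate_host}:{migrate_port}"}},
    ]


cmds = primary_colo_commands("192.168.2.88", 8889, "colo-disk0",
                             "192.168.2.88", 8888)
for cmd in cmds:
    print(json.dumps(cmd))  # one QMP command per line, ready to send
```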
4. After the above steps, whenever you make changes to the PVM, the SVM will be synced.
To change the checkpoint period, issue:
{ "execute": "migrate-set-parameters", "arguments": { "x-checkpoint-delay": 2000 } }
The value is the delay between two checkpoints, in milliseconds.
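Because this version only supports periodic checkpointing, `x-checkpoint-delay` sets the interval between checkpoints. The sketch below models that schedule with a hypothetical loop (`run_periodic_checkpoints` and its callbacks are assumptions, not the code in `migration/colo.c`): it waits one period, takes a checkpoint, and stops once a failover is requested. The sleep function is injected so the schedule can be checked without real time passing.

```python
import time
from typing import Callable, List


def run_periodic_checkpoints(delay_ms: int,
                             do_checkpoint: Callable[[], None],
                             failover_requested: Callable[[], bool],
                             sleep: Callable[[float], None] = time.sleep) -> int:
    """Take a checkpoint every delay_ms milliseconds until failover.

    Returns the number of checkpoints taken.
    """
    if delay_ms < 0:
        raise ValueError("checkpoint delay must be non-negative")
    taken = 0
    while not failover_requested():
        sleep(delay_ms / 1000.0)      # wait one checkpoint period
        if failover_requested():      # re-check: failover may arrive while waiting
            break
        do_checkpoint()               # placeholder for the real checkpoint work
        taken += 1
    return taken


# Usage with fakes, so no real time passes: stop after three checkpoints.
slept: List[float] = []
done: List[int] = []
n = run_periodic_checkpoints(2000,
                             do_checkpoint=lambda: done.append(1),
                             failover_requested=lambda: len(done) >= 3,
                             sleep=slept.append)
print(n, slept)  # 3 [2.0, 2.0, 2.0]
```

A larger delay means fewer checkpoints and less overhead, at the cost of more work to redo and more buffered output per period.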
5. Failover test
You can kill the Primary VM and, at the same time, run 'x_colo_lost_heartbeat' in the Secondary VM's
monitor; the SVM will then fail over, and clients will not notice the change.
Before issuing the '{ "execute": "x-colo-lost-heartbeat" }' command, we have to
issue block-related commands to stop block replication.
Primary (the Secondary host is down):
Remove the nbd child from the quorum:
{ 'execute': 'x-blockdev-change', 'arguments': {'parent': 'colo-disk0', 'child': 'children.1'}}
Note: there is currently no QMP command to remove the blockdev itself.
Secondary:
The primary host is down, so issue:
{ 'execute': 'nbd-server-stop' }
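The failover commands depend on which side survives. The hypothetical helper below (`failover_commands` is not a QEMU API) returns, for the surviving VM's monitor, the block-replication teardown first and then `x-colo-lost-heartbeat`. It assumes, by symmetry with the Secondary case, that the Primary-side command applies when the Secondary host is the one lost.

```python
import json
from typing import Any, Dict, List


def failover_commands(surviving_side: str) -> List[Dict[str, Any]]:
    """Return the QMP commands to run on the surviving VM, in order.

    'primary'   -> assumed: the Secondary host is down; drop the NBD child.
    'secondary' -> the Primary host is down; stop the NBD server.
    """
    if surviving_side == "primary":
        stop_replication: Dict[str, Any] = {
            "execute": "x-blockdev-change",
            "arguments": {"parent": "colo-disk0", "child": "children.1"},
        }
    elif surviving_side == "secondary":
        stop_replication = {"execute": "nbd-server-stop"}
    else:
        raise ValueError("surviving_side must be 'primary' or 'secondary'")
    # Block replication is stopped before the heartbeat-loss command.
    return [stop_replication, {"execute": "x-colo-lost-heartbeat"}]


for cmd in failover_commands("secondary"):
    print(json.dumps(cmd))
# {"execute": "nbd-server-stop"}
# {"execute": "x-colo-lost-heartbeat"}
```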
Please review, thanks.
TODO:
1. Implement the packet compare module (proxy) in QEMU (in progress)
2. Checkpointing based on the proxy in QEMU
3. Support for continuous FT
v12:
- Fix the bug where the default buffer filter broke vhost-net.
- Add a flag in struct NetFilterState to help skip the default filter for packets travelling through the filter layer.
- Remove the default failover treatment, which may cause split-brain.
- Rename checkpoint-delay to x-checkpoint-delay.
- Check whether all netdevs support the default filter before going into COLO.
- Reconstruct the send/receive helper functions in patch 10.
- Address several other comments from Dave.
v11:
- Re-implement buffering/releasing of packets based on filter-buffer, following Jason Wang's suggestion (patch 34, patch 36 ~ patch 38).
- Rebase on master to reuse some code introduced by post-copy.
- Address several comments from Eric and Dave; the fixes are recorded in each patch.
v10:
- Rename the 'colo_lost_heartbeat' command to the experimental 'x_colo_lost_heartbeat'
- Rename the migration capability 'colo' to 'x-colo' (Eric's suggestion)
- Simplify the Primary side by dropping the COLO thread and reusing the migration thread (Dave's suggestion)
- Add several netfilter-related APIs to support buffering/releasing packets for COLO (patch 32 ~ patch 36)
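The buffer/release changes above follow, roughly, the output-commit idea also used by Remus: packets the Primary VM sends are held back and only released once the checkpoint covering them has been taken, so clients never see output from state that a failover could discard. The sketch below is a hypothetical, simplified model of that idea (`PacketBuffer` is not QEMU's `net/filter-buffer.c`, which works inside the netfilter layer).

```python
from collections import deque
from typing import Callable, Deque, List


class PacketBuffer:
    """Hold outgoing packets until the next checkpoint completes (sketch)."""

    def __init__(self, transmit: Callable[[bytes], None]) -> None:
        self._transmit = transmit              # actually puts a packet on the wire
        self._pending: Deque[bytes] = deque()
        self.buffering = True                  # on while the VM is protected

    def send(self, packet: bytes) -> None:
        if self.buffering:
            self._pending.append(packet)       # held until the next checkpoint
        else:
            self._transmit(packet)             # pass straight through

    def release(self) -> int:
        """Flush held packets in FIFO order; call after a checkpoint succeeds."""
        released = 0
        while self._pending:
            self._transmit(self._pending.popleft())
            released += 1
        return released

    def stop_buffering(self) -> None:
        """Flush what is held and let later packets pass straight through."""
        self.release()
        self.buffering = False


wire: List[bytes] = []
buf = PacketBuffer(wire.append)
buf.send(b"pkt1")
buf.send(b"pkt2")
print(wire)            # []
print(buf.release())   # 2
print(wire)            # [b'pkt1', b'pkt2']
```

Nothing reaches the wire between checkpoints, so the checkpoint period set with `x-checkpoint-delay` also bounds how long outgoing packets can be delayed.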
zhanghailiang (38):
configure: Add parameter for configure to enable/disable COLO support
migration: Introduce capability 'x-colo' to migration
COLO: migrate colo related info to secondary node
migration: Export migrate_set_state()
migration: Add state records for migration incoming
migration: Integrate COLO checkpoint process into migration
migration: Integrate COLO checkpoint process into loadvm
migration: Rename the'file' member of MigrationState
COLO/migration: Create a new communication path from destination to source
COLO: Implement colo checkpoint protocol
COLO: Add a new RunState RUN_STATE_COLO
QEMUSizedBuffer: Introduce two help functions for qsb
COLO: Save PVM state to secondary side when do checkpoint
ram: Split host_from_stream_offset() into two helper functions
COLO: Load PVM's dirty pages into SVM's RAM cache temporarily
ram/COLO: Record the dirty pages that SVM received
COLO: Load VMState into qsb before restore it
COLO: Flush PVM's cached RAM into SVM's memory
COLO: Add checkpoint-delay parameter for migrate-set-parameters
COLO: synchronize PVM's state to SVM periodically
COLO failover: Introduce a new command to trigger a failover
COLO failover: Introduce state to record failover process
COLO: Implement failover work for Primary VM
COLO: Implement failover work for Secondary VM
qmp event: Add event notification for COLO error
COLO failover: Shutdown related socket fd when do failover
COLO failover: Don't do failover during loading VM's state
COLO: Process shutdown command for VM in COLO state
COLO: Update the global runstate after going into colo state
savevm: Split load vm state function qemu_loadvm_state
COLO: Separate the process of saving/loading ram and device state
COLO: Split qemu_savevm_state_begin out of checkpoint process
net/filter-buffer: Add default filter-buffer for each netdev
filter-buffer: Accept zero interval
filter-buffer: Introduce a helper function to enable/disable default filter
filter-buffer: Introduce a helper function to release packets
colo: Use default buffer-filter to buffer and release packets
COLO: Add block replication into colo process
configure | 11 +
docs/qmp-events.txt | 17 +
hmp-commands.hx | 15 +
hmp.c | 15 +
hmp.h | 1 +
include/exec/ram_addr.h | 9 +-
include/migration/colo.h | 38 +++
include/migration/failover.h | 33 ++
include/migration/migration.h | 18 +-
include/migration/qemu-file.h | 3 +-
include/net/filter.h | 12 +
include/net/net.h | 5 +
include/sysemu/sysemu.h | 9 +
migration/Makefile.objs | 2 +
migration/colo-comm.c | 71 ++++
migration/colo-failover.c | 83 +++++
migration/colo.c | 765 ++++++++++++++++++++++++++++++++++++++++++
migration/exec.c | 4 +-
migration/fd.c | 4 +-
migration/migration.c | 216 ++++++++----
migration/postcopy-ram.c | 6 +-
migration/qemu-file-buf.c | 61 ++++
migration/ram.c | 213 ++++++++++--
migration/rdma.c | 2 +-
migration/savevm.c | 295 ++++++++++++----
migration/tcp.c | 4 +-
migration/unix.c | 4 +-
net/filter-buffer.c | 127 ++++++-
net/filter.c | 6 +-
net/net.c | 58 ++++
qapi-schema.json | 106 +++++-
qapi/event.json | 17 +
qmp-commands.hx | 24 +-
stubs/Makefile.objs | 1 +
stubs/migration-colo.c | 45 +++
trace-events | 10 +
vl.c | 37 +-
37 files changed, 2152 insertions(+), 195 deletions(-)
create mode 100644 include/migration/colo.h
create mode 100644 include/migration/failover.h
create mode 100644 migration/colo-comm.c
create mode 100644 migration/colo-failover.c
create mode 100644 migration/colo.c
create mode 100644 stubs/migration-colo.c
--
1.8.3.1