[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [Qemu-block] [PATCH v19 00/10] Block replication for co

From: Changlong Xie
Subject: Re: [Qemu-devel] [Qemu-block] [PATCH v19 00/10] Block replication for continuous checkpoints
Date: Tue, 31 May 2016 18:25:16 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0

On 05/31/2016 02:20 AM, Stefan Hajnoczi wrote:
On Fri, May 20, 2016 at 03:36:10PM +0800, Changlong Xie wrote:
Block replication is a very important feature which is used for
continuous checkpoints(for example: COLO).

You can get the detailed information about block replication from here:

Please refer to docs/block-replication.txt

You can get the patch here:

You can get the patch with framework here:

1. Continuous block replication. It will be started after basic functions
    are accepted.

Changs Log:
1. Rebase to v2.6.0
2. Address comments from stefan
p3: a new patch that export interfaces for extra serialization
1. call replication_stop() before freeing s->top_id
2. check top_bs
3. reopen file readonly in error return paths
4. enable extra serialization between read and COW
p9: try to hanlde SIGABRT
p6: add local_err in all replication callbacks to prevent "errp == NULL"
p7: add missing qemu_iovec_destroy(xxx)
1. Rebase to the lastest codes
p2: refactor backup_do_checkpoint addressed comments from Jeff Cody
p4: fix bugs in "drive_add buddy xxx" hmp commands
p6: add "since: 2.7"
p7: fix bug in replication_close(), add missing "qapi/error.h", add 
p8: add "since: 2.7"
1. Rebase to the newest codes
2. Address comments from Stefan & hailiang
p3: we don't need this patch now
p4: add "top-id" parameters for secondary
p6: fix NULL pointer in replication callbacks, remove unnecessary typedefs,
add doc comments that explain the semantics of Replication
p7: Refactor AioContext for thread-safe, remove unnecessary get_top_bs()
*Note*: I'm working on replication testcase now, will send out in V17
1. Rebase to the newest codes
2. Fix typos and coding style addresed Eric's comments
3. Address Stefan's comments
    1) Make backup_do_checkpoint public, drop the changes on BlockJobDriver
    2) Update the message and description for [PATCH 4/9]
    3) Make replication_(start/stop/do_checkpoint)_all as global interfaces
    4) Introduce AioContext lock to protect start/stop/do_checkpoint callbacks
    5) Use BdrvChild instead of holding on to BlockDriverState * pointers
4. Clear BDRV_O_INACTIVE for hidden disk's open_flags since commit 09e0c771
5. Introduce replication_get_error_all to check replication status
6. Remove useless discard interface
1. Implement auto complete active commit
2. Implement active commit block job for replication.c
3. Address the comments from Stefan, add replication-specific API and data
    structure, also remove old block layer APIs
1. Rebase to the newest codes
2. Remove redundant marcos and semicolon in replication.c
3. Fix typos in block-replication.txt
1. Rebase to the newest codes
2. Use backing reference to replcace 'allow-write-backing-file'
1. Reopen the backing file when starting blcok replication if it is not
    opened in R/W mode
    when opening backing file
3. Block the top BDS so there is only one block job for the top BDS and
    its backing chain.
1. Use blockdev-remove-medium and blockdev-insert-medium to replace backing
2. Address the comments from Eric Blake
1. Update the error messages
2. Rebase to the newest qemu
3. Split child add/delete support. These patches are sent in another patchset.
1. Address Alberto Garcia's comments
1. Implement adding/removing quorum child. Remove the option non-connect.
2. Simplify the backing refrence option according to Stefan Hajnoczi's 
1. Rebase to the newest qemu.
1. Address the comments from Gong Lei
2. Speed the failover up. The secondary vm can take over very quickly even
    if there are too many I/O requests.
1. Introduce a new driver replication to avoid touch nbd and qcow2.
1: use error_setg() instead of error_set()
2. Add a new block job API
3. Active disk, hidden disk and nbd target uses the same AioContext
4. Add a testcase to test new hbitmap API
1. Redesign the secondary qemu(use image-fleecing)
2. Use Error objects to return error message
3. Address the comments from Max Reitz and Eric Blake

Changlong Xie (3):
   Backup: export interfaces for extra serialization
   Introduce new APIs to do replication operation
   tests: add unit test case for replication

Wen Congyang (7):
   unblock backup operations in backing file
   Backup: clear all bitmap when doing block checkpoint
   Link backup into block core
   docs: block replication's description
   auto complete active commit
   Implement new driver for block replication
   support replication driver in blockdev-add

  Makefile.objs                |   1 +
  block.c                      |  17 ++
  block/Makefile.objs          |   3 +-
  block/backup.c               |  59 +++-
  block/mirror.c               |  13 +-
  block/replication.c          | 666 +++++++++++++++++++++++++++++++++++++++++++
  blockdev.c                   |   2 +-
  docs/block-replication.txt   | 239 ++++++++++++++++
  include/block/block_backup.h |  17 ++
  include/block/block_int.h    |   3 +-
  qapi/block-core.json         |  33 ++-
  qemu-img.c                   |   2 +-
  replication.c                | 105 +++++++
  replication.h                | 176 ++++++++++++
  tests/.gitignore             |   1 +
  tests/Makefile               |   4 +
  tests/test-replication.c     | 523 +++++++++++++++++++++++++++++++++
  17 files changed, 1847 insertions(+), 17 deletions(-)
  create mode 100644 block/replication.c
  create mode 100644 docs/block-replication.txt
  create mode 100644 include/block/block_backup.h
  create mode 100644 replication.c
  create mode 100644 replication.h
  create mode 100644 tests/test-replication.c

I have reviewed many revisions of this series.  The main mechanism in
this series makes sense to me.


I'm still concerned that checkpointing (vm_stop(), not in this series
but COLO in general) depends on bdrv_drain(), which can block forever if
I/O is hung.  That doesn't seem like a reasonable limitation for a high
availability feature since it may lead to the VM becoming unavailable.

IIRC, this issue is mentioned in my older patchset.

I'd like Jeff and/or Kevin to review this series and merge it once they
are happy.

@jeff, kevin and all block maintainters, would you like to review this patchset?


reply via email to

[Prev in Thread] Current Thread [Next in Thread]