|
From: | Changlong Xie |
Subject: | Re: [Qemu-devel] [PATCH v14 7/8] Implement new driver for block replication |
Date: | Fri, 29 Jan 2016 11:13:42 +0800 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 |
On 01/28/2016 11:15 PM, Stefan Hajnoczi wrote:
On Thu, Jan 28, 2016 at 09:13:24AM +0800, Wen Congyang wrote:On 01/27/2016 10:46 PM, Stefan Hajnoczi wrote:On Wed, Jan 13, 2016 at 05:18:31PM +0800, Changlong Xie wrote:+static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp) +{ + Error *local_err = NULL; + int ret; + + if (!s->secondary_disk->job) { + error_setg(errp, "Backup job is cancelled unexpectedly"); + return; + } + + block_job_do_checkpoint(s->secondary_disk->job, &local_err); + if (local_err) { + error_propagate(errp, local_err); + return; + } + + ret = s->active_disk->drv->bdrv_make_empty(s->active_disk);What happens to in-flight requests to the active and hidden disks?we MUST call do_checkpoint() when the vm is stopped.Please document the environment under which the block replication callback functions run.
OK
I'm concerned that the bdrv_drain_all() in vm_stop() can take a long time if the disk is slow/failing. bdrv_drain_all() blocks until all in-flight I/O requests have completed. What does the Primary do if the Secondary becomes unresponsive?
Actually, we knew this problem. But currently, there seems no better way to resolve it. If you have any ideas?
+ switch (s->mode) { + case REPLICATION_MODE_PRIMARY: + break; + case REPLICATION_MODE_SECONDARY: + s->active_disk = bs->file->bs; + if (!bs->file->bs->backing) { + error_setg(errp, "Active disk doesn't have backing file"); + return; + } + + s->hidden_disk = s->active_disk->backing->bs; + if (!s->hidden_disk->backing) { + error_setg(errp, "Hidden disk doesn't have backing file"); + return; + } + + s->secondary_disk = s->hidden_disk->backing->bs; + if (!s->secondary_disk->blk) { + error_setg(errp, "The secondary disk doesn't have block backend"); + return; + }Kevin: Is code allowed to stash away BlockDriverState pointers for convenience or should it keep the BdrvChild pointers instead? In order for replication to work as expected, the graph shouldn't change but for consistency maybe BdrvChild is best.I asked Kevin about this on IRC and he agreed that BdrvChild should be used instead of holding on to BlockDriverState * pointers. Although these pointers will not change during replication (if the op blockers are set up correctly), it's more consistent and certainly safer to go through BdrvChild.
Ok
+ /* start backup job now */ + error_setg(&s->blocker, + "block device is in use by internal backup job"); + bdrv_op_block_all(s->top_bs, s->blocker); + bdrv_op_unblock(s->top_bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker); + bdrv_ref(s->hidden_disk);Why is the explicit reference to hidden_disk (but not secondary_disk or active_disk) is necessary?IIRC, we should reference the backup target before calling backup_start(), and we will reference the backup source in backup_start().I'm not sure why this is necessary since they are part of the backing chain.
Just as Wen said, we should reference the backup target before calling backup_start() to protect it from destroying, if backup job is stopped unexpectedly.
If it is necessary, please add a comment so it's clear why the reference is being taken.
Ok
Stefan
[Prev in Thread] | Current Thread | [Next in Thread] |