qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v14 7/8] Implement new driver for block replicat


From: Changlong Xie
Subject: Re: [Qemu-devel] [PATCH v14 7/8] Implement new driver for block replication
Date: Fri, 29 Jan 2016 11:13:42 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0

On 01/28/2016 11:15 PM, Stefan Hajnoczi wrote:
On Thu, Jan 28, 2016 at 09:13:24AM +0800, Wen Congyang wrote:
On 01/27/2016 10:46 PM, Stefan Hajnoczi wrote:
On Wed, Jan 13, 2016 at 05:18:31PM +0800, Changlong Xie wrote:
+static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp)
+{
+    Error *local_err = NULL;
+    int ret;
+
+    if (!s->secondary_disk->job) {
+        error_setg(errp, "Backup job is cancelled unexpectedly");
+        return;
+    }
+
+    block_job_do_checkpoint(s->secondary_disk->job, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    ret = s->active_disk->drv->bdrv_make_empty(s->active_disk);

What happens to in-flight requests to the active and hidden disks?

we MUST call do_checkpoint() when the vm is stopped.

Please document the environment under which the block replication
callback functions run.

OK


I'm concerned that the bdrv_drain_all() in vm_stop() can take a long
time if the disk is slow/failing.  bdrv_drain_all() blocks until all
in-flight I/O requests have completed.  What does the Primary do if the
Secondary becomes unresponsive?

Actually, we knew this problem. But currently, there seems no better way to resolve it. If you have any ideas?


+    switch (s->mode) {
+    case REPLICATION_MODE_PRIMARY:
+        break;
+    case REPLICATION_MODE_SECONDARY:
+        s->active_disk = bs->file->bs;
+        if (!bs->file->bs->backing) {
+            error_setg(errp, "Active disk doesn't have backing file");
+            return;
+        }
+
+        s->hidden_disk = s->active_disk->backing->bs;
+        if (!s->hidden_disk->backing) {
+            error_setg(errp, "Hidden disk doesn't have backing file");
+            return;
+        }
+
+        s->secondary_disk = s->hidden_disk->backing->bs;
+        if (!s->secondary_disk->blk) {
+            error_setg(errp, "The secondary disk doesn't have block backend");
+            return;
+        }

Kevin: Is code allowed to stash away BlockDriverState pointers for
convenience or should it keep the BdrvChild pointers instead?  In order
for replication to work as expected, the graph shouldn't change but for
consistency maybe BdrvChild is best.

I asked Kevin about this on IRC and he agreed that BdrvChild should be
used instead of holding on to BlockDriverState * pointers.  Although
these pointers will not change during replication (if the op blockers
are set up correctly), it's more consistent and certainly safer to go
through BdrvChild.

Ok


+        /* start backup job now */
+        error_setg(&s->blocker,
+                   "block device is in use by internal backup job");
+        bdrv_op_block_all(s->top_bs, s->blocker);
+        bdrv_op_unblock(s->top_bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker);
+        bdrv_ref(s->hidden_disk);

Why is the explicit reference to hidden_disk (but not secondary_disk or
active_disk) is necessary?

IIRC, we should reference the backup target before calling backup_start(),
and we will reference the backup source in backup_start().

I'm not sure why this is necessary since they are part of the backing
chain.


Just as Wen said, we should reference the backup target before calling backup_start() to protect it from destroying, if backup job is stopped unexpectedly.

If it is necessary, please add a comment so it's clear why the reference
is being taken.


Ok

Stefan






reply via email to

[Prev in Thread] Current Thread [Next in Thread]