Re: [Qemu-block] [Qemu-devel] [PATCH v4 00/18] bitmaps: introduce 'bitmap' sync mode

From: Fabian Grünbichler
Subject: Re: [Qemu-block] [Qemu-devel] [PATCH v4 00/18] bitmaps: introduce 'bitmap' sync mode
Date: Tue, 23 Jul 2019 11:47:02 +0200
User-agent: NeoMutt/20180716

On Mon, Jul 22, 2019 at 01:21:02PM -0400, John Snow wrote:
> On 7/22/19 8:17 AM, Fabian Grünbichler wrote:
> > On Tue, Jul 09, 2019 at 07:25:32PM -0400, John Snow wrote:
> >> This series adds a new "BITMAP" sync mode that is meant to replace the
> >> existing "INCREMENTAL" sync mode.
> >>
> > >> This mode can have its behavior modified by issuing any of three bitmap sync
> > >> modes, passed as arguments to the job.
> >>
> >> The three bitmap sync modes are:
> > >> - ON-SUCCESS: This is an alias for the old incremental mode. The bitmap is
> > >>               conditionally synchronized based on the return code of the job
> > >>               upon completion.
> > >> - NEVER: This is, effectively, the differential backup mode. It never clears
> > >>          the bitmap, as the name suggests.
> > >> - ALWAYS: Here is the new, exciting thing. The bitmap is always synchronized,
> > >>           even on failure. On success, this is identical to incremental, but
> > >>           on failure it clears only the bits that were copied successfully.
> > >>           This can be used to "resume" incremental backups from later points
> > >>           in time.
> >>
> >> I wrote this series by accident on my way to implement incremental mode
> >> for mirror, but this happened first -- the problem is that Mirror mode
> >> uses its existing modes in a very particular way; and this was the best
> >> way to add bitmap support into the mirror job properly.
> >>
> >> [...]
> >>
> >> Future work:
> >> [..]
> >>  - Add these modes to Mirror. (Done*, but needs tests.)
> > 
> > > are these mirror patches available somewhere for testing in combination
> > with this series? your bitmaps branch does not seem to contain them ;)
> > 
> > we've been experimenting with Ma Haocong's patch (v4 from February) to add
> > "incremental"/differential sync to drive-mirror recently with positive
> > results so far, and this sounds like it is another attempt at getting
> > this properly integrated into Qemu.
> > 
> Not available quite yet; I added it in fairly hastily but haven't done
> the testing I want to do yet, so I wouldn't feel comfortable sharing it
> before I do my own due diligence on it. Give me a chance to polish it so
> that the testing effort isn't wasted :)

fair enough, and no hurries :)

> Can you share some of your use-cases for how you are using the
> "incremental mirror" so far? It might be useful for the patch
> justification if I can point to production use cases. (And good for
> allocating time, too.)

it's basically the same use case that the original "incremental mirror"
patch (series)[1] from two years ago had (no affiliation with the author
though) - we have a guest disk replication feature for ZFS/zvols in a
clustered hypervisor setting, and would like to re-use the already
replicated disk state when live-migrating a VM. Qemu does not know
anything about the replication, since it happens on the storage layer
with zfs send/zfs receive. note that for VMs, we use zvols which are
block devices backed by ZFS (or rather, ZFS datasets exposed as block
devices), minus the file system part of regular ZFS datasets. from
Qemu's PoV these (replicated) disks are just regular block devices (and not
image-backed disks on a filesystem, or accessed via some special
BlockDriver like Ceph's RBD images).

we currently support live migration
1) with disks on shared/distributed storage (easy enough)
2) with regular (non-replicated, local) disks (via nbd/drive-mirror)
3) with unused disks on the storage level (disks are not known to Qemu/the VM)

1-3 can be mixed and matched arbitrarily in one guest, e.g. with one
disk on a shared Ceph cluster, one disk that is not in use on an NFS
share, and another disk on a local LVM-thin pool. 2) and 3) also allow
switching the underlying storage on the fly, since they transfer the
full disk (content) anyway.

we also support offline migration with shared, local, unused and/or
replicated disks (all on the storage level with no involvement of Qemu).

as you can see there is a gap in the live-migration feature matrix: when
replication is used, you either have to power off the VM to re-use the
replication state (storage-only migration), or drop the replication
state and do a full local-disk live-migration before re-creating the
replication state from scratch (which is bad, since replication can have
multiple target hosts, and re-establishing the whole disk can take a
while if it's big).

our basic approach is (currently) the following:

1) get disk info
2) Qemu: add dirty bitmaps for currently used, replicated disks
3) storage/ZFS: do a regular replication of all replicated disks (used AND 
4) storage: do a regular storage migration of all regular unused local disks
5a) Qemu: do a regular drive-mirror of all currently used, local disks
5b) Qemu: do an incremental drive-mirror for all currently used, replicated 
6) Qemu: wait for convergence of drive-mirror jobs
7) Qemu: do a regular live-migration of VM
8) Qemu: once converged and VM is suspended, complete drive-mirror jobs
9) Qemu: resume now fully migrated VM on target node
10) Qemu/storage: clean up on source node

5b) with bitmaps from 2) is what is currently missing on the Qemu side,
but seems easy enough to support (like I said, we are currently using Ma
Haocong's patch for testing, but want to get this feature upstream one
way or another instead of carrying our own version, which might become
incompatible in the near future).
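
for illustration, the Qemu-side steps that already work today (2, 5a and
8) map onto existing QMP commands roughly as in the Python sketch below.
this is only a sketch under assumptions, not our actual implementation:
the device/node names, the bitmap name and the NBD target are made up,
and the qmp() helper is hypothetical and only builds the JSON messages
without sending them anywhere. step 5b) is deliberately left out, since
how a bitmap gets passed to the mirror job depends on whichever patch
ends up being used.

```python
import json


def qmp(command, arguments):
    """Build one QMP command as a JSON string (hypothetical helper, nothing is sent)."""
    return json.dumps({"execute": command, "arguments": arguments})


def add_bitmap(node, name):
    # step 2: start tracking writes to a currently used, replicated disk
    return qmp("block-dirty-bitmap-add", {"node": node, "name": name})


def full_mirror(device, target):
    # step 5a: regular drive-mirror of a currently used, local disk onto a
    # pre-created target (e.g. an NBD export on the target node)
    return qmp("drive-mirror", {
        "device": device,
        "target": target,
        "sync": "full",
        "mode": "existing",
    })


def complete_mirror(device):
    # step 8: complete a converged mirror job once the VM is suspended
    return qmp("block-job-complete", {"device": device})


if __name__ == "__main__":
    # all names below are made-up examples
    print(add_bitmap("drive-scsi1", "repl0"))
    print(full_mirror("drive-scsi0", "nbd:target-node:60000:exportname=drive-scsi0"))
    print(complete_mirror("drive-scsi0"))
```

waiting for convergence (step 6) would poll query-block-jobs between the
mirror and complete calls; that part is omitted here to keep the sketch
short.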

2) and 3) are obviously not atomic, so the bitmaps will contain some
writes that have been replicated already on the block/storage layer
below the VM, and those writes will be done a second time in step 5b).
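
a toy model of this overlap, as a Python sketch. it assumes guest writes
can be reduced to (time, block) pairs and that the bitmap is added no
later than the ZFS snapshot is taken; the function and parameter names
are invented for this example and do not correspond to anything in Qemu
or ZFS.

```python
def classify(writes, bitmap_added_at, snapshot_at):
    """Split written blocks by who ends up transferring them.

    writes: iterable of (time, block) guest writes.
    Returns (dirty, needed, redundant) as sets of block numbers.
    """
    # blocks recorded in the dirty bitmap (written once it exists, step 2)
    dirty = {block for t, block in writes if t >= bitmap_added_at}
    # blocks written after the snapshot, i.e. missing from the replicated state
    needed = {block for t, block in writes if t >= snapshot_at}
    # blocks the mirror copies although their last write is already replicated
    redundant = dirty - needed
    return dirty, needed, redundant


if __name__ == "__main__":
    writes = [(1, 10), (3, 11), (4, 12), (6, 12), (7, 13)]
    # bitmap added at t=2, snapshot taken at t=5
    dirty, needed, redundant = classify(writes, bitmap_added_at=2, snapshot_at=5)
    print(sorted(dirty), sorted(needed), sorted(redundant))
```

in this model every block that is missing from the snapshot is also in
the bitmap, so nothing is lost; the only cost is that the blocks in
`redundant` get written twice. if the bitmap is added at the same
instant the snapshot is taken, `redundant` is empty.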

we can work around this by adding another short downtime by
freezing/suspending prior to 2) until after doing the ZFS snapshots at
the start of 3), in case these duplicate writes turn out to be
problematic after all. this downtime would be rather short, as the bulk
of the replication work (actually transferring the latest delta) can
happen after unfreezing/resuming the VM. so far we haven't encountered
any problems in our (albeit limited) testing though, so if possible we
would naturally like to avoid the additional downtime altogether ;)

looking forward to your patch(es) :)

1: <address@hidden>
and <address@hidden>
