From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] Microcheckpointing: Memory-VCPU / Disk State consistency
Date: Thu, 11 Sep 2014 18:44:08 +0100
User-agent: Mutt/1.5.23 (2014-03-12)

(I've cc'd in Fam, Stefan, and Kevin for Block stuff, and
Yang and Eddie for Colo)
* Walid Nouri (address@hidden) wrote:
> Hello Michael, Hello Paolo
> I have "studied" the available documentation/information and tried to
> get an idea of the QEMU live block operation possibilities.
>
> I think the MC protocol doesn't need synchronous block device replication
> because primary and secondary VM are not synchronous. The state of the
> primary is always ahead of the state of the secondary. When the primary is
> in epoch(n) the secondary is in epoch(n-1).
>
> What MC needs is a block device agnostic, controlled and asynchronous
> approach for replicating the contents of block devices and its state changes
> to the secondary VM while the primary VM is running. Asynchronous block
> transfer is important to allow maximum performance for the primary VM, while
> keeping the secondary VM updated with state changes.
>
> The block device replication should be possible in two stages or modes.
>
> The first stage is the live copy of all block devices of the primary to the
> secondary. This is necessary if the secondary doesn't have an existing
> image which is in sync with the primary at the time MC has started. This is
> not very convenient, but as far as I know there is currently no mechanism for
> persistent dirty bitmaps in QEMU.
>
> The second stage (mode) is the replication of block device state changes
> (modified blocks) to keep the image on the secondary in sync with the
> primary. The mirrored blocks must be buffered in RAM (block buffer) until
> the complete checkpoint (RAM, vCPU, device state) can be committed.
>
> For keeping the complete system state consistent on the secondary system
> there must be a possibility for MC to commit/discard block device state
> changes. In normal operation the mirrored block device state changes (block
> buffer) are committed to disk when the complete checkpoint is committed. In
> case of a crash of the primary system while transferring a checkpoint the
> data in the block buffer corresponding to the failed Checkpoint must be
> discarded.
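
To make the commit/discard behaviour quoted above concrete, here is a minimal sketch in Python. It is not QEMU code: the class `CheckpointBlockBuffer`, its method names, and the 512-byte block size are all assumptions made up for illustration. Mirrored writes sit in RAM and only reach the secondary's image once the matching checkpoint is committed.

```python
class CheckpointBlockBuffer:
    """Hypothetical block buffer: holds mirrored writes until the checkpoint commits."""

    def __init__(self, disk: bytearray, block_size: int = 512):
        self.disk = disk              # stands in for the secondary's image
        self.block_size = block_size  # assumed block size, illustration only
        self.pending = {}             # block number -> data; last write wins

    def mirror_write(self, block: int, data: bytes) -> None:
        """Buffer one mirrored block write for the checkpoint in flight."""
        offset = block * self.block_size
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        if block < 0 or offset + self.block_size > len(self.disk):
            raise ValueError("block is outside the image")
        self.pending[block] = bytes(data)

    def commit(self) -> int:
        """Checkpoint (RAM, vCPU, device state) committed: flush buffered blocks."""
        for block, data in self.pending.items():
            offset = block * self.block_size
            self.disk[offset:offset + self.block_size] = data
        flushed = len(self.pending)
        self.pending.clear()
        return flushed

    def discard(self) -> int:
        """Primary failed mid-checkpoint: drop the buffered blocks untouched."""
        dropped = len(self.pending)
        self.pending.clear()
        return dropped
```

A real filter driver would also have to deal with request ordering, flushes, and memory limits on the buffer, none of which this sketch attempts.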
I think for COLO there's a requirement that the secondary can do reads/writes
in parallel with the primary, and the secondary can discard those reads/writes
- and that doesn't happen in MC (Yang or Eddie should be able to confirm that).
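
If the COLO requirement is as I describe it (still to be confirmed), the secondary needs something closer to a throwaway overlay than a one-way buffer. A rough Python sketch follows, with the class `DiscardableOverlay` and its methods invented purely for illustration: the secondary's own writes land in an overlay that its reads see first, and the whole overlay can be dropped.

```python
class DiscardableOverlay:
    """Hypothetical overlay for a secondary that runs in parallel with the primary."""

    def __init__(self, base: dict, block_size: int = 512):
        self.base = base              # block number -> bytes, last committed state
        self.block_size = block_size  # assumed block size, illustration only
        self.overlay = {}             # the secondary's own, discardable writes

    def write(self, block: int, data: bytes) -> None:
        """A write from the secondary guest never touches the base image."""
        self.overlay[block] = bytes(data)

    def read(self, block: int) -> bytes:
        """The secondary sees its own writes first, then the committed state."""
        if block in self.overlay:
            return self.overlay[block]
        return self.base.get(block, bytes(self.block_size))

    def discard(self) -> int:
        """Throw away everything the secondary wrote since the last sync point."""
        dropped = len(self.overlay)
        self.overlay.clear()
        return dropped
```

The difference from the MC case is only that reads must go through the overlay too, because the secondary guest is actually executing.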
> The storage architecture should be "shared nothing" so that no shared
> storage is required and primary/secondary can have separate block device
> images.
MC/COLO with shared storage still needs some stuff like this; but it's subtly
different. They still need to be able to buffer/release modifications
to the shared storage; if any of this code can also be used in the
shared-storage configurations it would be good.
> I think this can be achieved by drive-mirror and a filter block driver.
> Another approach could be to exploit the block migration functionality of
> live migration with a filter block driver.
>
> drive-mirror (and live migration) does not rely on shared storage and
> allows live block device copy and incremental syncing.
>
> A block buffer can be implemented with a QEMU filter block driver. It should
> sit at the same position as the Quorum driver in the block driver hierarchy.
> With the block filter approach, MC will be transparent and block device
> agnostic.
>
> The block buffer filter must have an interface which allows MC to control the
> commits or discards of block device state changes. I have no idea where to
> put such an interface so that it conforms to QEMU coding style.
>
>
> I'm sure there are alternative and better approaches, and I'm open to
> any ideas.
>
>
> Walid
>
> On 17.08.2014 11:52, Paolo Bonzini wrote:
> >On 11/08/2014 22:15, Michael R. Hines wrote:
> >>Excellent question: QEMU does have a feature called "drive-mirror"
> >>in block/mirror.c that was introduced a couple of years ago. I'm not
> >>sure what the
> >>adoption rate of the feature is, but I would start with that one.
> >
> >block/mirror.c is asynchronous, and there's no support for communicating
> >checkpoints back to the master. However, the quorum disk driver could
> >be what you need.
> >
> >There's also a series on the mailing list that lets quorum read only
> >from the primary, so that quorum can still do replication and fault
> >tolerance, but skip fault detection.
> >
> >Paolo
> >
> >>There is also a second fault tolerance implementation that works a
> >>little differently called
> >>"COLO" - you may have seen those emails on the list too, but their
> >>method does not require a disk replication solution, if I recall correctly.
> >
>
>
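
For reference, the first stage quoted above (a full live copy with drive-mirror) is started over QMP. The Python sketch below only builds the JSON for the `drive-mirror` command. The device name `drive0` and the NBD target are hypothetical placeholders, and `mode` "existing" assumes the target has already been exported on the secondary.

```python
import json


def drive_mirror_command(device: str, target: str, sync: str = "full",
                         mode: str = "existing", fmt: str = "raw") -> dict:
    """Build a QMP drive-mirror request; sending it to the monitor is left out."""
    if sync not in ("full", "top", "none"):
        raise ValueError("sync must be 'full', 'top' or 'none'")
    if mode not in ("absolute-paths", "existing"):
        raise ValueError("mode must be 'absolute-paths' or 'existing'")
    return {
        "execute": "drive-mirror",
        "arguments": {
            "device": device,   # block device on the primary to copy
            "target": target,   # where the copy is written
            "format": fmt,
            "sync": sync,       # "full" copies the whole device first
            "mode": mode,
        },
    }


# Hypothetical device name and NBD export, for illustration only.
cmd = drive_mirror_command("drive0", "nbd:secondary.example:10809:exportname=drive0")
print(json.dumps(cmd, indent=2))
```

As Paolo notes in the quoted text, the mirror job is asynchronous and knows nothing about checkpoints, so this covers only the initial copy and not the commit/discard step.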
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK