Re: [Qemu-devel] Microcheckpointing: Memory-VCPU / Disk State consistenc


From:	Michael R. Hines
Subject:	Re: [Qemu-devel] Microcheckpointing: Memory-VCPU / Disk State consistency
Date:	Thu, 11 Sep 2014 09:50:26 +0800
User-agent:	Mozilla/5.0 (X11; Linux i686; rv:24.0) Gecko/20100101 Thunderbird/24.6.0

On 09/10/2014 11:43 PM, Walid Nouri wrote:

Hello Michael, hello Paolo,
I have "studied" the available documentation and information and tried to get an idea of what QEMU's live block operations make possible.
I think the MC protocol doesn't need synchronous block device replication because the primary and secondary VMs are not synchronous. The state of the primary is always ahead of the state of the secondary: when the primary is in epoch(n), the secondary is in epoch(n-1).
What MC needs is a block-device-agnostic, controlled, and asynchronous approach for replicating the contents of block devices and their state changes to the secondary VM while the primary VM is running. Asynchronous block transfer is important to allow maximum performance for the primary VM while keeping the secondary VM updated with state changes.
The block device replication should be possible in two stages or modes.
The first stage is the live copy of all block devices of the primary to the secondary. This is necessary if the secondary doesn't have an existing image that is in sync with the primary at the time MC starts. This is not very convenient, but as far as I know there is currently no mechanism for a persistent dirty bitmap in QEMU.
The second stage (mode) is the replication of block device state changes (modified blocks) to keep the image on the secondary in sync with the primary. The mirrored blocks must be buffered in RAM (block buffer) until the complete checkpoint (RAM, vCPU, device state) can be committed.
To keep the complete system state consistent on the secondary system, MC must be able to commit or discard block device state changes. In normal operation, the mirrored block device state changes (block buffer) are committed to disk when the complete checkpoint is committed. If the primary system crashes while transferring a checkpoint, the data in the block buffer corresponding to the failed checkpoint must be discarded.
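To make the commit/discard behaviour described above concrete, here is a minimal Python sketch of a secondary-side block buffer. It is only an illustration under stated assumptions: the class name `BlockBuffer`, its methods, and the dictionary standing in for the disk image are all hypothetical and are not part of QEMU.

```python
class BlockBuffer:
    """Hypothetical secondary-side buffer for mirrored block writes.

    Writes belonging to the checkpoint in flight are held in RAM and
    only reach the (simulated) disk image when the checkpoint commits.
    """

    def __init__(self, block_size=512):
        self.block_size = block_size
        self.disk = {}     # committed image: block number -> bytes
        self.pending = {}  # buffered writes of the checkpoint in flight

    def mirror_write(self, block, data):
        """Buffer one mirrored block; the disk image is not touched."""
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self.pending[block] = bytes(data)

    def read(self, block):
        """Return committed contents only; unwritten blocks read as zeros."""
        return self.disk.get(block, bytes(self.block_size))

    def commit(self):
        """Checkpoint complete: flush buffered writes to the disk image."""
        flushed = len(self.pending)
        self.disk.update(self.pending)
        self.pending.clear()
        return flushed

    def discard(self):
        """Checkpoint failed: drop buffered writes, leave the disk as is."""
        dropped = len(self.pending)
        self.pending.clear()
        return dropped
```

With this shape, a primary crash in the middle of epoch(n) leaves the secondary image exactly as it was at the end of epoch(n-1), because nothing buffered for the failed checkpoint ever reached the disk.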
The storage architecture should be “shared nothing” so that no shared storage is required and primary/secondary can have separate block device images.
I think this can be achieved with drive-mirror and a filter block driver. Another approach could be to exploit the block migration functionality of live migration with a filter block driver.
Drive-mirror (and live migration) does not rely on shared storage and allows live block device copy and incremental syncing.
A block buffer can be implemented with a QEMU filter block driver. It should sit at the same position as the Quorum driver in the block driver hierarchy. With the block filter approach, MC will be transparent and block device agnostic.
The block buffer filter must have an interface that allows MC to control the commits or discards of block device state changes. I have no idea where to put such an interface so that it conforms to QEMU coding style.
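As a rough illustration of the control interface meant here, the following Python sketch shows one way MC could drive several block buffer filters at a checkpoint boundary. Everything in it is an assumption made for discussion: `CheckpointControl`, `RecordingFilter`, and `finish_checkpoint` are hypothetical names, not existing QEMU APIs, and a real implementation would live in the C block layer.

```python
from abc import ABC, abstractmethod


class CheckpointControl(ABC):
    """Hypothetical control interface a block buffer filter exposes to MC."""

    @abstractmethod
    def commit(self):
        """Apply all block changes buffered for the current checkpoint."""

    @abstractmethod
    def discard(self):
        """Drop all block changes buffered for the current checkpoint."""


class RecordingFilter(CheckpointControl):
    """Stand-in filter that only records which control calls it received."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def commit(self):
        self.log.append("commit")

    def discard(self):
        self.log.append("discard")


def finish_checkpoint(filters, checkpoint_complete):
    """Commit every filter if the checkpoint (RAM, vCPU, device state)
    arrived completely; otherwise discard on every filter."""
    for f in filters:
        if checkpoint_complete:
            f.commit()
        else:
            f.discard()
    return "committed" if checkpoint_complete else "discarded"
```

The point of the sketch is only that MC makes one decision per checkpoint and applies it to all block devices together, so disk state never runs ahead of or behind the RAM/vCPU state on the secondary.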
I’m sure there are alternative and better approaches, and I’m open to any ideas.
Walid

On 17.08.2014 11:52, Paolo Bonzini wrote:
On 11/08/2014 22:15, Michael R. Hines wrote:
Excellent question: QEMU does have a feature called "drive-mirror"
in block/mirror.c that was introduced a couple of years ago. I'm not
sure what the adoption rate of the feature is, but I would start with
that one.
block/mirror.c is asynchronous, and there's no support for communicating
checkpoints back to the master. However, the quorum disk driver could
be what you need.

There's also a series on the mailing list that lets quorum read only
from the primary, so that quorum can still do replication and fault
tolerance, but skip fault detection.

Paolo
There is also a second fault tolerance implementation that works a
little differently called "COLO" - you may have seen those emails on
the list too, but their method does not require a disk replication
solution, if I recall correctly.

Nice description of the problem - would you like to put this information on the MC wiki page? (Just send an email to the list that says "request for wiki account, please" in the subject, and they will make an account for you.)


A drive-mirror + filter driver solution sounds like a good plan overall;
of course, the devil is in the details =)

I don't know how much time you have to spend on actual code, but even a description of what a "theoretical" interface between MC and drive-mirror would look like would go a long way, even without code.

Your investigations would also help "drive" a solution to this problem for the COLO team as well - I believe they need the same thing....


- Michael


