From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] Microcheckpointing: Memory-VCPU / Disk State consistency
Date: Thu, 14 Aug 2014 11:58:03 +0100
User-agent: Mutt/1.5.23 (2014-03-12)

cc'ing in a couple of the COLOers.

* Michael R. Hines (address@hidden) wrote:
> On 08/13/2014 10:03 PM, Walid Nouri wrote:
> >
> >While looking to find some ideas for approaches to replicating block
> >devices I have read the paper about the Remus implementation. I think MC
> >can take a similar approach for local disk.
> >
> 
> I agree.
> 
> >Here are the main facts that I have understood:
> >
> >Local disk contents are viewed as internal state of the primary and
> >secondary. In the explanation they describe that, to keep the disk
> >semantics of the primary and to allow the primary to run speculatively, all
> >disk state changes are written directly to the disk and, in parallel, sent
> >asynchronously to the secondary. The secondary keeps the pending write
> >requests in two disk buffers: a speculation buffer and a write-out buffer.
> >
> >After receiving the next checkpoint, the secondary copies the
> >speculation buffer to the write-out buffer, commits the checkpoint and
> >applies the write-out buffer to its local disk.
> >
> >When the primary fails, the secondary must wait until the write-out buffer
> >has been completely written to disk before changing the execution mode
> >to run as primary. In this case (failure of the primary) the secondary
> >discards pending disk writes in its speculation buffer. This protocol
> >keeps the disk state consistent with the last checkpoint.
> >
> >Remus uses the Xen-specific blktap driver. As far as I know this can't be
> >used with QEMU (KVM).
> >
> >I must see how drive-mirror can be used for this kind of protocol.
> >
> 
> That's all correct. Theoretically, we would do exactly the same thing:
> drive-mirror on the source would write immediately to disk but follow the
> same commit semantics on the destination as Xen.
> 
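
For illustration, the secondary-side buffering described in the quoted text
can be modelled in a few lines of Python. This is only a sketch under
assumptions: the class and method names are hypothetical, a dict stands in
for the local disk, and nothing here is taken from the Remus, blktap or QEMU
drive-mirror code.

```python
class SecondaryDisk:
    """Hypothetical model of the secondary's two disk buffers."""

    def __init__(self):
        self.disk = {}         # sector -> data; stands in for the local disk
        self.speculation = []  # writes received since the last checkpoint
        self.write_out = []    # committed writes not yet applied to disk

    def receive_write(self, sector, data):
        # A write replicated asynchronously by the primary; not yet committed.
        self.speculation.append((sector, data))

    def checkpoint(self):
        # Next checkpoint received: speculative writes become committed
        # and are applied to the local disk.
        self.write_out.extend(self.speculation)
        self.speculation = []
        self.flush()

    def flush(self):
        # Apply committed writes to the local disk in arrival order.
        for sector, data in self.write_out:
            self.disk[sector] = data
        self.write_out = []

    def failover(self):
        # Primary failed: finish the committed writes, then discard
        # everything that was never covered by a checkpoint.
        self.flush()
        self.speculation = []
```

After failover() the dict holds exactly the writes covered by the last
checkpoint, which is the consistency property the protocol is after.
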
> >
> >I have taken a look at COLO.
> >
> 
> >IMHO there are two points. Custom changes to the TCP stack are a no-go for
> >proprietary operating systems like Windows. It makes COLO application
> >agnostic but not operating system agnostic. The other point is that with
> >I/O intensive workloads COLO will tend to behave like MC. This is my point
> >of view but I didn't invest much time to understand everything in detail.
> >
> 
> Actually, if I remember correctly, the TCP stack is only modified at the
> hypervisor level - they are intercepting and translating TCP sequence
> numbers "in-flight" to detect divergence of the source and destination -
> which is not a big problem if the implementation is well-done.

The 2013 paper says:
   'COLO modifies the guest OS’s TCP/IP stack in order to make the behavior
    more deterministic.'
but does say that an alternative might be to have a
   'comparison function that operates transparently over re-assembled TCP
    streams'
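
To illustrate what comparing re-assembled streams could mean, here is a
rough Python sketch. It is an assumption-laden toy, not the paper's design:
the function names are made up, a segment is a (seq, payload) pair whose seq
is a plain byte offset from the start of the stream, and sequence-number
wraparound is ignored.

```python
def reassemble(segments):
    """Rebuild a byte stream from (seq, payload) pairs.

    seq is assumed to be a byte offset from the start of the stream.
    """
    stream = bytearray()
    for seq, payload in sorted(segments):
        if seq > len(stream):
            break  # gap: data after it cannot be delivered yet
        # Skip bytes already present (retransmission or overlap).
        stream.extend(payload[len(stream) - seq:])
    return bytes(stream)


def streams_match(primary_segments, secondary_segments):
    """Compare two outputs over the prefix both sides have produced."""
    a = reassemble(primary_segments)
    b = reassemble(secondary_segments)
    n = min(len(a), len(b))
    return a[:n] == b[:n]
```

With this kind of comparison, two guests that send the same bytes but
segment them differently would not count as diverged.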

> My hope in the future was that the two approaches could be used in a
> "Hybrid" manner - actually MC has much more of a performance hit for I/O
> than COLO does because of its buffering requirements.
> 
> On the other hand, MC would perform better in a memory-intensive or
> CPU-intensive situation - so maybe QEMU could "switch" between the two
> mechanisms at different points in time when the resource bottleneck changes.

If the primary were to rate-limit the number of resynchronisations
(and send the secondary a message as soon as it knew a resync was needed), that
would get some of the way, but then the only difference from microcheckpointing
at that point is the secondary doing a wasteful copy and sending the packets
across; it seems it should be easy to disable those if it knew that a resync
was going to happen.
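
As a rough sketch of that idea in Python (hypothetical names throughout;
the callback stands in for whatever message the primary would really send,
and times are plain numbers supplied by the caller):

```python
class ResyncLimiter:
    """Hypothetical primary-side rate limiter for resynchronisations."""

    def __init__(self, min_interval, notify_secondary):
        self.min_interval = min_interval          # minimum time between resyncs
        self.notify_secondary = notify_secondary  # assumed callback, no args
        self.last_resync = None
        self.pending = False

    def divergence_detected(self):
        # Tell the secondary straight away, even if the resync itself has
        # to wait, so it can stop copying and sending packets.
        if not self.pending:
            self.pending = True
            self.notify_secondary()

    def maybe_resync(self, now):
        # Return True when a resync should be carried out at time `now`.
        if not self.pending:
            return False
        if (self.last_resync is not None
                and now - self.last_resync < self.min_interval):
            return False
        self.last_resync = now
        self.pending = False
        return True
```

The point of splitting the two methods is that the notification is never
delayed by the rate limit; only the resync itself is.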

Dave

> - Michael
> 
> 
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK