Re: [Qemu-devel] [POC] colo-proxy in qemu


From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [POC] colo-proxy in qemu
Date: Thu, 30 Jul 2015 13:30:53 +0100
User-agent: Mutt/1.5.23 (2014-03-12)

* Gonglei (address@hidden) wrote:
> On 2015/7/30 19:56, Dr. David Alan Gilbert wrote:
> > * Jason Wang (address@hidden) wrote:
> >>
> >>
> >> On 07/30/2015 04:03 PM, Dr. David Alan Gilbert wrote:
> >>> * Dong, Eddie (address@hidden) wrote:
> >>>>>> A question here, the packet comparing may be very tricky. For example,
> >>>>>> some protocol use random data to generate unpredictable id or
> >>>>>> something else. One example is ipv6_select_ident() in Linux. So COLO
> >>>>>> needs a mechanism to make sure PVM and SVM can generate same random
> >>>>>> data?
> >>>>> Good question, the random data connection is a big problem for COLO. At
> >>>>> present, it will trigger checkpoint processing because of the different
> >>>>> random data.
> >>>>> I don't think any mechanisms can assure two different machines generate
> >>>>> the same random data. If you have any ideas, pls tell us :)
> >>>>>
> >>>>> Frequent checkpoint can handle this scenario, but maybe will cause the
> >>>>> performance poor. :(
> >>>>>
> >>>> The assumption is that, after VM checkpoint, SVM and PVM have identical 
> >>>> internal state, so the pattern used to generate random data has high 
> >>>> possibility to generate identical data at short time, at least...
> >>> They do diverge pretty quickly though; I have simple examples which
> >>> reliably cause a checkpoint because of simple randomness in applications.
> >>>
> >>> Dave
> >>>
> >>
> >> And it will become even worse if hwrng is used in guest.
> > 
> > Yes; it seems quite application dependent;  (on IPv4) an ssh connection,
> > once established, tends to work well without triggering checkpoints;
> > and static web pages also work well.  Examples of things that do cause
> > more checkpoints are, displaying guest statistics (e.g. running top
> > in that ssh) which is timing dependent, and dynamically generated
> > web pages that include a unique ID (bugzilla's password reset link in
> > its front page was a fun one); I think establishing new encrypted
> > connections also causes the same kind of randomness.
> > 
> > However, it's worth remembering that COLO is trying to reduce the
> > number of checkpoints compared to a simple checkpointing world
> > which would be aiming to do a checkpoint ~100 times a second,
> > and for compute bound workloads, or ones that don't expose
> > the randomness that much, it can get checkpoints of a few seconds
> > in length which greatly reduces the overhead.
> > 
> 
> Yes. That's the truth.
> We can set two different modes for different scenarios. Maybe Named
> 1) frequent checkpoint mode for multi-connections and randomness scenarios
> and 2) non-frequent checkpoint mode for other scenarios.
> 
> But that's the next plan, we are thinking about that.

I have some code that tries to switch between those automatically;
it measures the checkpoint lengths, and if they're consistently short
it sends a different message byte to the secondary at the start of the
checkpoint, so that the secondary doesn't bother running.  Every so
often it then flips back to a COLO checkpoint to see if the checkpoints
are still that short.
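
As a rough illustration of that switching logic (a sketch only, written in
Python for brevity rather than QEMU's C, and not the actual code): the
message byte values, the thresholds, and every name below are assumptions
made up for the example.

```python
from collections import deque

# Hypothetical message bytes; the real values are not given in this thread.
MSG_COLO_CHECKPOINT = 0x01    # secondary runs and its packets are compared
MSG_SIMPLE_CHECKPOINT = 0x02  # secondary does not bother running


class CheckpointModeSelector:
    """Pick the message byte to send at the start of each checkpoint."""

    def __init__(self, short_s=0.1, window=8, probe_every=50):
        self.short_s = short_s              # assumed: below this is "short"
        self.probe_every = probe_every      # assumed: simple run length
        self.recent = deque(maxlen=window)  # lengths of recent COLO intervals
        self.simple_left = 0                # simple checkpoints still to send

    def next_message(self, last_colo_length_s=None):
        # Record the interval that just ended, if it ran in COLO mode.
        if last_colo_length_s is not None:
            self.recent.append(last_colo_length_s)

        # Still inside a run of simple checkpoints.
        if self.simple_left > 0:
            self.simple_left -= 1
            return MSG_SIMPLE_CHECKPOINT

        consistently_short = (
            len(self.recent) == self.recent.maxlen
            and all(s < self.short_s for s in self.recent)
        )
        if consistently_short:
            # Drop the oldest sample so that a fresh COLO probe result is
            # required before the next simple run can start.
            self.recent.popleft()
            self.simple_left = self.probe_every - 1
            return MSG_SIMPLE_CHECKPOINT

        # Default, and also the periodic probe after a simple run.
        return MSG_COLO_CHECKPOINT
```

The primary would call next_message() at the start of every checkpoint,
passing the length of the interval that just ended when that interval ran
in COLO mode. After probe_every simple checkpoints the history is one
sample short, so the next checkpoint is a COLO probe; a short probe starts
another simple run, a long one keeps COLO mode.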

Dave

> 
> Regards,
> -Gonglei
> 
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK


