qemu-devel

Re: [Qemu-devel] [POC] colo-proxy in qemu


From: Gonglei
Subject: Re: [Qemu-devel] [POC] colo-proxy in qemu
Date: Thu, 30 Jul 2015 20:10:25 +0800
User-agent: Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Thunderbird/31.4.0

On 2015/7/30 19:56, Dr. David Alan Gilbert wrote:
> * Jason Wang (address@hidden) wrote:
>>
>>
>> On 07/30/2015 04:03 PM, Dr. David Alan Gilbert wrote:
>>> * Dong, Eddie (address@hidden) wrote:
>>>>>> A question here: the packet comparison may be very tricky. For example,
>>>>>> some protocols use random data to generate an unpredictable id or
>>>>>> something else. One example is ipv6_select_ident() in Linux. So does COLO
>>>>>> need a mechanism to make sure PVM and SVM can generate the same random
>>>>>> data?
>>>>> Good question, the random data connection is a big problem for COLO. At
>>>>> present, it will trigger checkpoint processing because of the different
>>>>> random data.
>>>>> I don't think any mechanisms can assure two different machines generate
>>>>> the same random data. If you have any ideas, pls tell us :)
>>>>>
>>>>> Frequent checkpoints can handle this scenario, but may result in
>>>>> poor performance. :(
>>>>>
>>>> The assumption is that, after a VM checkpoint, SVM and PVM have identical
>>>> internal state, so the pattern used to generate random data is highly
>>>> likely to generate identical data for a short time, at least...
>>> They do diverge pretty quickly though; I have simple examples which
>>> reliably cause a checkpoint because of simple randomness in applications.
>>>
>>> Dave
>>>
>>
>> And it will become even worse if hwrng is used in guest.
> 
> Yes; it seems quite application dependent. On IPv4, an ssh connection,
> once established, tends to work well without triggering checkpoints,
> and static web pages also work well. Examples of things that do cause
> more checkpoints are displaying guest statistics (e.g. running top
> in that ssh), which is timing dependent, and dynamically generated
> web pages that include a unique ID (bugzilla's password reset link on
> its front page was a fun one). I think establishing new encrypted
> connections also introduces the same kind of randomness.
> 
> However, it's worth remembering that COLO is trying to reduce the
> number of checkpoints compared to a simple checkpointing world,
> which would be aiming to do a checkpoint ~100 times a second.
> For compute-bound workloads, or ones that don't expose the
> randomness that much, it can get checkpoint intervals of a few
> seconds in length, which greatly reduces the overhead.
> 

Yes, that's true.
We could provide two modes for different scenarios, maybe named:
1) frequent checkpoint mode, for multi-connection and randomness-heavy
   scenarios, and
2) non-frequent checkpoint mode, for other scenarios.

But that's a next step; we are still thinking about it.
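
To make the two-mode idea a bit more concrete, here is a minimal sketch in
Python of how a mode might be picked from the observed divergence rate.
Nothing here exists in QEMU or COLO: CheckpointMode, count_divergences,
choose_mode and the threshold of one divergence per second are all
hypothetical names and values, chosen only for illustration.

```python
from enum import Enum


class CheckpointMode(Enum):
    # Hypothetical names for the two modes discussed above.
    FREQUENT = "frequent"
    NON_FREQUENT = "non-frequent"


def count_divergences(pvm_packets, svm_packets):
    """Count positions where PVM and SVM output packets differ.

    Assumption: packets are bytes objects compared byte-for-byte and
    paired up in order; a real proxy would have to match by connection.
    """
    return sum(1 for p, s in zip(pvm_packets, svm_packets) if p != s)


def choose_mode(divergences, window_seconds, threshold_per_sec=1.0):
    """Pick a mode from the divergence rate seen in a sampling window.

    threshold_per_sec is an assumed tuning knob, not a measured value.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    rate = divergences / window_seconds
    if rate > threshold_per_sec:
        # Output diverges often (randomness, many new connections):
        # comparison rarely pays off, so checkpoint on a short period.
        return CheckpointMode.FREQUENT
    # Output mostly matches: let comparison drive rare checkpoints.
    return CheckpointMode.NON_FREQUENT
```

For example, choose_mode(count_divergences(pvm, svm), 10.0) would select
the non-frequent mode for a workload whose PVM and SVM packets mostly
match over a ten second window. Whether the rate of divergence is the
right signal at all is an open question.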

Regards,
-Gonglei



