From: Hailiang Zhang
Subject: Re: [Qemu-devel] [PATCH COLO-Frame (Base) v21 00/17] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
Date: Wed, 26 Oct 2016 23:52:48 +0800
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1

Hi Amit,

On 2016/10/26 16:26, Amit Shah wrote:
On (Wed) 26 Oct 2016 [14:43:30], Hailiang Zhang wrote:
Hi Amit,

On 2016/10/26 14:09, Amit Shah wrote:
Hello,

On (Tue) 18 Oct 2016 [20:09:56], zhanghailiang wrote:
This is the 21st version of the COLO frame series.

Rebased onto the latest master.

I've reviewed the patchset and have some minor comments, but overall it
looks good.  The changes are contained, and common code / existing
code paths are not affected much.  We can still target merging this
for 2.8.


I really appreciate your help ;). I will fix all the issues
and send v22. I hope we can still make the 2.8 deadline.

Do you have any test numbers on how much the VM slows down / the
downtime incurred during checkpoints?


Yes, we tested that a long time ago; it all depends on the workload.
The downtime is determined by the time spent transferring the dirty pages
and the time spent flushing RAM from the RAM buffer.
But we do have methods to reduce the downtime.

One method is to reduce the amount of data (mainly dirty pages) sent at
checkpoint time by transferring dirty pages asynchronously while the PVM and
SVM are running (that is, outside the checkpoint itself). Besides, we can
reuse existing migration capabilities, such as compression.
Another method is to reduce the time spent flushing RAM by using the
userfaultfd API to turn copying RAM into marking a bitmap and filling pages
on demand. We could also flush the RAM buffer with multiple threads, as
Dave advised ...
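
To make the userfaultfd idea concrete, here is a minimal sketch
(hypothetical code, not the actual COLO implementation): the dirtied
guest pages would first be dropped (e.g. with madvise(MADV_DONTNEED))
so that touching them faults, and each fault is then resolved from the
checkpoint buffer instead of flushing everything eagerly:

#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Register a memory region with userfaultfd so that missing pages
 * trigger fault messages instead of being copied up front. */
static int uffd_register(void *area, size_t len)
{
    int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    struct uffdio_api api = { .api = UFFD_API };
    struct uffdio_register reg = {
        .range = { .start = (unsigned long)area, .len = len },
        .mode = UFFDIO_REGISTER_MODE_MISSING,
    };

    if (uffd < 0 || ioctl(uffd, UFFDIO_API, &api) < 0 ||
        ioctl(uffd, UFFDIO_REGISTER, &reg) < 0) {
        return -1;
    }
    return uffd;
}

/* Resolve one fault by copying a single page from the checkpoint
 * buffer; the rest of the buffer is never touched unless accessed. */
static void handle_one_fault(int uffd, void *area, void *buffer,
                             size_t page_size)
{
    struct uffd_msg msg;

    if (read(uffd, &msg, sizeof(msg)) != sizeof(msg) ||
        msg.event != UFFD_EVENT_PAGEFAULT) {
        return;
    }

    unsigned long addr = msg.arg.pagefault.address & ~(page_size - 1);
    struct uffdio_copy copy = {
        .dst = addr,
        .src = (unsigned long)buffer + (addr - (unsigned long)area),
        .len = page_size,
    };
    ioctl(uffd, UFFDIO_COPY, &copy); /* fill the page on demand */
}

This way the checkpoint commit only has to mark pages in a bitmap; the
actual copying is spread over the SVM's subsequent execution.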

Yes, I understand that as with any migration numbers, this too depends
on what the guest is doing.  However, can you just pick some standard
workload - kernel compile or something like that - and post a few
observations?


Li Zhijian has sent some test results, which are based on the kernel COLO proxy.
After switching to the userspace COLO proxy there may be some degradation,
but note that some optimizations were not yet implemented in the old scenario either.
We haven't tested the new userspace COLO proxy scenario thoroughly,
because it is still WIP; we will start that work after this frame is merged.

Also, can you tell us how you arrived at the default checkpoint
interval?


Er, for this value we referred to Remus on the Xen platform. ;)
But after we implement COLO with the COLO proxy, this interval will be changed
to a bigger one (10s), and we will make it configurable too. Besides, we will
add another configurable value to control the minimum interval between checkpoints.
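
For illustration, here is a hypothetical sketch of how the two knobs
could interact (the names are illustrative, not the actual QEMU
parameters; in QEMU the checkpoint period is exposed as the
x-checkpoint-delay migration parameter):

#include <stdint.h>

typedef struct {
    int64_t period_ms;       /* normal time between checkpoints, e.g. 10000 */
    int64_t min_interval_ms; /* lower bound between any two checkpoints */
} COLOCheckpointParams;

/* Compute when the next checkpoint may start. A forced checkpoint
 * (e.g. requested because PVM and SVM output diverged) still honors
 * the minimum interval, so checkpoints cannot storm under
 * divergence-heavy workloads. */
static int64_t next_checkpoint_ms(int64_t now_ms, int64_t last_ms,
                                  int forced,
                                  const COLOCheckpointParams *p)
{
    if (forced) {
        int64_t earliest = last_ms + p->min_interval_ms;
        return now_ms > earliest ? now_ms : earliest;
    }
    return last_ms + p->period_ms;
}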

OK - is there a typical value that strikes a good balance between COLO
keeping the network busy / the guest paused vs. the guest making
progress?  Again this is something that's workload-dependent, but I
guess you have typical numbers from a network-bound workload?


Yes, you can refer to Zhijian's email for details.
I think it is necessary to add some test/performance results to COLO's wiki.
We will do that later.

Thanks,
hailiang

Thanks,

                Amit
