From: Amit Shah
Subject: Re: [Qemu-devel] [PATCH COLO-Frame (Base) v21 00/17] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
Date: Thu, 27 Oct 2016 09:22:56 +0530

On (Wed) 26 Oct 2016 [23:52:48], Hailiang Zhang wrote:
> Hi Amit,
> 
> On 2016/10/26 16:26, Amit Shah wrote:
> >On (Wed) 26 Oct 2016 [14:43:30], Hailiang Zhang wrote:
> >>Hi Amit,
> >>
> >>On 2016/10/26 14:09, Amit Shah wrote:
> >>>Hello,
> >>>
> >>>On (Tue) 18 Oct 2016 [20:09:56], zhanghailiang wrote:
> >>>>This is the 21st version of the COLO frame series.
> >>>>
> >>>>Rebase to the latest master.
> >>>
> >>>I've reviewed the patchset, have some minor comments, but overall it
> >>>looks good.  The changes are contained, and common code / existing
> >>>code paths are not affected much.  We can still target to merge this
> >>>for 2.8.
> >>>
> >>
> >>I really appreciate your help ;) I will fix all the issues and send v22.
> >>I hope we can still catch the 2.8 deadline.
> >>
> >>>Do you have any tests on how much the VM slows down / downtime
> >>>incurred during checkpoints?
> >>>
> >>
> >>Yes, we tested that a long time ago; it all depends on the workload.
> >>The downtime is determined by the time spent transferring the dirty pages
> >>and the time spent flushing RAM from the ram buffer.
> >>But we do have methods to reduce the downtime.
> >>
> >>One method is to reduce the amount of data (mainly dirty pages) transferred
> >>at checkpoint time, by sending dirty pages asynchronously while the PVM and
> >>SVM are running (i.e. not while the checkpoint is being taken). Besides, we
> >>can re-use existing migration capabilities, such as compression, etc.
> >>Another method is to reduce the RAM flush time by using the userfaultfd API
> >>to turn copying RAM into marking a bitmap. We can also flush the ram buffer
> >>with multiple threads, as advised by Dave ...
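The sketch below is not COLO code; it is a minimal, generic illustration of the
Linux userfaultfd API mentioned in the second idea above, assuming an anonymous
mapping standing in for guest RAM and a staging buffer standing in for a
checkpoint ram buffer (the names fault_handler, region and staging are invented
for the example). A region is registered for missing-page faults and a handler
thread fills each page on demand, which is the kind of lazy population that
could replace an up-front copy:

/* Minimal, generic userfaultfd sketch (not COLO code): register a region for
 * missing-page faults and fill pages on demand from a staging buffer, instead
 * of copying the whole buffer up front. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <poll.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

static long page_size;

/* Handler thread: serve each missing-page fault by copying one page from the
 * staging buffer (a stand-in for a checkpoint ram buffer) into the region. */
static void *fault_handler(void *arg)
{
    long uffd = (long)arg;
    void *staging = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    memset(staging, 0x5a, page_size);            /* pretend checkpoint data */

    for (;;) {
        struct pollfd pfd = { .fd = uffd, .events = POLLIN };
        struct uffd_msg msg;

        if (poll(&pfd, 1, -1) < 0)
            break;
        if (read(uffd, &msg, sizeof(msg)) != sizeof(msg))
            continue;
        if (msg.event != UFFD_EVENT_PAGEFAULT)
            continue;

        struct uffdio_copy copy = {
            .dst = msg.arg.pagefault.address & ~(page_size - 1),
            .src = (unsigned long)staging,
            .len = page_size,
        };
        ioctl(uffd, UFFDIO_COPY, &copy);         /* populate the page lazily */
    }
    return NULL;
}

int main(void)
{
    page_size = sysconf(_SC_PAGESIZE);

    long uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (uffd < 0) { perror("userfaultfd"); return 1; }

    struct uffdio_api api = { .api = UFFD_API };
    if (ioctl(uffd, UFFDIO_API, &api) < 0) { perror("UFFDIO_API"); return 1; }

    /* A small anonymous region standing in for (part of) guest RAM. */
    size_t len = 16 * page_size;
    char *region = mmap(NULL, len, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED) { perror("mmap"); return 1; }

    struct uffdio_register reg = {
        .range = { .start = (unsigned long)region, .len = len },
        .mode = UFFDIO_REGISTER_MODE_MISSING,
    };
    if (ioctl(uffd, UFFDIO_REGISTER, &reg) < 0) { perror("UFFDIO_REGISTER"); return 1; }

    pthread_t tid;
    pthread_create(&tid, NULL, fault_handler, (void *)uffd);

    /* First touch faults; the handler fills the page before the read returns. */
    printf("region[0] = 0x%02x\n", region[0] & 0xff);
    return 0;
}

(Compile with -pthread on a kernel built with CONFIG_USERFAULTFD.)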
> >
> >Yes, I understand that as with any migration numbers, this too depends
> >on what the guest is doing.  However, can you just pick some standard
> >workload - kernel compile or something like that - and post a few
> >observations?
> >
> 
> Li Zhijian has sent some test results, which are based on the kernel COLO proxy.
> After switching to the userspace COLO proxy, there may be some degradation,
> but in the old scenario some optimizations were not implemented.
> We haven't tested the new userspace COLO proxy scenario overall,
> because it is still WIP; we will start that work after this frame is merged.

OK.

> >>>Also, can you tell how did you arrive at the default checkpoint
> >>>interval?
> >>>
> >>
> >>Er, for this value, we referred to Remus on the Xen platform. ;)
> >>But after we implement COLO with the COLO proxy, this interval will be
> >>changed to a bigger value (10s), and we will make it configurable too.
> >>Besides, we will add another configurable value to control the minimum
> >>interval between checkpoints.
> >
> >OK - any typical value that strikes a good balance between COLO keeping
> >the network busy / the guest paused vs. the guest making progress?  Again
> >this is something that's workload-dependent, but I guess you have typical
> >numbers from a network-bound workload?
> >
> 
> Yes, you can refer to Zhijian's email for details.
> I think it is necessary to add some test/performance results to COLO's wiki.
> We will do that later.

Yes, please.

Also, in your next iteration, please add the colo files to the
MAINTAINERS entry so you get CC'ed on future patches (and bugs :-)
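For reference, a MAINTAINERS entry for COLO might look roughly like the sketch
below; the file globs are an assumption based on the files this series touches,
and the maintainer address is a placeholder:

COLO Framework
M: zhanghailiang <maintainer-address>
S: Maintained
F: migration/colo*
F: include/migration/colo*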

                Amit


