From: Jitendra Kolhe
Subject: Re: [Qemu-devel] [PATCH v1] migration: skip sending ram pages released by virtio-balloon driver.
Date: Tue, 22 Mar 2016 11:17:57 +0530
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.7.0

On 3/18/2016 4:57 PM, Roman Kagan wrote:
> [ Sorry I've lost this thread with email setup changes on my side;
> catching up ]
> 
> On Tue, Mar 15, 2016 at 06:50:45PM +0530, Jitendra Kolhe wrote:
>> On 3/11/2016 8:09 PM, Jitendra Kolhe wrote:
>>> Here is what
>>> I tried, let’s say we have 3 versions of qemu (below timings are for
>>> 16GB idle guest with 12GB ballooned out)
>>>
>>> v1. Unmodified qemu – absolutely not code change – Total Migration time
>>> = ~7600ms (I rounded this one to ~8000ms)
>>> v2. Modified qemu 1 – with proposed patch set (which skips both zero
>>> pages scan and migrating control information for ballooned out pages) -
>>> Total Migration time = ~5700ms
>>> v3. Modified qemu 2 – only with changes to save_zero_page() as discussed
>>> in previous mail (and of course using proposed patch set only to
>>> maintain bitmap for ballooned out pages) – Total migration time is
>>> irrelevant in this case.
>>> Total Zero page scan time = ~1789ms
>>> Total (save_page_header + qemu_put_byte(f, 0)) = ~556ms.
>>> Everything seems to add up here (may not be exact) – 5700+1789+559 =
>>> ~8000ms
>>>
>>> I see 2 factors that we have not considered in this add up a. overhead
>>> for migrating balloon bitmap to target and b. as you mentioned below
>>> overhead of qemu_clock_get_ns().
>>
>> Missed one more factor of testing each page against balloon bitmap during
>> migration, which is consuming around ~320ms for same configuration. If we
>> remove this overhead which is introduced by proposed patch set from above
>> calculation we almost get total migration time for unmodified qemu
>> (5700-320+1789+559=~7700ms)

Thanks for your response. Just to clarify my understanding first: by
"protocol" do you mean the saving or sending of header or control information
per page during migration?
My responses below are based on that assumption.

> 
> I'm a bit lost in the numbers you quote, so let me try with
> back-of-the-envelope calculation.
> 
> First off, the way you identify pages that don't need to be sent is
> basically orthogonal to how you optimize the protocol to send them.  So
> teaching is_zero_range() to consult unmapped or ballooned out page map
> looks like a low-hanging fruit that may benefit the migration time by
> avoiding scanning the memory, without protocol changes. 

Yes, the intention of the proposed patch is not to optimize the existing
protocol, which is used to send control or header information during migration.
Changes only to is_zero_range() should still show a benefit in migration time
(a rough sketch of the idea follows below).
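
To make that concrete, here is a minimal standalone sketch of the idea.
This is not the actual QEMU code, and balloon_bitmap_test() is only a
placeholder name for whatever lookup the proposed patch set maintains;
it just illustrates consulting a ballooned-out bitmap before falling
back to a zero scan of the page:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 4096
#define NUM_PAGES 8

static unsigned long balloon_bitmap[(NUM_PAGES + 63) / 64];

/* placeholder for the bitmap lookup the proposed patch set would provide */
static bool balloon_bitmap_test(size_t page)
{
    return balloon_bitmap[page / 64] & (1UL << (page % 64));
}

/* same spirit as QEMU's is_zero_range()/buffer_is_zero(): scans the page */
static bool is_zero_range(const uint8_t *p, size_t len)
{
    return p[0] == 0 && memcmp(p, p + 1, len - 1) == 0;
}

static bool page_is_zero(const uint8_t *ram, size_t page)
{
    if (balloon_bitmap_test(page)) {
        return true;            /* ballooned out: skip the memory scan */
    }
    return is_zero_range(ram + page * PAGE_SIZE, PAGE_SIZE);
}

int main(void)
{
    static uint8_t ram[NUM_PAGES * PAGE_SIZE];   /* zero-initialised */

    balloon_bitmap[0] |= 1UL << 3;   /* pretend page 3 was ballooned out */
    ram[5 * PAGE_SIZE] = 0xff;       /* pretend page 5 holds real data */

    for (size_t i = 0; i < NUM_PAGES; i++) {
        printf("page %zu: %s\n", i, page_is_zero(ram, i) ? "zero" : "data");
    }
    return 0;
}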

> [And vice versa,
> if sending the zero pages bitmap brought so big benefit it would make
> sense to apply it to pages found by scanning, too].
> 

I am not sure we would see much benefit from this; with the timings we are
seeing, the difference between testing against a bitmap and sending control
or header information is not huge.
With the proposed patch we are in any case already spending time testing
against the bitmap in order to avoid the zero page scan.

> Now regarding the protocol:
> 
>  - as a first approximation, let's speak in terms of transferred data
>    size
> 
>  - consider a VM using 1/10 of its memory (I think this can be
>    considered an extreme of over-provisioning)
> 
>  - a whiteout is 3 decimal orders smaller than a page, so with zero
>    pages replaced by whiteouts (current protocol) the overall
>    transferred data size for zero pages is on the order of a percent of
>    the total transferred data size
> 
>  - zero page bitmap would reduce that further by a couple of orders
> 
> So, if this calculation is not totally off, extending the protocol to
> use zero page bitmaps is unlikely to give an improvement at more than a
> percent level.
> 

I agree that the current protocol has already reduced the total transferred
data size to less than a percent of what actually sending the zero pages
would cost (a rough numeric sanity check of this follows below).
But here we are talking about reducing it even further by not sending the
control or header information at all.
On my test setup, the average zero page scan time for 12GB of zero pages
is around 1789ms, and the time taken to send the header or control information
for the same 12GB of zero pages is around 559ms, which is roughly 30% of the
zero page scan time.
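
Rough sanity check of the transferred-size argument, in code. The ~9 bytes
per zero-page record is my assumption (an 8-byte page header plus the single
fill byte); the actual on-wire size may differ:

#include <stdio.h>

int main(void)
{
    const double page_bytes     = 4096.0; /* guest page size */
    const double whiteout_bytes = 9.0;    /* assumed zero-page record size */
    const double used_fraction  = 0.10;   /* VM using 1/10 of its memory */

    /* average bytes on the wire per guest page */
    double data   = used_fraction * page_bytes;
    double zeros  = (1.0 - used_fraction) * whiteout_bytes;
    double bitmap = (1.0 - used_fraction) / 8.0;   /* 1 bit per zero page */

    printf("zero-page records: %.2f%% of transferred data\n",
           100.0 * zeros / (data + zeros));
    printf("zero-page bitmap:  %.3f%% of transferred data\n",
           100.0 * bitmap / (data + bitmap));
    return 0;
}

On those assumptions this prints roughly 1.9% for the current whiteouts and
about 0.03% for a bitmap, which matches the estimate quoted above.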

I think the point here is: should we consider ballooned-out pages as guest
pages and treat them like any other guest RAM page, in which case we expect
the existing protocol to take care of them, or should we treat them as
non-guest-RAM pages, in which case it may be fine to skip the standard
protocol for them?
Note that the proposed patch is only focused on ballooned-out pages, which
are a subset of the guest zero page set.

> I'm not sure it pays off the extra code paths and incompatible protocol
> changes...
> 
> Roman.
> 

If skipping the control or header information for “only” the ballooned-out
pages raises doubts about protocol compatibility, then yes, I agree it is not
worth the gain we see. We can still localize the solution to the
is_zero_range() scan and avoid the zero page scan for ballooned-out pages.

Thanks,
- Jitendra


