qemu-devel


Re: [Qemu-devel] [PATCH] migration: keep bytes_xfer_prev init'd to zero


From: Felipe Franciosi
Subject: Re: [Qemu-devel] [PATCH] migration: keep bytes_xfer_prev init'd to zero
Date: Thu, 25 May 2017 10:50:46 +0000

> On 25 May 2017, at 01:36, Peter Xu <address@hidden> wrote:
> 
> On Wed, May 24, 2017 at 01:02:25PM +0000, Felipe Franciosi wrote:
>> 
>>> On 23 May 2017, at 05:27, Peter Xu <address@hidden> wrote:
>>> 
>>> On Fri, May 19, 2017 at 10:59:02PM +0100, Felipe Franciosi wrote:
>>>> The first time migration_bitmap_sync() is called, bytes_xfer_prev is set
>>>> to ram_state.bytes_transferred which is, at this point, zero. The next
>>>> time migration_bitmap_sync() is called, an iteration has happened and
>>>> bytes_xfer_prev is set to 'x' bytes. Most likely, more than one second
>>>> has passed, so the auto converge logic will be triggered and
>>>> bytes_xfer_now will also be set to 'x' bytes.
>>>> 
>>>> This condition is currently masked by dirty_rate_high_cnt, which will
>>>> wait for a few iterations before throttling. It would otherwise always
>>>> assume zero bytes have been copied and therefore throttle the guest
>>>> (possibly) prematurely.
>>>> 
>>>> Given bytes_xfer_prev is only used by the auto convergence logic, it
>>>> makes sense to only set its value after a check has been made against
>>>> bytes_xfer_now.
>>>> 
>>>> Signed-off-by: Felipe Franciosi <address@hidden>
>>>> ---
>>>> migration/ram.c | 4 ----
>>>> 1 file changed, 4 deletions(-)
>>>> 
>>>> diff --git a/migration/ram.c b/migration/ram.c
>>>> index f59fdd4..793af39 100644
>>>> --- a/migration/ram.c
>>>> +++ b/migration/ram.c
>>>> @@ -670,10 +670,6 @@ static void migration_bitmap_sync(RAMState *rs)
>>>> 
>>>>    rs->bitmap_sync_count++;
>>>> 
>>>> -    if (!rs->bytes_xfer_prev) {
>>>> -        rs->bytes_xfer_prev = ram_bytes_transferred();
>>>> -    }
>>>> -
>>>>    if (!rs->time_last_bitmap_sync) {
>>>>        rs->time_last_bitmap_sync = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>>>>    }
>>>> -- 
>>>> 1.9.5
>>>> 
>>>> 
>>> 
>>> I feel like this patch wants to correctly initialize bytes_xfer_prev,
>>> however I still see problem. E.g., when user specify auto-convergence
>>> during migration, and in the first iteration we'll always have a very
>>> small bytes_xfer_prev (with this patch, it'll be zero) with a very big
>>> bytes_xfer_now (which is the ram_bytes_transferred() value).
>> 
>> Interesting point. Worth noting, though, that this is no different from
>> what happens today anyway (bytes_xfer_prev would be initialised to the
>> first non-zero value of bytes_transferred). I therefore don't think it
>> should block or slow this patch's acceptance.
>> 
>>> If so, do
>>> you think squash below change together with current one would be more
>>> accurate?
>> 
>> As a matter of fact I had a different idea (below) to fix what you are 
>> describing. I was still experimenting with this code, so I haven't sent a 
>> patch yet. But I'm going to send a series soon for comments. Basically, I 
>> think some other changes are required to make sure these numbers are correct:
> 
> I agree on most points.
> 
>> 
>> 1) dirty_rate_high_cnt++ >= 2 should be ++dirty_rate_high_cnt >= 2
>> - The original commit msg from 070afca25 (Jason J. Herne) says that 
>> convergence should be triggered after two passes. Current code does it after 
>> three passes (and four passes the first time around; see number 2 below).
>> - I personally feel this counter should go away altogether. If the migration 
>> is not converging and the VM is going to be throttled, there's no point 
>> stressing the network any further; just start throttling straight away.
>> 
>> 2) dirty_pages_rate should be updated before the autoconverge logic.
>> - Right now, we delay throttling by a further iteration, as dirty_pages_rate 
>> is set after the first pass through the autoconverge logic (it is zero the 
>> first time around).
>> - The "if (rs->dirty_pages_rate &&..." part of the conditional can then be 
>> removed, as it won't ever be zero.
> 
> For this one: why dirty_pages_rate cannot be zero?

Yeah good point. If _nothing_ got dirtied in the period (possible I suppose), 
then the dirty rate would be zero.

> But I agree with
> you that it can be removed since even if it's zero, then the next
> check would fail as well (rs->num_dirty_pages_period *
> TARGET_PAGE_SIZE > (bytes_xfer_now - rs->bytes_xfer_prev) / 2).

Exactly. I suppose that check was introduced to consider the first iteration 
where the counter hadn't been initialised. But it's not needed.

> 
>> 
>> 3) bytes_xfer counters should be updated alongside dirty_pages counters (for 
>> the same period).
>> - This fixes the issue you described, as bytes_xfer_* will correspond to the 
>> period.
>> 
>> I'll send the series shortly. Thoughts in the meantime?
> 
> I'll reply to that patchset then. Thanks.

Thanks for the quick feedback on this. I've included this change in the 
patchset, so let's continue the conversation there.

Felipe

> 
> -- 
> Peter Xu



