Re: [Qemu-devel] [PATCH] migration: keep bytes_xfer_prev init'd to zero


From: Peter Xu
Subject: Re: [Qemu-devel] [PATCH] migration: keep bytes_xfer_prev init'd to zero
Date: Thu, 25 May 2017 08:36:09 +0800
User-agent: Mutt/1.5.24 (2015-08-30)

On Wed, May 24, 2017 at 01:02:25PM +0000, Felipe Franciosi wrote:
> 
> > On 23 May 2017, at 05:27, Peter Xu <address@hidden> wrote:
> > 
> > On Fri, May 19, 2017 at 10:59:02PM +0100, Felipe Franciosi wrote:
> >> The first time migration_bitmap_sync() is called, bytes_xfer_prev is set
> >> to ram_state.bytes_transferred which is, at this point, zero. The next
> >> time migration_bitmap_sync() is called, an iteration has happened and
> >> bytes_xfer_prev is set to 'x' bytes. Most likely, more than one second
> >> has passed, so the auto converge logic will be triggered and
> >> bytes_xfer_now will also be set to 'x' bytes.
> >> 
> >> This condition is currently masked by dirty_rate_high_cnt, which will
> >> wait for a few iterations before throttling. It would otherwise always
> >> assume zero bytes have been copied and therefore throttle the guest
> >> (possibly) prematurely.
> >> 
> >> Given bytes_xfer_prev is only used by the auto convergence logic, it
> >> makes sense to only set its value after a check has been made against
> >> bytes_xfer_now.
> >> 
> >> Signed-off-by: Felipe Franciosi <address@hidden>
> >> ---
> >> migration/ram.c | 4 ----
> >> 1 file changed, 4 deletions(-)
> >> 
> >> diff --git a/migration/ram.c b/migration/ram.c
> >> index f59fdd4..793af39 100644
> >> --- a/migration/ram.c
> >> +++ b/migration/ram.c
> >> @@ -670,10 +670,6 @@ static void migration_bitmap_sync(RAMState *rs)
> >> 
> >>     rs->bitmap_sync_count++;
> >> 
> >> -    if (!rs->bytes_xfer_prev) {
> >> -        rs->bytes_xfer_prev = ram_bytes_transferred();
> >> -    }
> >> -
> >>     if (!rs->time_last_bitmap_sync) {
> >>         rs->time_last_bitmap_sync = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> >>     }
> >> -- 
> >> 1.9.5
> >> 
> >> 
> > 
> > I feel like this patch wants to correctly initialize bytes_xfer_prev,
> > but I still see a problem. E.g., when the user enables auto-converge
> > while the migration is already running, in the first iteration we'll
> > always have a very small bytes_xfer_prev (with this patch, it'll be
> > zero) against a very big bytes_xfer_now (which is the
> > ram_bytes_transferred() value at that point).
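> > (To make the skew concrete, with made-up numbers: if 10GB has already
> > been transferred when auto-converge is enabled, the first check will
> > compare one period's worth of dirty bytes against
> > (bytes_xfer_now - bytes_xfer_prev) / 2 = (10GB - 0) / 2 = 5GB, so the
> > throttle that should kick in is effectively suppressed until
> > bytes_xfer_prev catches up.)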
> 
> Interesting point. Worth noting, that's no different than what happens today 
> anyway (bytes_xfer_prev would be initialised to the first non-zero 
> bytes_transferred). I therefore don't think it should stop or slow this 
> patch's acceptance.
> 
> > If so, do
> > you think squashing the change below together with the current one
> > would be more accurate?
> 
> As a matter of fact I had a different idea (below) to fix what you are 
> describing. I was still experimenting with this code, so I haven't sent a 
> patch yet, but I'm going to send a series soon for comments. Basically, I 
> think some other changes are required to make sure these numbers are correct:

I agree on most points.

> 
> 1) dirty_rate_high_cnt++ >= 2 should be ++dirty_rate_high_cnt >= 2
> - The original commit msg from 070afca25 (Jason J. Herne) says that 
> convergence should be triggered after two passes. Current code does it after 
> three passes (and four passes the first time around; see number 2 below).
> - I personally feel this counter should go away altogether. If the migration 
> is not converging and the VM is going to be throttled, there's no point 
> stressing the network any further; just start throttling straight away.
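> 
> A tiny standalone sketch of just the increment semantics (illustrative 
> C, not the QEMU code; it only shows when each form of the check first 
> fires):
> 
>     #include <stdio.h>
> 
>     int main(void)
>     {
>         int post = 0, pre = 0;
> 
>         for (int pass = 1; pass <= 4; pass++) {
>             /* post-increment compares the old value (first fires on
>              * pass 3); pre-increment compares the new value (first
>              * fires on pass 2, matching 070afca25's intent) */
>             printf("pass %d: post %s, pre %s\n", pass,
>                    post++ >= 2 ? "fires" : "waits",
>                    ++pre >= 2 ? "fires" : "waits");
>         }
>         return 0;
>     }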
> 
> 2) dirty_pages_rate should be updated before the autoconverge logic.
> - Right now, we delay throttling by a further iteration, as dirty_pages_rate 
> is set after the first pass through the autoconverge logic (it is zero the 
> first time around).
> - The "if (rs->dirty_pages_rate &&..." part of the conditional can then be 
> removed, as it won't ever be zero.

For this one: why can't dirty_pages_rate be zero? But I agree with
you that the check can be removed, since even when it is zero the
next check would fail as well (rs->num_dirty_pages_period *
TARGET_PAGE_SIZE > (bytes_xfer_now - rs->bytes_xfer_prev) / 2).
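
For reference, the guard under discussion looks roughly like this in
migration_bitmap_sync() (quoting from memory, so take it as a sketch of
the shape rather than the exact tree):

    bytes_xfer_now = ram_bytes_transferred();

    if (rs->dirty_pages_rate &&
        (rs->num_dirty_pages_period * TARGET_PAGE_SIZE >
            (bytes_xfer_now - rs->bytes_xfer_prev) / 2) &&
        (rs->dirty_rate_high_cnt++ >= 2)) {
        trace_migration_throttle();
        rs->dirty_rate_high_cnt = 0;
        mig_throttle_guest_down();
    }
    rs->bytes_xfer_prev = bytes_xfer_now;

Since && short-circuits, the dirty_pages_rate test only matters if the
byte comparison could spuriously pass in the very first period.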

> 
> 3) bytes_xfer counters should be updated alongside dirty_pages counters (for 
> the same period).
> - This fixes the issue you described, as bytes_xfer_* will correspond to the 
> period.
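> 
> One plausible shape for (3), purely illustrative since the series isn't 
> posted yet (the exact placement is an assumption, not the final patch):
> 
>     /* in migration_bitmap_sync(), inside the once-per-second block
>      * where dirty_pages_rate is computed for the elapsed period */
>     if (end_time > rs->time_last_bitmap_sync + 1000) {
>         /* ... existing per-period dirty_pages_rate bookkeeping ... */
> 
>         /* advance the byte counter in lockstep, so bytes_xfer_prev
>          * always refers to the same period as the dirty counters */
>         rs->bytes_xfer_prev = bytes_xfer_now;
>     }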
> 
> I'll send the series shortly. Thoughts in the meantime?

I'll reply to that patchset then. Thanks.

-- 
Peter Xu


