
From: Chegu Vinod
Subject: Re: [Qemu-devel] [PATCH 05/10] migration: Fix the migrate auto converge process
Date: Tue, 11 Mar 2014 15:56:12 -0700
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.3.0

On 3/11/2014 1:48 PM, Juan Quintela wrote:
<address@hidden> wrote:
From: ChenLiang <address@hidden>

It is inaccurate and complex to use the transfer speed of the
migration thread to determine whether the migration converges.
The dirty pages may be compressed by XBZRLE or ZERO_PAGE. The
counter updated at dirty bitmap sync will keep increasing if the
migration can't converge.
"It is inexact and complex to use the migration transfer speed to
dectermine weather the convergence of migration."

@@ -530,21 +523,11 @@ static void migration_bitmap_sync(void)
      /* more than 1 second = 1000 millisecons */
      if (end_time > start_time + 1000) {
          if (migrate_auto_converge()) {
-            /* The following detection logic can be refined later. For now:
-               Check to see if the dirtied bytes is 50% more than the approx.
-               amount of bytes that just got transferred since the last time we
-               were in this routine. If that happens >N times (for now N==4)
-               we turn on the throttle down logic */
-            bytes_xfer_now = ram_bytes_transferred();
-            if (s->dirty_pages_rate &&
-               (num_dirty_pages_period * TARGET_PAGE_SIZE >
-                   (bytes_xfer_now - bytes_xfer_prev)/2) &&
-               (dirty_rate_high_cnt++ > 4)) {
-                    trace_migration_throttle();
-                    mig_throttle_on = true;
-                    dirty_rate_high_cnt = 0;
-             }
-             bytes_xfer_prev = bytes_xfer_now;
+            if (get_bitmap_sync_cnt() > 15) {
+                /* It indicates that migration can't converge when the
+                 * counter is larger than fifteen. Enable the feature
+                 * of auto converge */
Comment is not needed, it says exactly what the code does.

But why 15?  It is not that I think that the older code is better or
worse than yours.  Just that we move from one magic number to another
(that is even bigger).

Shouldn't it be easier to just change mig_sleep_cpu()

to do something like:


static void mig_sleep_cpu(void *opq)
{
     qemu_mutex_unlock_iothread();
     g_usleep(2 * get_bitmap_sync_cnt() * 1000);
     qemu_mutex_lock_iothread();
}

This would get the 30ms on the 15th iteration.  I am open to changing
that formula to anything different, but what I want is to change this to
something where less convergence -> more throttling.

< 'already got some feedback earlier on this and had this task in the list of things
    to work on... :)   >

Having the throttling start with some pre-defined "degree" and then have that "degree" gradually increase... either

a) automatically, as shown in Juan's example above (a sketch follows this list) (or)

b) via some TBD user level interface...

...is one way to help with ensuring convergence for all cases.
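A minimal sketch of (a), assuming a hypothetical throttle_level counter
bumped on every non-converging bitmap sync and reset once convergence
resumes (all names and constants here are illustrative, not from the
patch):

/* Hypothetical sketch of a gradually increasing throttle "degree".
 * throttle_level would be bumped in migration_bitmap_sync() whenever
 * convergence is not happening, and reset once it is. */
static int throttle_level;

static void mig_throttle_cpu(void *opq)
{
    /* cap the sleep so the workload keeps making some progress */
    int sleep_ms = throttle_level < 30 ? 10 * throttle_level : 300;

    qemu_mutex_unlock_iothread();
    g_usleep(sleep_ms * 1000);
    qemu_mutex_lock_iothread();
}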

The issue of continuing to increase this "degree" of throttling is an obvious area of concern for the workload (that is still trying to run in the VM). Would it be better to force the live migration to switch from the iterative pre-copy phase to the "downtime" phase if it fails to converge even after throttling it for a couple of iterations? Doing so could result in a longer actual downtime. Hope to try this and see... but if anyone has inputs (other than doing post-copy etc.) pl. do share.
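A rough sketch of that fallback idea, with entirely hypothetical names
and threshold (not part of this series), just to make the trade-off
concrete:

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: give up on iterative pre-copy once throttling
 * has failed to shrink the dirty set for a few iterations, accepting
 * a longer downtime in exchange for guaranteed completion. */
#define MAX_THROTTLED_ITERS 4   /* illustrative value */

static bool should_force_downtime(int throttled_iters,
                                  uint64_t pending_bytes,
                                  uint64_t downtime_budget_bytes)
{
    return throttled_iters > MAX_THROTTLED_ITERS &&
           pending_bytes > downtime_budget_bytes;
}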



BTW, are you testing this with any workload to see that it improves?

Yes. Please do share some data.



+                mig_throttle_on = true;
+            }
Vinod, what do you think?
As is noted in the current code... the "logic" to detect the lack of convergence needs to be refined. If there is a better way to detect the same (one that also covers these other cases like XBZRLE etc.) then I am all for it. I do agree with Juan about the choice of magic numbers (i.e. one may not be better than the other).

BTW, on a related note...

I haven't used XBZRLE in the recent past (after having tried it in the early days). Does it now perform well with larger-sized VMs running real-world workloads? I assume that is where you found there was still a need for forcing convergence?

Pl. do consider sharing some results about the type of workload and also the size of the VMs etc. that you have tried with XBZRLE.

Do you have a workload to test this?

Hmm... One can test this with memory intensive Java warehouse type of workloads (besides using synthetic workloads).
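For a synthetic one, something as simple as a loop that keeps rewriting
a large buffer inside the guest is usually enough to outrun the transfer
rate (buffer size and pacing below are arbitrary):

#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Minimal synthetic dirtier: rewrites a large working set so the guest
 * produces dirty pages faster than migration can send them. Run inside
 * the VM while migrating. */
int main(void)
{
    size_t len = 1024UL * 1024 * 1024;  /* 1 GiB working set */
    char *buf = malloc(len);
    if (!buf) {
        return 1;
    }
    for (;;) {
        memset(buf, 0x5a, len);         /* dirty every page */
        usleep(1000);                   /* brief pause between sweeps */
    }
}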

Vinod

Thanks, Juan.