Re: [Qemu-devel] [PATCH 00/30] Migration thread 20121017 edition


From: Chegu Vinod
Subject: Re: [Qemu-devel] [PATCH 00/30] Migration thread 20121017 edition
Date: Wed, 24 Oct 2012 06:49:41 -0700
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:16.0) Gecko/20121010 Thunderbird/16.0.1

On 10/24/2012 6:40 AM, Vinod, Chegu wrote:

Hi

 

This series applies on top of the refactoring that I sent yesterday.

Changes from the last version include:

 

- buffered_file.c is gone; its functionality is merged into migration.c.
  Pay special attention to the merge of buffered_file_thread() &
  migration_file_put_notify().

 

- Some more bitmap handling optimizations (thanks to Orit & Paolo for
  suggestions and code, and to Vinod for testing)

 

Please review.  Included is the pointer to the full tree.

 

Thanks, Juan.

 

The following changes since commit b6348f29d033d5a8a26f633d2ee94362595f32a4:

 

  target-arm/translate: Fix RRX operands (2012-10-17 19:56:46 +0200)

 

are available in the git repository at:

 

  http://repo.or.cz/r/qemu/quintela.git migration-thread-20121017

 

for you to fetch changes up to 486dabc29f56d8f0e692395d4a6cd483b3a77f01:

 

  ram: optimize migration bitmap walking (2012-10-18 09:20:34 +0200)

 

 

v3:

 

This is work in progress on top of the previous migration series just sent.

 

- Introduce a thread for migration instead of using a timer and callback
- Remove the writing to the fd from under the iothread lock
- Make the writes synchronous
- Introduce a new pending method that returns how many bytes are pending
  for one save live section (see the sketch after this list)
- The last patch just adds printfs to see where the time is being spent
  in the migration completion phase
  (yes, it pollutes all uses of stop on the monitor)
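
For illustration, here is a minimal sketch of the idea behind such a
pending hook; the type and function names below are made up for this
example and are not taken from the series:

    #include <stdbool.h>
    #include <stdint.h>

    #define TARGET_PAGE_SIZE 4096              /* placeholder page size */

    /* One live-savable section exposes a callback that reports how many
     * bytes it would still have to send. */
    typedef struct SaveLiveSection {
        uint64_t (*pending)(void *opaque);
        void *opaque;
    } SaveLiveSection;

    /* Hypothetical RAM section: pending bytes = dirty pages * page size. */
    static uint64_t ram_pending(void *opaque)
    {
        uint64_t *dirty_pages = opaque;
        return *dirty_pages * TARGET_PAGE_SIZE;
    }

    /* The migration thread keeps iterating while the pending amount is
     * too large to be sent within the allowed downtime. */
    static bool can_enter_completion(SaveLiveSection *s, uint64_t max_bytes)
    {
        return s->pending(s->opaque) <= max_bytes;
    }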

 

So far I have found that we spend a lot of time in bdrv_flush_all().  It
can take from 1ms to 600ms (yes, that is not a typo).  That dwarfs the
default migration downtime (30ms).

 

Stop all vcpus:

 

- it works now (after the changes to qemu_cpu_is_vcpu in the previous
  series); the caveat is that the time that bdrv_flush_all() takes is
  "unpredictable".  Any silver bullets?

 

  Paolo suggested the following sequence for the migration completion phase:

  bdrv_aio_flush_all();
  send the dirty pages;
  bdrv_drain_all();
  bdrv_flush_all();
  another round through the bitmap in case the completions have
  dirtied some pages

 

  Paolo, did I get it right?

  Any other suggestion?
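
A rough sketch of that ordering, with the block-layer calls quoted above
treated as opaque stubs so the snippet stands on its own
(send_dirty_pages() is a made-up placeholder for the RAM-sending step):

    /* Stubs standing in for the real QEMU calls named above. */
    static void bdrv_aio_flush_all(void) { }
    static void bdrv_drain_all(void)     { }
    static void bdrv_flush_all(void)     { }
    static void send_dirty_pages(void)   { }  /* placeholder for the RAM step */

    static void migration_completion_sketch(void)
    {
        bdrv_aio_flush_all();   /* start the flushes asynchronously */
        send_dirty_pages();     /* send the currently dirty RAM */
        bdrv_drain_all();       /* wait for the outstanding block I/O */
        bdrv_flush_all();       /* the synchronous flush should now be cheap */
        send_dirty_pages();     /* the completions may have dirtied some pages,
                                   so walk the bitmap one more time */
    }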

 

- migrate_cancel() is not properly implemented (as in, we take no
  locks, ...)

 

- expected_downtime is not calculated.

 

  I am about to merge migrate_fd_put_ready() & buffered_thread(), and
  that will make it trivial to calculate.
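
For what it is worth, a minimal sketch of that calculation, assuming the
thread already tracks the pending byte count and the measured transfer
rate (the function and parameter names are illustrative only):

    #include <stdint.h>

    /* Expected downtime = time needed to send the remaining pending
     * bytes at the currently measured transfer rate. */
    static uint64_t expected_downtime_ms(uint64_t pending_bytes,
                                         uint64_t bandwidth_bytes_per_ms)
    {
        if (bandwidth_bytes_per_ms == 0) {
            return UINT64_MAX;      /* no bandwidth measurement yet */
        }
        return pending_bytes / bandwidth_bytes_per_ms;
    }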

 

It outputs something like:

 

wakeup_request 0

time cpu_disable_ticks 0

time pause_all_vcpus 1

time runstate_set 1

time vmstate_notify 2

time bdrv_drain_all 2

time flush device /dev/disk/by-path/ip-192.168.10.200:3260-iscsi-iqn.2010-12.org.trasno:iscsi.lvm-lun-1: 3

time flush device : 3

time flush device : 3

time flush device : 3

time bdrv_flush_all 5

time monitor_protocol_event 5

vm_stop 2 5

synchronize_all_states 1

migrate RAM 37

migrate rest devices 1

complete without error 3a 44

completed 45

end completed stage 45

 

As you can see, we estimate that we can send all pending data in 30ms, and
it took 37ms to send the RAM (which is what we calculate).  So the
estimation is quite good.

 

What gives me lots of variation is the "time flush device" line with the
device name.  That is what varies between 1ms and 600ms.

 

This is in a completely idle guest.  I am running:

 

#include <stdio.h>
#include <inttypes.h>
#include <unistd.h>
#include <sys/time.h>

/* ms2us() was not shown in the mail; assumed to convert ms to us */
#define ms2us(ms) ((ms) * 1000)

int main(void)
{
        struct timeval t0, t1;

        while (1) {
                uint64_t delay;

                if (gettimeofday(&t0, NULL) != 0)
                        perror("gettimeofday 1");
                if (usleep(ms2us(10)) != 0)
                        perror("usleep");
                if (gettimeofday(&t1, NULL) != 0)
                        perror("gettimeofday 2");

                /* t1 -= t0, carrying the microseconds */
                t1.tv_usec -= t0.tv_usec;
                if (t1.tv_usec < 0) {
                        t1.tv_usec += 1000000;
                        t1.tv_sec--;
                }
                t1.tv_sec -= t0.tv_sec;

                /* elapsed time in milliseconds */
                delay = t1.tv_sec * 1000 + t1.tv_usec / 1000;

                if (delay > 100)
                        printf("delay of %" PRIu64 " ms\n", delay);
        }
}

 

To see the latency inside the guest (i.e. ask for a 10ms sleep, and see how

long it takes).

 

 

address@hidden ~]# ./timer

delay of 161 ms

delay of 135 ms

delay of 143 ms

delay of 132 ms

delay of 131 ms

delay of 141 ms

delay of 113 ms

delay of 119 ms

delay of 114 ms

 

 

But those values are independent of the migration.  Without even starting
the migration, with the guest completely idle, we see them sometimes.

 

Juan Quintela (27):

  buffered_file: Move from using a timer to use a thread

  migration: make qemu_fopen_ops_buffered() return void

  migration: stop all cpus correctly

  migration: make writes blocking

  migration: remove unfreeze logic

  migration: take finer locking

  buffered_file: Unfold the trick to restart generating migration data

  buffered_file: don't flush on put buffer

  buffered_file: unfold buffered_append in buffered_put_buffer

  savevm: New save live migration method: pending

  migration: include qemu-file.h

  migration-fd: remove duplicate include

  migration: move buffered_file.c code into migration.c

  migration: move migration_fd_put_ready()

  migration: Inline qemu_fopen_ops_buffered into migrate_fd_connect

  migration: move migration notifier

  migration: move begining stage to the migration thread

  migration: move exit condition to migration thread

  migration: unfold rest of migrate_fd_put_ready() into thread

  migration: print times for end phase

  ram: rename last_block to last_seen_block

  ram: Add last_sent_block

  memory: introduce memory_region_test_and_clear_dirty

  ram: Use memory_region_test_and_clear_dirty

  fix memory.c

  migration: Only go to the iterate stage if there is anything to send

  ram: optimize migration bitmap walking

 

Paolo Bonzini (1):

  split MRU ram list

 

Umesh Deshpande (2):

  add a version number to ram_list

  protect the ramlist with a separate mutex

 

Makefile.objs     |   2 +-

arch_init.c       | 133 +++++++++++--------

block-migration.c |  49 ++-----

block.c           |   6 +

buffered_file.c   | 256 -----------------------------------

buffered_file.h   |  22 ---

cpu-all.h         |  13 +-

cpus.c            |  17 +++

exec.c            |  44 +++++-

memory.c          |  17 +++

memory.h          |  18 +++

migration-exec.c  |   4 +-

migration-fd.c    |   9 +-

migration-tcp.c   |  21 +--

migration-unix.c  |   4 +-

migration.c       | 391 ++++++++++++++++++++++++++++++++++++++++--------------

migration.h       |   4 +-

qemu-file.h       |   5 -

savevm.c          |  37 +++++-

sysemu.h          |   1 +

vmstate.h         |   1 +

21 files changed, 522 insertions(+), 532 deletions(-)

delete mode 100644 buffered_file.c

delete mode 100644 buffered_file.h

 

--

1.7.11.7

 



Tested-by: Chegu Vinod  <address@hidden>


Using these patches I have verified live migration (on x86_64 platforms) for guest sizes ranging from 64G/10 vcpus through 768G/80 vcpus, and I have seen a reduction in both the downtime and the total migration time.  The dirty bitmap optimizations have shown improvements too and have helped reduce the downtime (perhaps more can be done as a next step, i.e. after the above changes (minus the printfs) make it into upstream).  The new migration stats that were added were useful too!

Thanks
Vinod

