
Re: [Qemu-devel] FW: Fwd: [RFC 00/27] Migration thread (WIP)


From: Chegu Vinod
Subject: Re: [Qemu-devel] FW: Fwd: [RFC 00/27] Migration thread (WIP)
Date: Fri, 27 Jul 2012 07:21:14 -0700
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20120713 Thunderbird/14.0

On 7/27/2012 7:11 AM, Vinod, Chegu wrote:

-----Original Message-----
From: Juan Quintela [mailto:address@hidden
Sent: Friday, July 27, 2012 4:06 AM
To: Vinod, Chegu
Cc: address@hidden; Orit Wasserman
Subject: Re: Fwd: [RFC 00/27] Migration thread (WIP)

Chegu Vinod <address@hidden> wrote:
On 7/26/2012 11:41 AM, Chegu Vinod wrote:


-------- Original Message --------

Subject: [Qemu-devel] [RFC 00/27] Migration thread (WIP)
Date: Tue, 24 Jul 2012 20:36:25 +0200
From: Juan Quintela <address@hidden>
To: address@hidden

Hi

This series is on top of the migration-next-v5 series just posted.

First of all, this is an RFC/work in progress.  A lot of people
asked for it, and I would like a review of the design.

     Hello,

     Thanks for sharing this early/WIP version for evaluation. I'm still
     in the middle of the code review, but wanted to share a couple of
     quick observations.

     I tried to use it to migrate a 128G/10-vCPU guest (speed set to 10G
     and downtime to 2s), once with no workload (i.e. an idle guest) and
     once with SpecJBB running in the guest.
     The idle guest case seemed to migrate fine:

     capabilities: xbzrle: off
     Migration status: completed
     transferred ram: 3811345 kbytes
     remaining ram: 0 kbytes
     total ram: 134226368 kbytes
     total time: 199743 milliseconds
     In the case of SpecJBB I ran into issues during stage 3: the
     source host's qemu and the guest hung. I need to debug this
     more... (if you already have some hints, please let me know).

     capabilities: xbzrle: off
     Migration status: active
     transferred ram: 127618578 kbytes
     remaining ram: 2386832 kbytes
     total ram: 134226368 kbytes
     total time: 526139 milliseconds
     (qemu) qemu_savevm_state_complete called
     qemu_savevm_state_complete calling ram_save_complete
     <--- hung somewhere after this (I need to get more info).

It appears to be some race condition, as in some cases it hangs
and in other cases it succeeds.
Just a wild guess: try using fewer vCPUs with the same RAM.

Ok..will try that.
The way that we stop CPUs is _hacky_, to say the least.  Will try to think
about that part.
Ok.
Thanks for the testing.  All my testing has been done with 8GB guests and
2 vCPUs.  Will try with more vCPUs to see if it makes a difference.




(qemu) info migrate
capabilities: xbzrle: off
Migration status: completed
transferred ram: 129937687 kbytes
remaining ram: 0 kbytes
total ram: 134226368 kbytes
total time: 543228 milliseconds
Hmm, _that_ is stranger.  This means that it finished.

There are cases where the migration finishes just fine... even with larger guest configurations (256G/20 vCPUs).


  Could you run qemu under gdb and send me the stack traces?

I don't know your gdb thread kung-fu, so here are the instructions just in case:

  gdb --args <exact qemu command line you used>
  C-c (Ctrl-C)          to break in when it hangs
  (gdb) info threads    to see all the running threads
  (gdb) thread 1        (or whatever other number)
  (gdb) bt              to get the backtrace of that thread

The hang is intermittent...
I ran it 4-5 times (under gdb) just now and I didn't see the issue :-(


I am especially interested in the backtraces of the migration thread and of the
iothread.

Will keep retrying with different configs and see if I get lucky in reproducing it (under gdb).

Vinod

Thanks, Juan.

Need to review/debug...

Vinod



     ---
     As with the non-migration-thread version, the SpecJBB workload
     completed before the migration attempted to move to stage 3 (i.e.
     it didn't converge while the workload was still active).

     BTW, with this version of the bits (i.e. while running SpecJBB,
     which is supposed to dirty quite a bit of memory) I noticed that
     there wasn't much change in the bandwidth usage of the dedicated
     10Gb private network link (it was still < ~1.5-3.0Gb/sec). I
     expected this to be a little better since we have a separate
     thread... not sure what else is in play here (NUMA locality of
     where the migration thread runs, or some other basic tuning in the
     implementation?).
     I have a high-level design question... (perhaps folks have already
     thought about it and categorized it as a potential future
     optimization?)

     Would it be possible to offload the iothread completely [from all
     migration-related activity] and have one thread (with the
     appropriate protection) get involved with collecting the list of
     dirty pages, plus one or more threads dedicated to pushing
     multiple streams of data to saturate the allocated network
     bandwidth?  This may help with large + busy guests. Comments?
     There are perhaps other implications of doing all this (like
     burning more host CPU cycles), but perhaps this could be made
     configurable based on the user's needs, e.g. fewer but larger
     guests on a host with no oversubscription.
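
     A very rough, purely illustrative sketch of that idea (not code
     from this series): one collector thread fills a shared queue of
     dirty-page chunks, and several sender threads drain it over
     separate connections.  The DirtyChunk/enqueue_chunk/sender_thread
     names below are invented for the sake of the discussion.

/* Illustrative only -- not QEMU code.  One collector thread fills a
 * queue of dirty-page chunks; N sender threads drain it over their own
 * migration sockets to try to saturate the link. */
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

typedef struct DirtyChunk {
    struct DirtyChunk *next;
    size_t len;
    unsigned char data[4096];
} DirtyChunk;

static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  q_cond = PTHREAD_COND_INITIALIZER;
static DirtyChunk *q_head;

/* Called by the collector thread after copying a dirty page. */
static void enqueue_chunk(DirtyChunk *c)
{
    pthread_mutex_lock(&q_lock);
    c->next = q_head;
    q_head = c;
    pthread_cond_signal(&q_cond);
    pthread_mutex_unlock(&q_lock);
}

/* One of these per connection/stream. */
static void *sender_thread(void *opaque)
{
    int fd = *(int *)opaque;          /* this sender's own socket */

    for (;;) {
        pthread_mutex_lock(&q_lock);
        while (!q_head) {
            pthread_cond_wait(&q_cond, &q_lock);
        }
        DirtyChunk *c = q_head;
        q_head = c->next;
        pthread_mutex_unlock(&q_lock);

        write(fd, c->data, c->len);   /* blocking write, outside the lock */
        free(c);
    }
    return NULL;
}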
Thanks
     Vinod
It does:
- get a new bitmap for migration, and that bitmap uses 1 bit per page
- it unfolds migration_buffered_file.  Only one user existed.
- it simplifies buffered_file a lot.

- About the migration thread, special attention was given to trying to
   get the series reviewable (reviewers will tell me if I got it right).

Basic design:
- we create a new thread instead of a timer function
- we move all the migration work to that thread (but run everything
   except the waits with the iothread lock held)
- we move all the writing outside the iothread lock.  i.e.
   we walk the state with the iothread lock held, and copy everything to one
   buffer; then we write that buffer to the sockets outside the iothread lock.
- once here, we move to writing synchronously to the sockets.
- this allows us to simplify quite a lot.

And basically, that is it.  Notice that we still do the iterative page
walking with the iothread lock held.  Light testing shows that we get
similar speeds and latencies to the version without the thread (notice that
almost no optimization has been done here yet).
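
To make the locking pattern above concrete, here is a minimal stand-alone
sketch (plain pthreads, not the series code): a plain mutex stands in for
the iothread lock, and fill_buffer() is an invented stand-in for walking
the migration state.

/* Minimal sketch of the copy-under-lock / write-outside-lock pattern
 * described above.  Not the series code: the mutex stands in for the
 * iothread lock, fill_buffer() for walking RAM/device state. */
#include <pthread.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

static pthread_mutex_t iothread_lock = PTHREAD_MUTEX_INITIALIZER;
static int migration_done;       /* written under iothread_lock      */
static int migration_fd;         /* blocking socket to the target    */

/* Stand-in for copying dirty pages / device state into a buffer. */
static size_t fill_buffer(uint8_t *buf, size_t max)
{
    memset(buf, 0, max);
    return max;
}

static void *migration_thread(void *opaque)
{
    uint8_t buf[64 * 1024];

    for (;;) {
        /* Walk the state with the "iothread lock" held ... */
        pthread_mutex_lock(&iothread_lock);
        size_t len = fill_buffer(buf, sizeof(buf));
        int done = migration_done;
        pthread_mutex_unlock(&iothread_lock);

        /* ... but do the (synchronous, blocking) write outside it. */
        if (write(migration_fd, buf, len) < 0 || done) {
            break;
        }
    }
    return NULL;
}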

Apart from the review:
- Are there any locking issues that I have missed? (I guess so.)
- Stop all cpus correctly.  vm_stop should be called from the iothread;
   I use the trick of using a bottom half to get that working correctly
   (see the sketch after this list), but this _implementation_ is ugly
   as hell.  Is there an easier way of doing it?
- Do I really have to export last_ram_offset()?  Is there no other way
   of knowing the amount of RAM?
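
For reference, a hypothetical sketch of the bottom-half trick mentioned in
the second item.  The QEMUBH API (qemu_bh_new/qemu_bh_schedule/qemu_bh_delete)
and vm_stop() are real QEMU interfaces; the helper names, the busy-wait, and
the exact call site are invented here and the RFC's actual code may differ.

/* Hypothetical sketch, not the RFC's code: vm_stop() must run in the
 * iothread, so the migration thread only schedules a bottom half and
 * waits; the iothread's main loop runs the BH and does the real stop.
 * QEMU-internal headers are omitted (paths vary between trees). */

static volatile int stop_done;

static void migration_stop_bh(void *opaque)
{
    QEMUBH **bhp = opaque;

    vm_stop(RUN_STATE_FINISH_MIGRATE);   /* runs in the iothread */
    qemu_bh_delete(*bhp);
    stop_done = 1;
}

static void migration_request_stop(void)
{
    static QEMUBH *bh;

    bh = qemu_bh_new(migration_stop_bh, &bh);
    qemu_bh_schedule(bh);

    /* Busy-wait for simplicity; real code would use a proper
     * completion/condition mechanism. */
    while (!stop_done) {
        /* spin until the iothread has run the BH */
    }
}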

Known issues:

- for some reason, when it has to start a 2nd round of bitmap
   handling, it decides to dirty all pages.  I still haven't found why
   this happens.

If you can test it and tell me where it breaks, that would also help.

Work is based on Umesh's thread work, and on work that Paolo Bonzini did
on top of that.  All the migration thread code was done from scratch
because I was unable to debug why it was failing, but it "owes" a lot
to the previous design.

Thanks in advance, Juan.

The following changes since commit a21143486b9c6d7a50b7b62877c02b3c686943cb:

   Merge remote-tracking branch 'stefanha/net' into staging (2012-07-23
13:15:34 -0500)

are available in the git repository at:


   http://repo.or.cz/r/qemu/quintela.git migration-thread-v1

for you to fetch changes up to 27e539b03ba97bc37e107755bcb44511ec4c8100:

   buffered_file: unfold buffered_append in buffered_put_buffer
(2012-07-24 16:46:13 +0200)


Juan Quintela (23):
   buffered_file: g_realloc() can't fail
   savevm: Factorize ram globals reset in its own function
   ram: introduce migration_bitmap_set_dirty()
   ram: Introduce migration_bitmap_test_and_reset_dirty()
   ram: Export last_ram_offset()
   ram: introduce migration_bitmap_sync()
   Separate migration bitmap
   buffered_file: rename opaque to migration_state
   buffered_file: opaque is MigrationState
   buffered_file: unfold migrate_fd_put_buffer
   buffered_file: unfold migrate_fd_put_ready
   buffered_file: unfold migrate_fd_put_buffer
   buffered_file: unfold migrate_fd_put_buffer
   buffered_file: We can access directly to bandwidth_limit
   buffered_file: Move from using a timer to use a thread
   migration: make qemu_fopen_ops_buffered() return void
   migration: stop all cpus correctly
   migration: make writes blocking
   migration: remove unfreeze logic
   migration: take finer locking
   buffered_file: Unfold the trick to restart generating migration data
   buffered_file: don't flush on put buffer
   buffered_file: unfold buffered_append in buffered_put_buffer

Paolo Bonzini (2):
   split MRU ram list
   BufferedFile: append, then flush

Umesh Deshpande (2):
   add a version number to ram_list
   protect the ramlist with a separate mutex

  arch_init.c      |  108 +++++++++++++++++++++++++-------
  buffered_file.c  |  179 +++++++++++++++++-------------------------------------
  buffered_file.h  |   12 +---
  cpu-all.h        |   17 +++++-
  exec-obsolete.h  |   10 ---
  exec.c           |   45 +++++++++++---
  migration-exec.c |    2 -
  migration-fd.c   |    6 --
  migration-tcp.c  |    2 +-
  migration-unix.c |    2 -
  migration.c      |  111 ++++++++++++++-------------------
  migration.h      |    6 ++
  qemu-file.h      |    5 --
  savevm.c         |    5 --
  14 files changed, 249 insertions(+), 261 deletions(-)



