Re: [Qemu-devel] [PATCH v3 00/35] postcopy live migration
From: Isaku Yamahata
Subject: Re: [Qemu-devel] [PATCH v3 00/35] postcopy live migration
Date: Wed, 31 Oct 2012 12:25:35 +0900
User-agent: Mutt/1.5.19 (2009-01-05)
On Tue, Oct 30, 2012 at 06:53:31PM +0000, Benoit Hudzia wrote:
> Hi Isaku,
>
>
> Are you going to be at the KVM Forum? (I think you have a presentation there.)
> It would be nice if we could meet to see whether we can sync our efforts.
Yes, definitely.
> As you know, we have been developing an RDMA-based solution for postcopy
> migration, and we demonstrated the initial proof of concept in December 2012
> (we published some findings at VHPC 2012 and are working with Petter Svard
> from Umea on a journal paper with a more detailed performance review).
Do you have any pointers to available papers/slides?
I can't find any at http://vhpc.org/
> While RDMA postcopy live migration is just a by-product of our long-term
> effort (I will present the project in my talk at KVM Forum), we took the
> opportunity to address problems we were facing with the live migration of
> enterprise workloads, namely how to migrate an in-memory database such as
> HANA under load.
>
> We quickly discovered that precopy (even with optimizations) didn't work
> with such workloads. We also tried your code; however, the performance was
> far from satisfactory with large VMs under load, due to the heavy cost of
> transferring memory between user space and the kernel multiple times
> (actually, it often failed).
If possible, I'd like to see the details.
> We then tested a pure RDMA solution we developed (we support hardware and
> software RDMA), and it works fine with all the workloads we tested (we
> migrated VMs with 20+ GB running SAP HANA under a workload similar to
> TPC-H). We hope to test with bigger configurations soon (1/2+ TB of memory).
>
> However, the integration of our code with the QEMU code base is not as
> advanced and polished as what you currently have, and I would like to know
> whether you would be interested in joining our effort or collaborating on
> merging our solution. Or maybe allowing us to piggyback on your effort.
Yeah, we can unite our efforts for upstream.
In particular, a clean interface for both non-RDMA and RDMA (qemu internal and
qemu-kernel) is important.
At the moment I don't know the requirements of RDMA postcopy or the details of
your implementation.
"transparently integrating with the MMU at the OS level" sounds interesting.
thanks,
> Would you be free to meet at any time next week? (From Tuesday to Friday.)
>
> PS: we will be open-sourcing our project by the end of November, and
> postcopy is only a small part of the technology developed.
>
>
>
> Regards
> Benoit
>
>
> On 30 October 2012 08:32, Isaku Yamahata <address@hidden> wrote:
>
> This is the v3 patch series of postcopy migration.
>
> The trees are available at
> git://github.com/yamahata/qemu.git qemu-postcopy-oct-30-2012
> git://github.com/yamahata/linux-umem.git linux-umem-oct-29-2012
>
> Major changes v2 -> v3:
> - implemented pre+post optimization
> - auto detection of postcopy by incoming side
> - using threads on destination instead of fork
> - using blocking io instead of select + non-blocking io loop
> - less memory overhead
> - various improvements and code simplification
> - kernel module name change umem -> uvmem to avoid name conflict.
>
> Patches organization:
> 1-2: trivial fixes
> 3-5: preparation for threading. cherry-picked from migration tree
> 6-18: refactoring existing code and preparation
> 19-25: implement postcopy live migration itself (essential part)
> 26-35: optimization/heuristic for postcopy
>
> Usage
> =====
> You need to load the uvmem character device on the host before starting
> migration.
> Postcopy can be used with the tcg and kvm accelerators. The implementation
> depends only on the Linux uvmem character device, and the driver-dependent
> code is split into a separate file.
> I tested only the host page size == guest page size case, but the
> implementation allows the host page size != guest page size case.
>
> The following options are added with this patch series.
> - incoming part
> use -incoming as usual. Postcopy is automatically detected.
> example:
> qemu -incoming tcp:0:4444 -monitor stdio -machine accel=kvm
>
> - outgoing part
> options for migrate command
> migrate [-p [-n] [-m]] URI
> [<precopy count> [<prefault forward> [<prefault backward>]]]
>
> Newly added options/arguments
> -p: indicate postcopy migration
> -n: disable background transfer of pages. This is for
> benchmarking/debugging
> -m: move background transfer of postcopy mode
> <precopy count>: The number of precopy RAM scans before postcopy.
> default 0 (0 means no precopy)
> <prefault forward>: The number of forward pages sent along with an
> on-demand page
> <prefault backward>: The number of backward pages sent along with an
> on-demand page
>
> example:
> migrate -p -n tcp:<dest ip address>:4444
> migrate -p -n -m tcp:<dest ip address>:4444 42 42 0
>
>
> TODO
> ====
> - benchmark/evaluation
> - improvement/optimization
> At the moment, what I'm aware of at least is
> - pre+post case
> On the destination side, reading the dirty bitmap would cause long latency.
> Create a thread for that.
> - consider the FUSE/CUSE possibility
>
> basic postcopy work flow
> ========================
>     qemu on the destination
>           |
>           V
>     open(/dev/uvmem)
>           |
>           V
>     UVMEM_INIT
>           |
>           V
>     Here we have two file descriptors to
>     umem device and shmem file
>           |
>           |                                 umem threads
>           |                                 on the destination
>           |
>           V    create pipe to communicate
>     create threads--------------------------------,
>           |                                       |
>           V                                 mmap(shmem file)
>     mmap(uvmem device) for guest RAM        close(shmem file)
>           |                                       |
>           |                                       |
>           V                                       |
>     wait for ready from daemon <----pipe----send ready message
>           |                                       |
>           |                                 Here the daemon takes over
>     send ok------------pipe---------------> the owner of the socket
>           |                                 to the source
>           V                                       |
>     entering post copy stage                      |
>     start guest execution                         |
>           |                                       |
>           V                                       V
>     access guest RAM                        read() to get faulted pages
>           |                                       |
>           V                                       V
>     page fault ---------------------------->page offset is returned
>     block                                         |
>                                                   V
>                                             pull page from the source
>                                             write the page contents
>                                             to the shmem.
>                                                   |
>                                                   V
>     unblock <-------------------------------write() to tell served pages
>     the fault handler returns the page            |
>     page fault is resolved                        |
>           |                                       V
>           |                                 touch guest RAM pages
>           |                                       |
>           |                                       V
>           |                                 release the cached page
>           |                                 madvise(MADV_REMOVE)
>           |
>           |
>           |                                 pages can be sent
>           |                                 in the background
>           |                                       |
>           |                                       V
>           |                                 mark the page as cached
>           |                                 Thus future page fault is
>           |                                 avoided.
>           |                                       |
>           |                                       V
>           |                                 touch guest RAM pages
>           |                                       |
>           |                                       V
>           |                                 release the cached page
>           |                                 madvise(MADV_REMOVE)
>           |                                       |
>           V                                       V
>
>              all the pages are pulled from the source
>
>           |                                       |
>           V                                       V
>     migration completes                     exit()
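
The ordering in the work flow above (faulted pages are served on demand, the remaining pages are pulled in the background, and a page that is already cached is never fetched twice) can be modelled with a small simulation. This is only a sketch under simplifying assumptions: it runs in a single thread, uses dictionaries in place of the uvmem device, the shmem file and the socket to the source, and the name run_postcopy is hypothetical rather than part of the patch series.

```python
from collections import deque


def run_postcopy(source_ram, fault_order):
    """Simulate the destination pulling every page from the source.

    source_ram:  dict mapping page offset -> page contents (stands in
                 for the source qemu).
    fault_order: page offsets the guest touches, in order; each is
                 assumed to be a key of source_ram.
    Returns (dest_ram, log), where log records how each page arrived.
    """
    dest_ram = {}                          # stands in for the shmem file
    log = []
    faults = deque(fault_order)            # offsets read() would return
    background = iter(sorted(source_ram))  # background transfer order

    while len(dest_ram) < len(source_ram):
        if faults:
            # On-demand path: a blocked guest access has priority.
            page = faults.popleft()
            if page in dest_ram:
                continue                   # already cached, no fetch needed
            kind = "fault"
        else:
            # Background path: next page that has not arrived yet.
            page = next(p for p in background if p not in dest_ram)
            kind = "background"
        dest_ram[page] = source_ram[page]  # "write the page contents"
        log.append((kind, page))
    return dest_ram, log


if __name__ == "__main__":
    ram = {0: b"a", 1: b"b", 2: b"c", 3: b"d"}
    _, log = run_postcopy(ram, [2, 2, 0])
    print(log)
```

In the real design the two paths run concurrently; the simulation only shows which pages end up being fetched by which path.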
>
>
> Isaku Yamahata (32):
> migration.c: remove redundant line in migrate_init()
> arch_init: DPRINTF format error and typo
> osdep: add qemu_read_full() to read interrupt-safely
> savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip, qemu_fflush
> savevm/QEMUFile: consolidate QEMUFile functions a bit
> savevm/QEMUFile: introduce qemu_fopen_fd
> savevm/QEMUFile: add read/write QEMUFile on memory buffer
> savevm, buffered_file: introduce method to drain buffer of buffered file
> arch_init: export RAM_SAVE_xxx flags for postcopy
> arch_init/ram_save: introduce constant for ram save version = 4
> arch_init: refactor ram_save_block() and export ram_save_block()
> arch_init/ram_save_setup: factor out bitmap alloc/free
> arch_init/ram_load: refactor ram_load
> arch_init: factor out logic to find ram block with id string
> migration: export migrate_fd_completed() and migrate_fd_cleanup()
> uvmem.h: import Linux uvmem.h and teach update-linux-headers.sh
> osdep: add QEMU_MADV_REMOVE and tirivial fix
> postcopy: introduce helper functions for postcopy
> savevm: add new section that is used by postcopy
> postcopy: implement incoming part of postcopy live migration
> postcopy outgoing: add -p option to migrate command
> postcopy: implement outgoing part of postcopy live migration
> postcopy/outgoing: add -n options to disable background transfer
> postcopy/outgoing: implement forward/backword prefault
> arch_init: factor out setting last_block, last_offset
> postcopy/outgoing: add movebg mode(-m) to migration command
> arch_init: factor out ram_load
> arch_init: export ram_save_iterate()
> postcopy: pre+post optimization incoming side
> arch_init: export migration_bitmap_sync and helper method to get bitmap
> postcopy/outgoing: introduce precopy_count parameter
> postcopy: pre+post optimization outgoing side
>
> Paolo Bonzini (1):
> split MRU ram list
>
> Umesh Deshpande (2):
> add a version number to ram_list
> protect the ramlist with a separate mutex
>
> Makefile.target | 2 +
> arch_init.c | 391 +++++---
> arch_init.h | 24 +
> buffered_file.c | 59 +-
> buffered_file.h | 1 +
> cpu-all.h | 16 +-
> exec.c | 62 +-
> hmp-commands.hx | 21 +-
> hmp.c | 12 +-
> linux-headers/linux/uvmem.h | 41 +
> migration-exec.c | 8 +-
> migration-fd.c | 23 +-
> migration-postcopy.c | 2019 +++++++++++++++++++++++++++++++++++++++
> migration-tcp.c | 16 +-
> migration-unix.c | 36 +-
> migration.c | 65 +-
> migration.h | 42 +
> osdep.c | 24 +
> osdep.h | 13 +-
> qapi-schema.json | 6 +-
> qemu-common.h | 2 +
> qemu-file.h | 12 +-
> qmp-commands.hx | 4 +-
> savevm.c | 223 ++++-
> scripts/update-linux-headers.sh | 2 +-
> sysemu.h | 2 +-
> umem.c | 291 ++++++
> umem.h | 88 ++
> vl.c | 5 +-
> 29 files changed, 3265 insertions(+), 245 deletions(-)
> create mode 100644 linux-headers/linux/uvmem.h
> create mode 100644 migration-postcopy.c
> create mode 100644 umem.c
> create mode 100644 umem.h
>
> --
> 1.7.10.4
>
>
> --
> " The production of too many useful things results in too many useless people"
--
yamahata