Re: [Qemu-devel] [PATCH 00/21][RFC] postcopy live migration


From: Dor Laor
Subject: Re: [Qemu-devel] [PATCH 00/21][RFC] postcopy live migration
Date: Mon, 02 Jan 2012 11:28:49 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:9.0) Gecko/20111222 Thunderbird/9.0

On 01/01/2012 06:27 PM, Stefan Hajnoczi wrote:
On Sun, Jan 1, 2012 at 9:43 AM, Orit Wasserman <address@hidden> wrote:
On 12/30/2011 12:39 AM, Anthony Liguori wrote:
On 12/28/2011 07:25 PM, Isaku Yamahata wrote:
Intro
=====
This patch series implements postcopy live migration.[1]
As discussed at KVM Forum 2011, a dedicated character device is used for
distributed shared memory between the migration source and destination.
Now we can discuss/benchmark/compare it with precopy. I believe there is
much room for improvement.

[1] http://wiki.qemu.org/Features/PostCopyLiveMigration
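
To make the mechanism concrete: the destination starts running the guest
immediately, and a user-space thread services the resulting page faults by
fetching each missing page from the source. The sketch below illustrates that
fault-servicing loop. It deliberately uses the mainline userfaultfd(2)
interface instead of the umem character device from this series (whose exact
interface is defined by these patches), and the "fetch page from source" step
is stubbed out, so it is an illustration of the idea rather than code from the
series.

/* Sketch of destination-side, on-demand page servicing.  Uses mainline
 * userfaultfd(2) rather than the umem driver from this series, and fakes
 * the page contents with memset() instead of fetching them from the
 * migration source. */
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <poll.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len = 16 * page;

    /* Anonymous mapping standing in for guest RAM on the destination. */
    char *ram = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    int uffd = syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);

    struct uffdio_api api = { .api = UFFD_API };
    ioctl(uffd, UFFDIO_API, &api);

    struct uffdio_register reg = {
        .range = { .start = (unsigned long)ram, .len = len },
        .mode  = UFFDIO_REGISTER_MODE_MISSING,
    };
    ioctl(uffd, UFFDIO_REGISTER, &reg);

    void *buf;
    posix_memalign(&buf, page, page);

    /* Fault-service loop: a real post-copy destination would request the
     * faulting page from the source over the migration connection here. */
    for (;;) {
        struct pollfd pfd = { .fd = uffd, .events = POLLIN };
        if (poll(&pfd, 1, -1) <= 0)
            continue;

        struct uffd_msg msg;
        if (read(uffd, &msg, sizeof(msg)) != sizeof(msg) ||
            msg.event != UFFD_EVENT_PAGEFAULT)
            continue;

        memset(buf, 0, page);                 /* stand-in for page data */

        struct uffdio_copy copy = {
            .dst = msg.arg.pagefault.address & ~((unsigned long)page - 1),
            .src = (unsigned long)buf,
            .len = page,
        };
        ioctl(uffd, UFFDIO_COPY, &copy);      /* wakes the faulting thread */
    }
}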


Usage
=====
You need to load the umem character device on the host before starting migration.
Postcopy can be used with the tcg and kvm accelerators. The implementation depends
only on the Linux umem character device, but the driver-dependent code is split
into a separate file.
I tested only the host page size == guest page size case, but the implementation
allows the host page size != guest page size case.

The following options are added with this patch series.
- incoming part
    command line options
    -postcopy [-postcopy-flags <flags>]
    where <flags> changes behavior for benchmarking/debugging
    Currently the following flags are available
    0: default
    1: enable touching page request

    example:
    qemu -postcopy -incoming tcp:0:4444 -monitor stdio -machine accel=kvm

- outgoing part
    options for migrate command
    migrate [-p [-n]] URI
    -p: indicate postcopy migration
    -n: disable background transfer of pages (for benchmarking/debugging)

    example:
    migrate -p -n tcp:<dest ip address>:4444
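
Putting the two halves together: the destination is started first with
-incoming, then the migrate command is issued in the source monitor
(the address and port are the same placeholders as in the examples above):

    destination host:  qemu -postcopy -incoming tcp:0:4444 -monitor stdio -machine accel=kvm
    source monitor:    migrate -p tcp:<dest ip address>:4444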


TODO
====
- benchmark/evaluation, especially how async page faults affect the result.

I'll review this series next week (Mike/Juan, please also review when you can).

But we really need to think hard about whether this is the right thing to take 
into the tree.  I worry a lot about the fact that we don't test pre-copy 
migration nearly enough and adding a second form just introduces more things to 
test.

It's also not clear to me why post-copy is better.  If you were going to sit 
down and explain to someone building a management tool when they should use 
pre-copy and when they should use post-copy, what would you tell them?

Start with pre-copy; if it doesn't converge, switch to post-copy.

Post-copy throttles the guest when page faults are encountered because
the destination machine waits for memory pages from the source
machine.  Is there a reason this page fault-based throttling cannot be
done on the source machine with pre-copy migration?  I'm not sure
post-copy provides new behavior in terms of convergence, we could do
the same with pre-copy migration.

There are two differences between these approaches:
1. Post-copy allows vcpus that are not faulting at the moment to make
   progress.

   Assuming a subset of the guest vcpus can execute freely with their
   memory already at the destination, they can get 100% cpu time.
   The slowing-down approach on the source host slows down all vcpus.

2. Different page access pattern.
   Post-copy uses on-demand paging, so only the pages that are actually
   required get transferred. The slow-down approach can only guess which
   pages to send first.


Post-copy has other advantages, though: it immediately frees logical
CPUs on the source machine (though RAM and network bandwidth are still
required until migration completes).

With post-copy you can immediately free any page that has been transferred to the destination.

At the end of the day, it is performance testing across various scenarios that will tell us whether post-copy is worth the extra complexity over slowing down the guest on the source.

Cheers,
Dor


Stefan




