Re: [Qemu-devel] broken incoming migration

From: Peter Lieven
Subject: Re: [Qemu-devel] broken incoming migration
Date: Tue, 04 Jun 2013 12:56:19 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130510 Thunderbird/17.0.6

On 03.06.2013 12:04, Alexey Kardashevskiy wrote:
On 05/31/2013 12:38 AM, Peter Lieven wrote:

On 30.05.2013 at 15:41, "Paolo Bonzini" <address@hidden> wrote:

On 30/05/2013 11:08, Peter Lieven wrote:
On 30.05.2013 10:18, Alexey Kardashevskiy wrote:
On 05/30/2013 05:49 PM, Paolo Bonzini wrote:
On 30/05/2013 09:44, Alexey Kardashevskiy wrote:

I found that migration is broken on the pseries platform; specifically, this
patch broke it:

migration: do not sent zero pages in bulk stage

The idea is to avoid sending zero pages to the destination guest, whose RAM
is expected to be 100% empty.

However, on the pseries platform the guest always has some data in RAM as
part of initialization (device tree, system firmware and RTAS (?)), so it is
not completely empty. As the source guest cannot detect this, it skips some
pages during migration and we end up with a broken destination guest. Bug.

While the idea is OK in general, I do not see any easy way to fix it, as
neither the QEMUMachine::init nor the QEMUMachine::reset callback has any
information about whether we are about to receive an incoming migration (the
-incoming parameter), and we cannot move the device tree and system firmware
initialization anywhere else.

ram_bulk_stage is static and cannot be disabled from the platform
initialization code.

So what would the community suggest?
Revert the patch. :)
I'll wait for 24 hours (I forgot to cc the author) and then post a revert
patch :)
Does this problem only occur on pseries emulation?
Probably not.  On a PC, it would occur if you had 4K of zeros in the
source BIOS but not in the destination BIOS.  When you reboot, the BIOS
image is wrong.

Not sending zero pages is not only a performance benefit; it also makes
overcommitted memory usable. The MADV_DONTNEED seems to kick in asynchronously,
so the memory does not become available again immediately.
You could also scan the page for nonzero values before writing it.
I had this in mind, but then chose the other approach... which turned out to
be a bad idea.
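Paolo's suggestion can be sketched as follows: keep sending zero pages, but have the destination check its own copy before writing a zero page. This is a minimal Python model, not QEMU's implementation; it assumes a zero page arrives as a single fill byte for the whole page, and `load_filled_page` is a hypothetical name. On a Linux host, reading an untouched anonymous page typically maps the shared zero page rather than allocating memory, which is why checking before writing can help with overcommitted destinations.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB target pages


def is_zero_page(page: bytearray) -> bool:
    """True if every byte in the page is zero."""
    return not any(page)


def load_filled_page(dst_page: bytearray, fill: int) -> bool:
    """Hypothetical handler for a page sent as one repeated fill byte.

    Returns True if dst_page was written. A zero fill is skipped when the
    destination page is already all zeros, so untouched pages stay untouched;
    leftover non-zero data (firmware, device tree) is still overwritten.
    """
    if fill == 0 and is_zero_page(dst_page):
        return False  # already correct, skip the write
    dst_page[:] = bytes([fill]) * PAGE_SIZE
    return True


untouched = bytearray(PAGE_SIZE)  # never written on the destination
firmware = bytearray(PAGE_SIZE)
firmware[0] = 0xFF                # pre-initialized at machine init

print(load_filled_page(untouched, 0))  # False: write skipped
print(load_filled_page(firmware, 0))   # True: stale data cleared
print(is_zero_page(firmware))          # True
```

The scan costs one read of each zero page on the destination, in exchange for correct results without needing to know at machine init whether an incoming migration is pending.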

Alexey: I will prepare a patch later today. Could you then please verify that
it fixes your problem?

Yes I can, where is the patch? :)

It's on my to-do list for today. Sorry, I have been a bit busy lately.

