[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] post-copy is broken?

From: Andrea Arcangeli
Subject: Re: [Qemu-devel] post-copy is broken?
Date: Wed, 27 Apr 2016 16:47:39 +0200
User-agent: Mutt/1.6.0 (2016-04-01)

Hello Liang,

On Mon, Apr 18, 2016 at 10:33:14AM +0000, Li, Liang Z wrote:
> If the THP is disabled, no fails.
> And your test was always passed, even when  real post-copy was failed. 
> In my env, the output of 
> 'cat /sys/kernel/mm/transparent_hugepage/enabled'  is:
>  [always] ...

Can you test the fix?

This was not a breakage in userfaultfd nor in postcopy. userfaultfd
had no bugs and is fully rock solid and with zero chances of
generating undetected memory corruption like it was happening in v4.5.

As I suspected, the same problem would have happened with any THP
pmd_trans_huge split (swapping/inflating-balloon etc..). Postcopy just
makes it easier to reproduce the problem because it does a scattered
MADV_DONTNEED on the destination qemu guest memory for the pages
redirtied during the last precopy pass that run, or not transferred
(to allow THP faults in destination qemu during precopy), just before
starting the guest in the destination node.

Other reports of KVM memory corruption happening on v4.5 with THP
enabled will also be taken care of by the above fix.

I hope I managed to fix this in time for v4.6 final (current is
v4.6-rc5-69), so the only kernel where KVM must not be used with THP
enabled will be v4.5.

On a side note, this MADV_DONTEED trigger reminded me as soon as the
madvisev syscall is merged, loadvm_postcopy_ram_handle_discard should
start using it to reduce the enter/exit kernel to just 1 (or a few
madvisev in case we want to give a limit to the temporary buffer to
avoid the risk of allocating too much temporary RAM for very large
guests) to do the MADV_DONTNEED scattered zapping. Same thing in


reply via email to

[Prev in Thread] Current Thread [Next in Thread]