qemu-devel

Re: [Qemu-devel] post-copy is broken?


From: Andrea Arcangeli
Subject: Re: [Qemu-devel] post-copy is broken?
Date: Thu, 14 Apr 2016 12:22:30 -0400
User-agent: Mutt/1.6.0 (2016-04-01)

Adding linux-mm too,

On Thu, Apr 14, 2016 at 01:34:41PM +0100, Dr. David Alan Gilbert wrote:
> * Andrea Arcangeli (address@hidden) wrote:
> 
> > The next suspect is the massive THP refcounting change that went
> > upstream recently:
> 
> > As further debug hint, can you try to disable THP and see if that
> > makes the problem go away?
> 
> Yep, this seems to be the problem (cc'ing in Kirill).
> 
> 122afea9626ab3f717b250a8dd3d5ebf57cdb56c - works (just before Kirill disables THP)
> 61f5d698cc97600e813ca5cf8e449b1ea1c11492 - breaks (when THP is reenabled)
> 
> It's pretty reliable; as you say disabling THP makes it work again
> and putting it back to THP/madvise mode makes it break.  And you need
> to test on a machine with some free ram to make sure THP has a chance
> to have happened.
> 
> I'm not sure of all of the rework that happened in that series,
> but my reading of it is that splitting of THP pages gets deferred;
> so I wonder if when I do the madvise to turn THP off, if it's actually
> still got THP pages and thus we end up with a whole THP mapped
> when I'm expecting to be userfaulting those pages.

Good thing at least I didn't make UFFDIO_COPY THP aware yet, so there
are fewer variables. No user has been interested in handling userfaults
at THP granularity yet, and from userland such an improvement would be
completely invisible in terms of API, so if a user starts doing that
we can just optimize the kernel for it. criu restore could do that, as
its faults will come from disk I/O; when the network is involved, THP
userfaults wouldn't be a great tradeoff given the increased fault
latency.

I suspect there is a handle_userfault missing somewhere in connection
with the trans_huge_pmd splits (no longer THP splits) that you're doing
with MADV_DONTNEED to zap those pages in the destination that got
redirtied in the source during the last precopy stage. Or, more simply,
MADV_DONTNEED isn't zapping all the right ptes after the trans huge
pmd got split.

The fact the page isn't split shouldn't matter too much; all we care
about is that the pte triggers handle_userfault after MADV_DONTNEED.
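
To illustrate only the MADV_DONTNEED half of that expectation, here is a
minimal sketch in Python (3.8+, Linux). It does not register the range
with userfaultfd and it is not the kernel testcase; the function name,
the 2 MiB huge page size and the chosen offset are assumptions made for
the example. It asks for THP on a private anonymous mapping, dirties it,
zaps one small page in the middle, and checks that only that page reads
back as zeroes. With userfaultfd armed on the range, the same access
would be expected to reach handle_userfault instead of being zero
filled.

```python
import mmap

PAGE = mmap.PAGESIZE
HUGE = 2 * 1024 * 1024  # assumed THP size (typical on x86-64)


def zap_one_page(region_size=4 * HUGE, fill=0xAA):
    """Dirty an anonymous mapping, zap one small page, return three pages.

    Returns (zapped, before, after): the contents of the zapped page and
    of its two neighbours, read back after MADV_DONTNEED.
    """
    # Private anonymous memory: with the default MAP_SHARED the pages
    # would be shmem backed and MADV_DONTNEED would not discard the data.
    m = mmap.mmap(-1, region_size,
                  flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                  prot=mmap.PROT_READ | mmap.PROT_WRITE)
    try:
        try:
            # Only a hint; the kernel may or may not back this with THP.
            m.madvise(mmap.MADV_HUGEPAGE)
        except (OSError, AttributeError):
            pass  # THP not available here, the zap check still applies

        # Touch every page so the whole region is populated.
        m[:] = bytes([fill]) * region_size

        # A small page well inside the region, not on a 2 MiB boundary
        # of the mapping offset (hypothetical choice for the example).
        off = region_size // 2 + 3 * PAGE
        m.madvise(mmap.MADV_DONTNEED, off, PAGE)

        zapped = m[off:off + PAGE]
        before = m[off - PAGE:off]
        after = m[off + PAGE:off + 2 * PAGE]
        return zapped, before, after
    finally:
        m.close()


if __name__ == "__main__":
    zapped, before, after = zap_one_page()
    print("zapped page is zero filled:", zapped == bytes(PAGE))
    print("neighbours kept their data:",
          before == after == bytes([0xAA]) * PAGE)
```

Whether the region was really backed by a huge page cannot be told from
this sketch alone; on a machine with free RAM and THP enabled it usually
is, which matches the conditions David describes for reproducing.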

Unfortunately the userfaultfd testcase in the kernel isn't exercising
this case. That should probably be improved too, so there is a simpler
way to reproduce this than running precopy before postcopy in qemu.

Thanks,
Andrea


