Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd


From: Andrea Arcangeli
Subject: Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
Date: Tue, 5 Jul 2016 16:59:04 +0200
User-agent: Mutt/1.6.1 (2016-04-27)

Hello,

On Tue, Jul 05, 2016 at 11:57:31AM +0200, Baptiste Reynal wrote:
> Ok, if it is not on Andrea schedule I am willing to take the action,
> at least for ARM/ARM64 support.

A few days ago I released this update:

https://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/

git clone -b master --reference linux \
    git://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git
cd aa
git fetch
git reset --hard origin/master

The branch will be constantly rebased so you will need to rebase or
reset on origin/master after a fetch to get the updates.


Features added:

1) WP support for anon (Shaohua, hugetlbfs has a FIXME)
2) non-cooperative support (Pavel & Mike Rapoport)
3) hugetlbfs missing faults tracking (Mike Kravetz)

WP support and hugetlbfs required a couple of fixes. The
non-cooperative support is as submitted, but I wonder if we should have
a single non-cooperative feature flag.

I didn't advertise it yet because it's not well tested, and in fact I
don't expect the WP mode to work fully as it should.

However, the kernel should run stable: I fixed enough bugs that it
should not be possible to DoS or exploit the kernel with this
patchset applied (unlike the original code submissions, which had race
conditions and potentially kernel-crashing bugs).

The next thing I plan to work on is a bitflag in the swap entry for
the WP tracking, so that WP tracking works correctly through swapins
without false positives. It'll work like soft-dirty. It's possible that
other cases are still not covered by the WP support.
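
For reference, the existing soft-dirty state can be inspected from
userland through /proc/PID/pagemap. The sketch below only illustrates
that existing mechanism, not the planned WP swap-entry bit. The bit
positions are the ones given in the kernel's pagemap documentation;
the pm_* helper names are made up for this example.

```c
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Bit positions of a 64-bit /proc/PID/pagemap entry (kernel pagemap docs). */
#define PM_SOFT_DIRTY (1ULL << 55) /* pte is soft-dirty */
#define PM_SWAPPED    (1ULL << 62) /* page is swapped out */
#define PM_PRESENT    (1ULL << 63) /* page is present in RAM */

/* Hypothetical helpers: decode single flags of a pagemap entry. */
int pm_is_soft_dirty(uint64_t entry) { return (entry & PM_SOFT_DIRTY) != 0; }
int pm_is_swapped(uint64_t entry)    { return (entry & PM_SWAPPED) != 0; }

/*
 * Read the pagemap entry covering one virtual address of the calling
 * process. Returns 0 on success, -1 on error.
 */
int pm_read_entry(const void *addr, uint64_t *entry)
{
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0)
        return -1;

    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return -1;

    /* One 64-bit entry per virtual page, indexed by page number. */
    off_t off = (off_t)((uintptr_t)addr / (uintptr_t)page_size)
                * (off_t)sizeof(uint64_t);
    ssize_t n = pread(fd, entry, sizeof(*entry), off);
    close(fd);
    return n == (ssize_t)sizeof(*entry) ? 0 : -1;
}
```

A caller would check pm_is_soft_dirty() on the entry both while the
page is present and after it has been swapped out (pm_is_swapped());
the WP bit in the swap entry is meant to survive swap the same way.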

THP should be covered now (the callback was missing in the original
submission but I fixed that). As for KVM, it's not entirely clear why it
didn't work before, and it may require changes to the KVM code if this
is not enough. KVM should not use gup(write=1) for read faults on shadow
pagetables, so it has at least a chance to work.

I'm also considering using a reserved bitflag in the mapped/present
pte/trans_huge_pmds to track which virtual addresses have been
wrprotected. Without a reserved bitflag, fork() would inevitably lead
to false positive WP userfaults. I'm not sure if it's required or if
it should be left up to userland to ensure the pagetables don't
become wrprotected (i.e. use MADV_DONTFORK, like of course KVM already
does). First we have to solve the false positives through swap anyway;
the two should be orthogonal improvements.
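
The userland side of that option is small. A minimal sketch, assuming
an anonymous private mapping (map_dontfork is a made-up helper name):
a range marked MADV_DONTFORK is not inherited by the child, so fork()
has no reason to wrprotect the parent's pagetables for it.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map an anonymous private region and exclude it
 * from fork(), so the child never shares it copy-on-write.
 * Returns the mapping, or NULL on failure.
 */
void *map_dontfork(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    if (madvise(p, len, MADV_DONTFORK) != 0) {
        munmap(p, len);
        return NULL;
    }
    return p;
}
```

The caller releases the region with munmap() as usual.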

If you could test the live snapshotting patchset on my kernel master
branch and report any issue or incremental fix against my branch, it'd
be great.

On my side I think I'll focus on testing by extending the testsuite
inside the kernel to exercise WP tracking too.

There are several other active users of the new userfaultfd features,
including JIT garbage collection (that previously used mprotect and
trapped SIGSEGV), distributed shared memory, SQL database robustness
in hugetlbfs holes and postcopy live migration of containers (a
process using userfaultfd of its own being live migrated inside a
container with the non-cooperative model isn't solved yet, though).
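
For comparison, the older mprotect + SIGSEGV way of tracking writes
looks roughly like the sketch below. It is an illustration of the
general technique, not code from any of the projects above, and all
names in it are made up. It handles one tracked region per process and
calls mprotect() from the signal handler, which works on Linux but is
not formally async-signal-safe.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *tracked_base;   /* start of the tracked region */
static size_t tracked_len;   /* length of the tracked region */
static size_t page_size;
static volatile sig_atomic_t write_faults;

/* First write to a protected page lands here. */
static void on_segv(int sig, siginfo_t *si, void *ctx)
{
    (void)sig;
    (void)ctx;
    char *addr = (char *)si->si_addr;

    if (addr < tracked_base || addr >= tracked_base + tracked_len) {
        /* Not our region: restore the default action and let the
         * fault happen again as a real crash. */
        signal(SIGSEGV, SIG_DFL);
        return;
    }

    /* Record the fault, then unprotect only the touched page so the
     * faulting write is retried and succeeds. */
    char *page = (char *)((uintptr_t)addr & ~(uintptr_t)(page_size - 1));
    write_faults++;
    mprotect(page, page_size, PROT_READ | PROT_WRITE);
}

/* Start tracking writes to a page-aligned region. 0 on success. */
int track_writes(void *base, size_t len)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;

    page_size = (size_t)sysconf(_SC_PAGESIZE);
    tracked_base = base;
    tracked_len = len;
    write_faults = 0;
    return mprotect(base, len, PROT_READ);
}

/* Number of pages written since track_writes(). */
int tracked_write_faults(void)
{
    return (int)write_faults;
}
```

Every first write to a page costs a signal delivery plus an mprotect()
call in the faulting thread itself; with userfaultfd WP the fault is
instead reported on a file descriptor that another thread can service.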

Thanks,
Andrea


