qemu-stable
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON


From: Michael S. Tsirkin
Subject: Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT
Date: Fri, 9 Jul 2021 07:27:28 -0400

On Thu, Jul 08, 2021 at 08:07:44PM +0100, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (mst@redhat.com) wrote:
> > On Wed, Jul 07, 2021 at 09:47:31PM +0200, David Hildenbrand wrote:
> > > On 07.07.21 21:19, Michael S. Tsirkin wrote:
> > > > On Wed, Jul 07, 2021 at 09:14:00PM +0200, David Hildenbrand wrote:
> > > > > On 07.07.21 21:05, Michael S. Tsirkin wrote:
> > > > > > On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote:
> > > > > > > Postcopy never worked properly with 'free-page-hint=on', as there 
> > > > > > > are
> > > > > > > at least two issues:
> > > > > > > 
> > > > > > > 1) With postcopy, the guest will never receive a 
> > > > > > > VIRTIO_BALLOON_CMD_ID_DONE
> > > > > > >      and consequently won't release free pages back to the OS once
> > > > > > >      migration finishes.
> > > > > > > 
> > > > > > >      The issue is that for postcopy, we won't do a final bitmap 
> > > > > > > sync while
> > > > > > >      the guest is stopped on the source and
> > > > > > >      virtio_balloon_free_page_hint_notify() will only call
> > > > > > >      virtio_balloon_free_page_done() on the source during
> > > > > > >      PRECOPY_NOTIFY_CLEANUP, after the VM state was already 
> > > > > > > migrated to
> > > > > > >      the destination.
> > > > > > > 
> > > > > > > 2) Once the VM touches a page on the destination that has been 
> > > > > > > excluded
> > > > > > >      from migration on the source via qemu_guest_free_page_hint() 
> > > > > > > while
> > > > > > >      postcopy is active, that thread will stall until postcopy 
> > > > > > > finishes
> > > > > > >      and all threads are woken up. (with older Linux kernels that 
> > > > > > > won't
> > > > > > >      retry faults when woken up via userfaultfd, we might 
> > > > > > > actually get a
> > > > > > >      SEGFAULT)
> > > > > > > 
> > > > > > >      The issue is that the source will refuse to migrate any 
> > > > > > > pages that
> > > > > > >      are not marked as dirty in the dirty bmap -- for example, 
> > > > > > > because the
> > > > > > >      page might just have been sent. Consequently, the faulting 
> > > > > > > thread will
> > > > > > >      stall, waiting for the page to be migrated -- which could 
> > > > > > > take quite
> > > > > > >      a while and result in guest OS issues.
> > > > > > 
> > > > > > OK so if source gets a request for a page which is not dirty
> > > > > > it does not respond immediately? Why not just teach it to
> > > > > > respond? It would seem that if destination wants a page we
> > > > > > should just give it to the destination ...
> > > > > 
> > > > > The source does not know if a page has already been sent (e.g., via 
> > > > > the
> > > > > background migration thread that moves all data over) vs. the page 
> > > > > has not
> > > > > been send because the page was hinted. This is the part where we'd 
> > > > > need
> > > > > additional tracking on the source to actually know that.
> > > > > 
> > > > > We must not send a page twice, otherwise bad things can happen when 
> > > > > placing
> > > > > pages that already have been migrated, because that scenario can 
> > > > > easily
> > > > > happen with ordinary postcopy (page has already been sent and we're 
> > > > > dealing
> > > > > with a stale request from the destination).
> > > > 
> > > > OK let me get this straight
> > > > 
> > > > A. source sends page
> > > > B. destination requests page
> > > > C. destination changes page
> > > > D. source sends page
> > > > E. destination overwrites page
> > > > 
> > > > this is what you are worried about right?
> > > 
> > > IIRC E. is with recent kernels:
> > > 
> > > E. placing the page fails with -EEXIST and postcopy migration fails
> > > 
> > > However, the man page (man ioctl_userfaultfd) doesn't describe what is
> > > actually supposed to happen when double-placing. Could be that it's
> > > "undefined behavior".
> > > 
> > > I did not try, though.
> > > 
> > > 
> > > This is how it works today:
> > > 
> > > A. source sends page and marks it clean
> > > B. destination requests page
> > > C. destination receives page and places it
> > > D. source ignores request as page is clean
> > 
> > If it's actually -EEXIST then we could just resend it
> > and teach destination to ignore -EEXIST errors right?
> > 
> > Will actually make things a bit more robust: destination
> > handles its own consistency instead of relying on source.
> 
> You have to be careful of a few things;
>   a) If the destination has modified the page, then you must
> definitely not under any circumstances lose those modifications
> and replace them by an old version from the source.
> 
>   b) With postcopy recovery I think there is a bitmap to track some
> of this; but you have to be careful since you don't know whether
> pages that were sent were actually received.
> 
> Dave

what I am trying to say is that userfaultfd already tracks these
things in the kernel for us. Ideally we'd just use that ...

> > 
> > 
> > > > 
> > > > the fix is to mark page clean in A.
> > > > then in D to not send page if it's clean?
> > > > 
> > > > And the problem with hinting is this:
> > > > 
> > > > A. page is marked clean
> > > > B. destination requests page
> > > > C. destination changes page
> > > > D. source sends page <- does not happen, page is clean!
> > > > E. destination overwrites page
> > > 
> > > Simplified it's
> > > 
> > > A. page is marked clean by hinting code
> > > B. destination requests page
> > > D. source ignores request as page is clean
> > > E. destination stalls until postcopy unregisters uffd
> > > 
> > > 
> > > Some thoughts
> > > 
> > > 1. We do have a a recv bitmap where we track received pages on the
> > > destination (e.g., ramblock_recv_bitmap_test()), however we only use it to
> > > avoid sending duplicate requests to the hypervisor AFAIKs, and don't check
> > > it when placing pages.
> > > 
> > > 2. Changing the migration behavior unconditionally on the source will 
> > > break
> > > migration to old QEMU binaries that cannot handle this change.
> > 
> > We can always make this depend on new machine types.
> > 
> > 
> > > 3. I think the current behavior is in place to make debugging easier. If
> > > only a single instance of a page will ever be migrated from source to
> > > destination, there cannot be silent data corruption. Further, we avoid
> > > migrating unnecessarily pages twice.
> > > 
> > 
> > Likely does not matter much for performance, it seems unlikely that
> > the race is all that common.
> > 
> > > Maybe Dave and Peter can spot any flaws in my understanding.
> > > 
> > > -- 
> > > Thanks,
> > > 
> > > David / dhildenb
> > 
> -- 
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




reply via email to

[Prev in Thread] Current Thread [Next in Thread]