From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [RFC 1/2] block/file-posix: implement bdrv_co_invalidate_cache() on Linux
Date: Fri, 20 Apr 2018 11:21:38 +0800
User-agent: Mutt/1.9.2 (2017-12-15)

On Thu, Apr 19, 2018 at 10:18:33AM +0100, Dr. David Alan Gilbert wrote:
> * Stefan Hajnoczi (address@hidden) wrote:
> > On Linux posix_fadvise(POSIX_FADV_DONTNEED) invalidates pages*.  Use
> > this to drop page cache on the destination host during shared storage
> > migration.  This way the destination host will read the latest copy of
> > the data and will not use stale data from the page cache.
> > 
> > The flow is as follows:
> > 
> > 1. Source host writes out all dirty pages and inactivates drives.
> > 2. QEMU_VM_EOF is sent on migration stream.
> > 3. Destination host invalidates caches before accessing drives.
> > 
> > This patch enables live migration even with -drive cache.direct=off.
> > 
> > * Terms and conditions may apply, please see patch for details.
> > 
> > Signed-off-by: Stefan Hajnoczi <address@hidden>
> > ---
> >  block/file-posix.c | 39 +++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 39 insertions(+)
> > 
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index 3794c0007a..df4f52919f 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -2236,6 +2236,42 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
> >      return ret | BDRV_BLOCK_OFFSET_VALID;
> >  }
> >  
> > +static void coroutine_fn raw_co_invalidate_cache(BlockDriverState *bs,
> > +                                                 Error **errp)
> > +{
> > +    BDRVRawState *s = bs->opaque;
> > +    int ret;
> > +
> > +    ret = fd_open(bs);
> > +    if (ret < 0) {
> > +        error_setg_errno(errp, -ret, "The file descriptor is not open");
> > +        return;
> > +    }
> > +
> > +    if (s->open_flags & O_DIRECT) {
> > +        return; /* No host kernel page cache */
> > +    }
> > +
> > +#if defined(__linux__)
> > +    /* This sets the scene for the next syscall... */
> > +    ret = bdrv_co_flush(bs);
> > +    if (ret < 0) {
> > +        error_setg_errno(errp, -ret, "flush failed");
> > +        return;
> > +    }
> > +
> > +    /* Linux does not invalidate pages that are dirty, locked, or mmapped by a
> > +     * process.  These limitations are okay because we just fsynced the file,
> > +     * we don't use mmap, and the file should not be in use by other processes.
> > +     */
> > +    ret = posix_fadvise(s->fd, 0, 0, POSIX_FADV_DONTNEED);
> 
> What happens if I try a migrate between two qemu's on the same host?
> (Which I, and avocado, both use for testing; I think users
> occasionally do this for QEMU updates.)

The steps quoted from the commit description:

  1. Source host writes out all dirty pages and inactivates drives.
  2. QEMU_VM_EOF is sent on migration stream.
  3. Destination host invalidates caches before accessing drives.

When we reach Step 3 the source QEMU is no longer doing I/O, so no pages
are locked.  The destination QEMU calls bdrv_co_flush(), so even if any
pages are still dirty (which shouldn't happen, since the source already
drained and flushed), they are written out and left clean.  Therefore
posix_fadvise(POSIX_FADV_DONTNEED) really does invalidate all resident
pages.
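
To make the ordering concrete, here is a minimal standalone sketch of the
flush-then-drop sequence.  This is an illustration only, not the QEMU code
path: the command-line file argument stands in for the image the block
layer already has open, and fsync() stands in for bdrv_co_flush().

  /* Build (hypothetical file name): gcc -O2 -o dropcache dropcache.c */
  #define _POSIX_C_SOURCE 200112L
  #include <errno.h>
  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  int main(int argc, char **argv)
  {
      if (argc != 2) {
          fprintf(stderr, "usage: %s FILE\n", argv[0]);
          return 1;
      }

      int fd = open(argv[1], O_RDWR);
      if (fd < 0) {
          fprintf(stderr, "open: %s\n", strerror(errno));
          return 1;
      }

      /* Write back dirty pages first: POSIX_FADV_DONTNEED skips pages that
       * are dirty, locked, or mmapped, so the data must be clean on disk
       * before the kernel will actually evict it.  In the patch this role
       * is played by bdrv_co_flush(). */
      if (fsync(fd) < 0) {
          fprintf(stderr, "fsync: %s\n", strerror(errno));
          close(fd);
          return 1;
      }

      /* Drop all cached pages for the whole file (offset 0, len 0 == to EOF).
       * Note posix_fadvise() returns the error number directly instead of
       * setting errno. */
      int ret = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
      if (ret != 0) {
          fprintf(stderr, "posix_fadvise: %s\n", strerror(ret));
      }

      close(fd);
      return ret ? 1 : 0;
  }

One way to observe the effect is to compare the output of fincore FILE
(util-linux) before and after running the sketch against a cached file.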

FWIW when writing this patch I tested with both QEMUs on the same host.

Stefan
