Re: [Qemu-devel] cache=writeback and migrations over shared storage


From: Filippos Giannakos
Subject: Re: [Qemu-devel] cache=writeback and migrations over shared storage
Date: Thu, 3 Oct 2013 11:10:47 +0300
User-agent: Mutt/1.5.21 (2010-09-15)

On Thu, Sep 26, 2013 at 09:31:00AM +0200, Stefan Hajnoczi wrote:
> Hi Filippos,
> Late response but this may help start the discussion...
> 
> Cache consistency during migration was discussed a lot on the mailing
> list.  You might be able to find threads from about 2 years ago that
> discuss this in detail.
> 
> Here is what I remember:
> 
> During migration the QEMU process on the destination host must be
> started.  When QEMU starts up it opens the image file and reads the
> first sector (for disk geometry and image format probing).  At this
> point the destination would populate its page cache while the source is
> still running the guest.
> 
> We're in trouble because the destination host has stale pages in its
> page cache.  Hence the recommendation to use cache=none.
> 
> There are a few things to look at if you are really eager to use
> cache=writeback:
> 
> 1. Can you avoid geometry probing?  I think by setting the geometry
>    options on the -drive you can skip probing.  See
>    hw/block/hd-geometry.c.
> 
> 2. Can you avoid format probing?  Use -drive format=raw to skip format
>    probing.
> 
> 3. Make sure to use raw image files.  Do not use a format since that
>    would require reading a header and metadata before migration
>    handover.
> 
> 4. Check if ioctl(BLKFLSBUF) can be used.  Unfortunately it requires
>    CAP_SYS_ADMIN so the QEMU process cannot issue it when running
>    without privileges.  Perhaps an external tool like libvirt could
>    issue it, but that's tricky since live migration handover is a
>    delicate operation - it's important to avoid dependencies between
>    multiple processes to keep guest downtime low and avoid the
>    possibility of failures.
> 
> So you might be able to get away with cache=writeback *if* you carefully
> study the code and double-check with strace that the destination QEMU
> process does not access the image file before handover has completed.
> 
> Stefan

Hi Stefan,

Thanks for your response. You've been really helpful.

I believe it should be possible to use the writeback cache if we address the
problems you pointed out.

I'll give it a try by providing the disk geometry and by using the raw drive
format. I don't think I need to use ioctl(BLKFLSBUF). As far as I can tell from
a brief look at the kernel code, it flushes the dirty pages of the device and
invalidates the now-clean pages. Since a) the source QEMU process flushes all
block devices before handing control over to the destination process, and b) we
remove the block device after the migration, this should happen automatically.
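
For reference, if it ever turns out that we do need it, issuing BLKFLSBUF from
an external privileged helper would look roughly like the following. This is
only a minimal sketch; the device path is a placeholder and none of it is part
of QEMU itself:

    /* Flush and invalidate the host page cache for a block device via
     * ioctl(BLKFLSBUF).  Needs CAP_SYS_ADMIN, so it has to run as a
     * separate privileged helper rather than inside an unprivileged
     * QEMU process.  The device path is only an example. */
    #include <fcntl.h>
    #include <linux/fs.h>       /* BLKFLSBUF */
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        const char *dev = argc > 1 ? argv[1] : "/dev/myvg/vm-disk"; /* placeholder */
        int fd = open(dev, O_RDONLY);

        if (fd < 0) {
            perror("open");
            return 1;
        }
        /* Writes back the device's dirty pages and drops the now-clean
         * pages from the page cache. */
        if (ioctl(fd, BLKFLSBUF, 0) < 0) {
            perror("ioctl(BLKFLSBUF)");
            close(fd);
            return 1;
        }
        close(fd);
        return 0;
    }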

To be extra sure about this, I'll run a modified QEMU version that pauses
execution of the source hypervisor after the VM has been stopped on the source
and right before it hands control over to the destination. I believe this
window exists after:

        vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);

and right before:

        qemu_savevm_state_complete(s->file);

in migration_thread() in migration.c.
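
The modification I have in mind is roughly the following (just a sketch with
the surrounding code abbreviated; the marker-file mechanism and its path are
made up for illustration and are not existing QEMU code):

        /* Inside migration_thread() in migration.c */
        vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);

        /* The guest is now stopped on the source and nothing has been
         * handed over yet: spin here until the marker file is removed,
         * so the destination can be inspected in the meantime. */
        while (access("/tmp/qemu-migration-hold", F_OK) == 0) { /* needs <unistd.h> */
            g_usleep(100 * 1000);   /* 100 ms */
        }

        qemu_savevm_state_complete(s->file);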

I'll be strace-ing the destination to make sure no reads are issued on the
block device. Additionally, our custom storage layer allows us to monitor all
I/O requests performed on the block device, so we have another layer of
assurance that no data is read from the storage.

Kind Regards,
-- 
Filippos
<address@hidden>


