Re: [Qemu-devel] Live migration without bdrv_drain_all()
From: Daniel P. Berrange
Subject: Re: [Qemu-devel] Live migration without bdrv_drain_all()
Date: Tue, 27 Sep 2016 10:48:48 +0100
User-agent: Mutt/1.7.0 (2016-08-17)
On Mon, Aug 29, 2016 at 11:06:48AM -0400, Stefan Hajnoczi wrote:
> At KVM Forum an interesting idea was proposed to avoid
> bdrv_drain_all() during live migration. Mike Cui and Felipe Franciosi
> mentioned running at queue depth 1. It needs more thought to make it
> workable but I want to capture it here for discussion and to archive
> it.
>
> bdrv_drain_all() is synchronous and can cause VM downtime if I/O
> requests hang. We should find a better way of quiescing I/O that is
> not synchronous. Up until now I thought we should simply add a
> timeout to bdrv_drain_all() so it can at least fail (and live
> migration would fail) if I/O is stuck instead of hanging the VM. But
> the following approach is also interesting...
How would you decide what an acceptable timeout is for the drain
operation? At what point does a stuck drain op cause the VM
to stall? The drain call happens from the migration thread, so
it shouldn't impact the vcpu threads or the main event loop thread
if it takes too long.
>
> During the iteration phase of live migration we could limit the queue
> depth so points with no I/O requests in-flight are identified. At
> these points the migration algorithm has the opportunity to move to
> the next phase without requiring bdrv_drain_all() since no requests
> are pending.
>
> Unprocessed requests are left in the virtio-blk/virtio-scsi virtqueues
> so that the destination QEMU can process them after migration
> completes.
>
> Unfortunately this approach makes convergence harder because the VM
> might also be dirtying memory pages during the iteration phase. Now
> we need to reach a spot where no I/O is in-flight *and* dirty memory
> is under the threshold.
It doesn't seem like this could easily fit in with post-copy. During
the switchover from pre-copy to post-copy, migration calls
vm_stop_force_state(), which will trigger bdrv_drain_all().
The point at which you switch from pre- to post-copy mode is not controlled
by QEMU; instead it is an explicit admin action triggered via a QMP command.
Now the actual switchover is not synchronous with completion of the QMP
command, so there is some small scope for delaying it to a convenient time,
but not by a very significant amount, and certainly not anywhere near
30 seconds. Perhaps 1 second at the most.
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|