qemu-devel

Re: [Qemu-devel] [PATCH 0/7] migration: pause-before-device


From: Daniel P. Berrange
Subject: Re: [Qemu-devel] [PATCH 0/7] migration: pause-before-device
Date: Thu, 12 Oct 2017 10:55:08 +0100
User-agent: Mutt/1.9.0 (2017-09-02)

On Thu, Oct 12, 2017 at 11:52:40AM +0200, Kevin Wolf wrote:
> Am 12.10.2017 um 11:27 hat Daniel P. Berrange geschrieben:
> > On Thu, Oct 12, 2017 at 11:18:31AM +0200, Kevin Wolf wrote:
> > > Am 12.10.2017 um 10:21 hat Daniel P. Berrange geschrieben:
> > > > On Wed, Oct 11, 2017 at 08:13:10PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > > > From: "Dr. David Alan Gilbert" <address@hidden>
> > > > > 
> > > > > Hi,
> > > > >   This set attempts to make a race condition between migration and
> > > > > drive-mirror (and other block users) soluble by allowing the migration
> > > > > to be paused after the source qemu releases the block devices but
> > > > > before the serialisation of the device state.
> > > > > 
> > > > > The symptom of this failure, as reported by Wangjie, is a:
> > > > >    _co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed
> > > > > 
> > > > > and the source qemu dying; so the problem is pretty nasty.
> > > > > This has only been seen from 2.9 onwards, but the theory is that
> > > > > prior to 2.9 it might have been happening anyway and we were
> > > > > perhaps getting unreported corruptions (lost writes); so this
> > > > > really needs fixing.
> > > > > 
> > > > > This flow came from discussions between Kevin and me, and we can't
> > > > > see a way of fixing it without exposing a new state to the management
> > > > > layer.
> > > > > 
> > > > > The flow is now:
> > > > > 
> > > > > (qemu) migrate_set_capability pause-before-device on
> > > > > (qemu) migrate -d ...
> > > > > (qemu) info migrate
> > > > > ...
> > > > > Migration status: pause-before-device
> > > > > ...
> > > > > << issue commands to clean up any block jobs>>
> > > > > 
> > > > > (qemu) migrate_continue pause-before-device
> > > > > (qemu) info migrate
> > > > > ...
> > > > > Migration status: completed
> > > > 
> > > > I'm curious why QEMU doesn't have enough info to clean up the block
> > > > jobs automatically? What is the key thing that libvirt knows about
> > > > the block jobs that QEMU is lacking? If QEMU had the right info it
> > > > could do it automatically & avoid this extra lock-step synchronization
> > > > with libvirt.
> > > 
> > > The key point is that the block job needs to be completed while the
> > > source VM is stopped, but the source qemu is still in control of the
> > > image files (e.g. still holds the file locks), so that it can do the
> > > remaining writes.
> > > 
> > > Without the additional migration phase, the only state where both sides
> > > are stopped is when the destination is in control of the image files
> > > (migration has completed, but -S prevents it from automatically
> > > resuming), so the source can't write to the image any more.
> > 
> > Hmm, I always thought that the target QEMU did not start using the
> > image files until you ran 'cont' on the target, e.g. once source QEMU
> > has migrate=completed, both QEMUs are in paused state and source QEMU
> > still owns the images, until we run 'cont'.
> > 
> > What you're saying seems to imply this is not the case, but if so what
> > is triggering the target QEMU to acquire the locks on images? Is it
> > done implicitly when it finishes reading device state off the wire?
> > 
> > If so, could we instead add a migrate feature flag to tell the target
> > QEMU not to automatically acquire image locks, until it receives an
> > explicit 'cont'. That would then not require this extra lock-step
> > migration state.
> 
> The handover consists of two parts: The destination acquires the locks,
> but first the source needs to release them. Without a new command, the
> source can't know when it is supposed to do that. The destination
> receives the 'cont' command, but the source doesn't know about this. So you
> have to have something that tells the source "management has made sure
> to complete what needed to be completed, you can now give up control of
> the images".
> 
> I also think that conceptually it is cleanest to have a source-controlled
> pre-handover phase with a paused VM, which is simply the symmetrical
> counterpart to the existing post-handover phase that we have on the destination.
> This gives us a clean model for the handover of any resources that
> require some tearing down on the source before they can be used on the
> destination, so it appears to be the most future-proof option.

Ok, I see what you mean now.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
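
The handover ordering discussed in this thread can be sketched as a small state machine. This is a toy Python model under stated assumptions, not QEMU code: only the `pause-before-device` capability/status name and the `migrate_continue` step come from the thread. `SourceMigration`, `converge`, `complete_block_jobs` and `pending_writes` are hypothetical names; image ownership is reduced to a boolean standing in for the file locks; sending device state and the destination side are not modelled.

```python
from enum import Enum


class Phase(Enum):
    ACTIVE = "active"                            # RAM being copied, source VM running
    PAUSE_BEFORE_DEVICE = "pause-before-device"  # source VM stopped, source still owns images
    COMPLETED = "completed"                      # images released; device state would follow


class SourceMigration:
    """Toy model of the source side of the handover (hypothetical, not QEMU code)."""

    def __init__(self, pause_before_device: bool, pending_writes: int = 0):
        self.pause_before_device = pause_before_device  # the proposed capability
        self.pending_writes = pending_writes            # writes a block job still has to do
        self.phase = Phase.ACTIVE
        self.vm_running = True
        self.owns_images = True                         # stands in for holding the file locks

    def converge(self) -> None:
        """RAM has converged: stop the VM, then either pause or switch over."""
        self.vm_running = False
        if self.pause_before_device:
            self.phase = Phase.PAUSE_BEFORE_DEVICE  # wait for management
        else:
            self._switchover()

    def complete_block_jobs(self) -> None:
        """Management finishes block jobs; the remaining writes need image ownership."""
        if self.pending_writes and not self.owns_images:
            # Loose analogue of the `!(bs->open_flags & 0x0800)` assertion in the report.
            raise AssertionError("block job wrote to an image the source no longer owns")
        self.pending_writes = 0

    def migrate_continue(self) -> None:
        """Management signals: block jobs are done, the source may give up the images."""
        if self.phase is not Phase.PAUSE_BEFORE_DEVICE:
            raise RuntimeError("migration is not paused")
        self._switchover()

    def _switchover(self) -> None:
        self.owns_images = False      # release the images so the destination can take them
        self.phase = Phase.COMPLETED  # sending the device state is not modelled here


# With the capability: block jobs finish while the source still owns the images.
m = SourceMigration(pause_before_device=True, pending_writes=3)
m.converge()
assert m.phase is Phase.PAUSE_BEFORE_DEVICE and m.owns_images
m.complete_block_jobs()
m.migrate_continue()
assert m.phase is Phase.COMPLETED and not m.owns_images
```

Without the capability, `converge()` releases the images immediately, so a later `complete_block_jobs()` with writes still pending raises. In this model the explicit `migrate_continue` step is the only point at which the source learns that it may give up the images.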


