From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [PULL 15/28] migration: create new section to store global state
Date: Wed, 8 Jul 2015 12:14:23 +0100
User-agent: Mutt/1.5.23 (2014-03-12)

* Christian Borntraeger (address@hidden) wrote:
> Am 08.07.2015 um 12:43 schrieb Dr. David Alan Gilbert:
> > * Christian Borntraeger (address@hidden) wrote:
> >> Am 08.07.2015 um 12:14 schrieb Dr. David Alan Gilbert:
> >>> * Christian Borntraeger (address@hidden) wrote:
> >>>> Am 07.07.2015 um 15:08 schrieb Juan Quintela:
> >>>>> This includes a new section that, for now, just stores the current qemu
> >>>>> state.
> >>>>>
> >>>>> Right now, there is only one way to control the state of the
> >>>>> target after migration:
> >>>>>
> >>>>> - If you run the target qemu with -S, it starts stopped.
> >>>>> - If you run the target qemu without -S, it starts running as soon as
> >>>>>   migration finishes.
> >>>>>
> >>>>> The problem is what happens if we start the target without -S and
> >>>>> an error during migration leaves the current state as
> >>>>> -EIO.  Migration would end (note that the error happened doing block
> >>>>> IO, network IO, i.e. nothing related to migration itself), and when
> >>>>> migration finishes, we would just "continue" running on the destination,
> >>>>> probably hanging the guest, corrupting data, or whatever.
> >>>>>
> >>>>> Signed-off-by: Juan Quintela <address@hidden>
> >>>>> Reviewed-by: Dr. David Alan Gilbert <address@hidden>
> >>>>
> >>>> Bisection shows that this patch causes a regression on s390.
> >>>>
> >>>> A guest restarts (booting) after managedsave/start instead of continuing.
> >>>>
> >>>> Do you have any idea what might be wrong?
> >>>
> >>> I'd add some debug to the pre_save and post_load to see what state value
> >>> is being saved/restored.
> >>>
> >>> Also, does that regression happen when doing the save/restore using the
> >>> same/latest git, or is it a load from an older version?
> >>
> >> It seems to happen only with some guest definitions, but I can't really
> >> pinpoint it yet.
> >> e.g. removing queues='4' from my network card solved it for a reduced xml,
> >> but doing the same on a bigger xml was not enough :-/
> > 
> > Nasty.  Still, the 'paused' value in the pre_save/post_load feels right.
> > I've read through the patch again and it still feels right to me, so I don't
> > see anything obvious.
> > 
> > Perhaps it's worth turning on the migration tracing on both sides and
> > seeing what's different with that 'queues=4'?
> 
> Reducing the number of virtio disks also seems to help. I wonder whether
> some devices use the runstate somehow and this change triggers a race.

I'm not sure why it would make a difference, but...
The difference this patch makes is that, in the 'paused' case, the state of
the VM is set to 'paused' before all of the other devices have finished
loading their state.  In the old code it was only transitioned to 'paused'
at the end, after all the other devices had loaded.

Dave
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK


