From: Christian Borntraeger
Subject: Re: [Qemu-devel] [PULL 15/28] migration: create new section to store global state
Date: Wed, 08 Jul 2015 12:54:01 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0

On 08.07.2015 at 12:43, Dr. David Alan Gilbert wrote:
> * Christian Borntraeger (address@hidden) wrote:
>> On 08.07.2015 at 12:14, Dr. David Alan Gilbert wrote:
>>> * Christian Borntraeger (address@hidden) wrote:
>>>> On 07.07.2015 at 15:08, Juan Quintela wrote:
>>>>> This includes a new section that, for now, just stores the current qemu
>>>>> state.
>>>>> Right now, there is only one way to control the state of the target
>>>>> after migration:
>>>>> - If you run the target qemu with -S, it starts stopped.
>>>>> - If you run the target qemu without -S, it runs as soon as
>>>>>   migration finishes.
>>>>> The problem is what happens if we start the target without -S and an
>>>>> error occurs during migration that sets the current state to
>>>>> -EIO.  Migration would still end (note that the error happened during
>>>>> block IO or network IO, i.e. nothing related to migration), and when
>>>>> migration finishes, we would just "continue" running on the destination,
>>>>> probably hanging the guest or corrupting data.
>>>>> Signed-off-by: Juan Quintela <address@hidden>
>>>>> Reviewed-by: Dr. David Alan Gilbert <address@hidden>
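The decision the commit message describes can be illustrated with a minimal C sketch. It assumes, hypothetically, that the destination receives the source's runstate name as a string (or NULL when no global-state section was sent); `target_should_run` is an invented name for illustration, not a QEMU function. Without the new section, a target started without -S always resumes; with it, the target resumes only if the guest was actually running on the source.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical sketch, not QEMU code: should the destination start the
 * guest once incoming migration completes?
 *
 * autostart    - true when the target was started without -S
 * source_state - runstate name received in the global-state section,
 *                or NULL when the source did not send the section
 */
static bool target_should_run(bool autostart, const char *source_state)
{
    if (!autostart) {
        /* -S given: always stay stopped and wait for an explicit "cont". */
        return false;
    }
    if (source_state == NULL) {
        /* Old behaviour: no global state available, so just run. */
        return true;
    }
    /* Resume only if the guest was really running on the source. */
    return strcmp(source_state, "running") == 0;
}
```

For example, `target_should_run(true, "io-error")` returns false, so a guest whose source hit an I/O error stays stopped on the destination instead of blindly continuing.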
>>>> Bisecting shows that this patch causes a regression on s390:
>>>> after managedsave/start, the guest reboots instead of continuing.
>>>> Do you have any idea what might be wrong?
>>> I'd add some debug to the pre_save and post_load to see what state value is
>>> being saved/restored.
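The debugging suggested here, logging the state value on both sides, might look like the following sketch. `GlobalStateSketch`, `sketch_pre_save` and `sketch_post_load` are hypothetical stand-ins that mimic the idea of a fixed-size runstate buffer plus a length field; QEMU's real VMState callbacks have different signatures and are registered through a VMState description.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the global-state section payload. */
typedef struct {
    uint32_t size;      /* bytes of runstate[] in use, including the NUL */
    char runstate[100]; /* runstate name, e.g. "running" or "paused" */
} GlobalStateSketch;

/* Source side: record the current runstate just before saving, and log it. */
static void sketch_pre_save(GlobalStateSketch *s, const char *current)
{
    snprintf(s->runstate, sizeof(s->runstate), "%s", current);
    s->size = (uint32_t)strlen(s->runstate) + 1;
    fprintf(stderr, "global_state pre_save: '%s' (size %u)\n",
            s->runstate, (unsigned)s->size);
}

/* Destination side: validate and log what arrived; returns 0 on success. */
static int sketch_post_load(GlobalStateSketch *s)
{
    /* Reject a size that is empty, too large, or not NUL-terminated. */
    if (s->size == 0 || s->size > sizeof(s->runstate) ||
        s->runstate[s->size - 1] != '\0') {
        fprintf(stderr, "global_state post_load: bad size %u\n",
                (unsigned)s->size);
        return -1;
    }
    fprintf(stderr, "global_state post_load: '%s'\n", s->runstate);
    return 0;
}
```

Comparing the two log lines from source and destination shows directly whether the saved and restored states match.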
>>> Also, does the regression happen when doing the save/restore with the
>>> same (latest) git, or is it a load from an older version?
>> It seems to happen only with some guest definitions, but I can't really
>> pinpoint it yet. E.g. removing queues='4' from my network card solved it
>> for a reduced XML, but doing the same on a bigger XML was not enough :-/
> Nasty. Still, checking the 'paused' value in pre_save/post_load feels right.
> I've read through the patch again and it still feels right to me, so I don't
> see anything obvious.
> Perhaps it's worth turning on migration tracing on both sides and seeing
> what's different with that 'queues=4'?

Reducing the number of virtio disks also seems to help. I wonder whether
some devices use the runstate somehow and this change triggers a race.
