qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 3/3] migration/postcopy: handle POSTCOPY_INCOMING_RUNNING cor


From: Peter Xu
Subject: Re: [PATCH 3/3] migration/postcopy: handle POSTCOPY_INCOMING_RUNNING corner case properly
Date: Wed, 9 Oct 2019 12:12:25 +0800
User-agent: Mutt/1.11.4 (2019-03-13)

On Wed, Oct 09, 2019 at 09:02:04AM +0800, Wei Yang wrote:
> On Tue, Oct 08, 2019 at 05:40:46PM +0100, Dr. David Alan Gilbert wrote:
> >* Wei Yang (address@hidden) wrote:
> >> Currently, we set PostcopyState blindly to RUNNING, even we found the
> >> previous state is not LISTENING. This will lead to a corner case.
> >> 
> >> First let's look at the code flow:
> >> 
> >> qemu_loadvm_state_main()
> >>     ret = loadvm_process_command()
> >>         loadvm_postcopy_handle_run()
> >>             return -1;
> >>     if (ret < 0) {
> >>         if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING)
> >>             ...
> >>     }
> >> 
> >> From above snippet, the corner case is loadvm_postcopy_handle_run()
> >> always sets state to RUNNING. And then it checks the previous state. If
> >> the previous state is not LISTENING, it will return -1. But at this
> >> moment, PostcopyState is already been set to RUNNING.
> >> 
> >> Then ret is checked in qemu_loadvm_state_main(), when it is -1
> >> PostcopyState is checked. Current logic would pause postcopy and retry
> >> if PostcopyState is RUNNING. This is not what we expect, because
> >> postcopy is not active yet.
> >> 
> >> This patch makes sure state is set to RUNNING only previous state is
> >> LISTENING by introducing an old_state parameter in postcopy_state_set().
> >> New state only would be set when current state equals to old_state.
> >> 
> >> Signed-off-by: Wei Yang <address@hidden>
> >
> >OK, it's a shame to use a pointer there, but it works.
> 
> You mean second parameter of postcopy_state_set()?
> 
> I don't have a better idea. Or we introduce a new state
> POSTCOPY_INCOMING_NOCHECK. Do you feel better with this?

Maybe simply fix loadvm_postcopy_handle_run() to set the state after
the POSTCOPY_INCOMING_LISTENING check?

> 
> >Note, something else; using '-1' as the return value and checking for it
> >is something we do a lot; but in this case it's an example of an error
> >we could never recover from so it never makes sense to try and recover.
> >We should probably look at different types of error.

It is true that we might hang on some real errors, but IMHO it might
be no where better to quit QEMU if we're in postcopy...

(What I'm thinking in mind here is that sometimes even if postcopy
 failed we might still have a chance to recover a full VM by dumping
 both src/dst of the during-postcopy VM instances and manually merge
 them by black magic, in very extreme cases)

Regards,

-- 
Peter Xu



reply via email to

[Prev in Thread] Current Thread [Next in Thread]