qemu-devel

Re: [Qemu-devel] [PATCH] migration: invalidate cache before source start


From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [PATCH] migration: invalidate cache before source start
Date: Tue, 26 Jun 2018 20:11:51 +0100
User-agent: Mutt/1.10.0 (2018-05-17)

* Vladimir Sementsov-Ogievskiy (address@hidden) wrote:
> 25.06.2018 21:03, John Snow wrote:
> > 
> > On 06/25/2018 01:50 PM, Dr. David Alan Gilbert wrote:
> > > * Dr. David Alan Gilbert (address@hidden) wrote:
> > > > * Vladimir Sementsov-Ogievskiy (address@hidden) wrote:
> > > > > 15.06.2018 15:06, Dr. David Alan Gilbert wrote:
> > > > > > * Vladimir Sementsov-Ogievskiy (address@hidden) wrote:
> > > > > > > Invalidate cache before source start in case of failed migration.
> > > > > > > 
> > > > > > > Signed-off-by: Vladimir Sementsov-Ogievskiy <address@hidden>
> > > > > > Why doesn't the code at the bottom of migration_completion,
> > > > > > fail_invalidate:   and the code in migrate_fd_cancel   handle this?
> > > > > > 
> > > > > > What case did you see it in that those didn't handle?
> > > > > > (Also I guess it probably should set s->block_inactive = false)
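(For reference, the recovery being pointed at looks roughly like the following
in the migration.c of this period; this is a simplified sketch rather than a
verbatim copy, and migrate_fd_cancel() ends with a very similar block:)

    /* Sketch of the fail_invalidate: path at the bottom of
     * migration_completion(): if the images were already handed over to
     * the destination, re-activate them so the source can keep running.
     * Simplified; not a verbatim copy of migration.c. */
 fail_invalidate:
    if (s->state == MIGRATION_STATUS_ACTIVE) {
        Error *local_err = NULL;

        qemu_mutex_lock_iothread();
        bdrv_invalidate_cache_all(&local_err);
        if (local_err) {
            error_report_err(local_err);
        }
        qemu_mutex_unlock_iothread();
    }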
> > > > > on source I see:
> > > > > 
> > > > > address@hidden:migrate_set_state new state 7
> > > > > address@hidden:migration_thread_file_err
> > > > > address@hidden:migration_thread_after_loop
> > > > > 
> > > > > so, we are leaving loop on
> > > > >          if (qemu_file_get_error(s->to_dst_file)) {
> > > > >              migrate_set_state(&s->state, current_active_state,
> > > > >                                MIGRATION_STATUS_FAILED);
> > > > >              trace_migration_thread_file_err();
> > > > >              break;
> > > > >          }
> > > > > 
> > > > > and skip migration_completion()
> > > > Yeh, OK; I'd seen something else a few days ago, where a cancellation
> > > > test that had previously ended with a 'cancelled' state now ended up
> > > > in 'failed' (which is the state 7 you have above).
> > > > I suspect there's something else going on as well; I think what is
> > > > supposed to happen in the case of 'cancel' is that it spins in
> > > > 'cancelling' for a while in migrate_fd_cancel and then at the bottom
> > > > of migrate_fd_cancel it does the recovery, but because it's going to
> > > > 'failed' instead, it's jumping over that recovery.
> > > Going back and actually looking at the patch again,
> > > can I ask for one small change:
> > >     can you set s->block_inactive = false in the case where you
> > > don't get the local_err (like we do at the bottom of migrate_fd_cancel)?
> > > 
> > > 
> > > Does that make sense?
> > > 
> > > Thanks,
> > > 
> > > Dave
> > > 
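(The small change being asked for would amount to something like this on the
patch's new invalidation call, mirroring what migrate_fd_cancel() already
does; an illustrative sketch, not the final patch text:)

    bdrv_invalidate_cache_all(&local_err);
    if (local_err) {
        error_report_err(local_err);
    } else {
        /* re-activation succeeded, so note the images are usable again */
        s->block_inactive = false;
    }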
> > Vladimir, one more question for you because I'm not as familiar with
> > this code:
> > 
> > In the normal case we need to invalidate the qcow2 cache as a way to
> > re-engage the disk (yes?) when we have failed during the late-migration
> > steps.
> > 
> > In this case, we seem to be observing a failure during the bulk transfer
> > loop. Why is it important to invalidate the cache at this step -- would
> > the disk have been inactivated yet? It shouldn't, because it's in the
> > bulk transfer phase -- or am I missing something?
> > 
> > I feel like this code is behaving in a way that's fairly surprising for
> > a casual reader so I was hoping you could elaborate for me.
> > 
> > --js
> 
> In my case, the source is already in postcopy state when the error
> occurred, so it is inactivated.

Ah, that explains why I couldn't understand the path that got you there;
I never think about restarting the source once we're in postcopy -
because once the destination is running, all is lost.
But you might be in the gap before management has actually started
the destination, so it does need fixing.

Dave
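
(Putting the thread together: the failure happens after the images have been
inactivated, the migration thread exits via the file-error check quoted above
and never reaches migration_completion(), so nothing re-activates the images
before the source is restarted. The fix under discussion adds that
re-activation on the failed path; roughly, and only as a sketch using the
names found in this era's migration.c:)

    /* Sketch of the idea: when the migration thread finishes in
     * MIGRATION_STATUS_FAILED with the block devices still inactivated,
     * re-activate them before the source VM can be restarted.
     * Illustrative only; not the committed patch text. */
    if (s->state == MIGRATION_STATUS_FAILED && s->block_inactive) {
        Error *local_err = NULL;

        bdrv_invalidate_cache_all(&local_err);
        if (local_err) {
            error_report_err(local_err);
        } else {
            s->block_inactive = false;
        }
    }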

> -- 
> Best regards,
> Vladimir
> 
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK


