[Qemu-devel] Re: [PATCH 0/6] Save state error handling (kill off no_migr


From: Alex Williamson
Subject: [Qemu-devel] Re: [PATCH 0/6] Save state error handling (kill off no_migrate)
Date: Mon, 08 Nov 2010 14:23:37 -0700

On Mon, 2010-11-08 at 22:59 +0200, Michael S. Tsirkin wrote:
> On Mon, Nov 08, 2010 at 10:20:46AM -0700, Alex Williamson wrote:
> > On Mon, 2010-11-08 at 18:54 +0200, Michael S. Tsirkin wrote:
> > > On Mon, Nov 08, 2010 at 07:59:57AM -0700, Alex Williamson wrote:
> > > > On Mon, 2010-11-08 at 13:40 +0200, Michael S. Tsirkin wrote:
> > > > > On Wed, Oct 06, 2010 at 02:58:57PM -0600, Alex Williamson wrote:
> > > > > > Our code paths for saving or migrating a VM are full of functions
> > > > > > that return void, leaving no opportunity for a device to cancel a
> > > > > > migration, either from error or incompatibility.  The ivshmem
> > > > > > driver attempted to solve this with a no_migrate flag on the save
> > > > > > state entry.  I think the more generic and flexible way to solve
> > > > > > this is to allow driver save functions to fail.  This series
> > > > > > implements that and converts ivshmem to use a set_params function
> > > > > > to NAK migration much earlier in the process.  This touches a lot
> > > > > > of files, but the bulk of those changes are simply s/void/int/ and
> > > > > > tacking a "return 0" onto the end of functions.
> > > > > > Thanks,
> > > > > > 
> > > > > > Alex
> > > > > 
> > > > > Well error handling is always tricky: it seems easier to
> > > > > require save handlers to never fail.
> > > > 
> > > > Sure it's easier, but does that make it robust?
> > > 
> > > More robust in the face of what kind of failure?
> > 
> > I really don't understand why we're having a discussion about whether
> > providing a means to return an error is a good thing or not.  These
> > patches touch a lot of files, but the change is dead simple.
> 
> I just don't see the motivation. Presumably your patches are
> there to achieve some kind of goal, right? I am trying to
> figure out what that goal is.

My goal is that I want to be able to NAK a migration when devices are
assigned, and I think we can do it more generically than the no_migrate
flag so that it supports this application and any other reason that
saves might fail in the future.
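
A minimal sketch of the idea, not code from the patch series: save handlers
return int instead of void, and the caller stops at the first failure.  Every
name here (DemoDevice, demo_save_all, and so on) is hypothetical and is not
QEMU's actual savevm interface.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical device record; not QEMU's real save state entry. */
typedef struct DemoDevice {
    const char *name;
    int assigned_to_host;               /* 1 if backed by a physical device */
    int (*save)(struct DemoDevice *d);  /* 0 on success, negative errno on failure */
} DemoDevice;

/* An ordinary emulated device: saving always succeeds. */
static int demo_save_ok(DemoDevice *d)
{
    (void)d;
    return 0;
}

/* An assigned device NAKs the save while it is tied to host hardware. */
static int demo_save_assigned(DemoDevice *d)
{
    return d->assigned_to_host ? -EBUSY : 0;
}

/*
 * Run every save handler in order.  Returns 0 if all succeed; otherwise
 * returns the first error and reports the failing device through *failed.
 */
static int demo_save_all(DemoDevice *devs, size_t n, const DemoDevice **failed)
{
    for (size_t i = 0; i < n; i++) {
        int ret = devs[i].save(&devs[i]);
        if (ret < 0) {
            if (failed) {
                *failed = &devs[i];
            }
            return ret;
        }
    }
    if (failed) {
        *failed = NULL;
    }
    return 0;
}
```

With this shape the decision is dynamic: the same device can block a save
while it is assigned and allow it once it is detached, which a fixed
no_migrate flag cannot express.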

> Currently savevm callbacks never fail. So they
> return void. Why is returning 0 and adding a bunch of code to test the
> condition that never happens a good idea?  It just seems to create more
> ways for devices to shoot themselves in the foot.

And more ways to indicate something bad happened and keep running.  We
already have far too many abort() calls in the code.

> > > > > So there's a bunch of code here but what exactly is the benefit?
> > > > > Since save handlers have no idea what the remote does,
> > > > > what is the compatibility you mention?
> > > > 
> > > > There are two users I currently have in mind.  ivshmem currently makes
> > > > use of register_device_unmigratable() because it makes use of
> > > > host-specific resources and connections (aiui).  This sets the
> > > > no_migrate flag, which is not dynamic and a bit of a band-aid.  The
> > > > other is device assignment, which needs a way to NAK a migration since
> > > > physical devices are never migratable.
> > > 
> > > Well since all these can't be migrated ever, a fixed property actually
> > > seems a good match.  Sure it's not dynamic but all the easier to debug.
> > > 
> > > >  I imagine we could at some point have
> > > > devices with state tied to other features that can't always be detached
> > > > from the host, this tries to provide the infrastructure for that to
> > > > happen.
> > > > 
> > > > Alex
> > > 
> > > Let guest control whether you can migrate?
> > > Sounds like something that is more likely to be abused
> > > than used constructively. 
> > 
> > s/guest/device/  So you would rather the migration failed on the
> > incoming side where it may not be detected
> 
> And incoming migration handlers *must* validate the input, anyway.
> We should not plaster over this with checks on outgoing side.

I'm not in any way suggesting incoming shouldn't do validation.

> > or it may be detected too
> > late to stop the migration?
> > 
> > Alex
> 
> So there's a bug and device is in an unexpected state.
> What can we do? Assert, print an error, notify guest - all these
> come to mind. But stop migration? Seems arbitrary.

Perhaps the problem is that either an assert or an fprintf are the first
things that come to mind.  We shouldn't have guests randomly blowing up
or telling users to go scan through their log files to find errors.
It's not very hard to allow simple error handling, so why shouldn't our
first plan of attack be to return an error so that the human/QMP monitor
can detect it and inform the user?  For the current candidates for this
interface, there's no point notifying the guest, it's the interface
attempting to do the migration that needs to know there's something
blocking it.
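
To illustrate the reporting side, here is a rough sketch in which the command
that starts the migration gets the error back and turns it into a message for
the user, rather than the device calling abort() or printing to a log.  The
functions demo_migrate_start() and demo_monitor_migrate() are hypothetical
stand-ins, not real monitor or QMP entry points.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical migration entry point.  A non-NULL blocker names a device
 * that refuses to migrate; the guest keeps running either way.
 */
static int demo_migrate_start(const char *blocker)
{
    return blocker ? -EBUSY : 0;
}

/*
 * Hypothetical monitor command: the error is returned to the caller and
 * formatted into a reply instead of being asserted on or sent to stderr.
 */
static int demo_monitor_migrate(const char *blocker, char *reply, size_t len)
{
    int ret = demo_migrate_start(blocker);

    if (ret < 0) {
        snprintf(reply, len, "migration blocked by device '%s': %s",
                 blocker, strerror(-ret));
        return ret;
    }
    snprintf(reply, len, "migration started");
    return 0;
}
```

The guest is never involved here; only the interface that requested the
migration sees the failure and the name of the device behind it.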

Alex





