[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Qemu-devel] Re: [PATCH 0/6] Save state error handling (kill off no_migr
From: |
Alex Williamson |
Subject: |
[Qemu-devel] Re: [PATCH 0/6] Save state error handling (kill off no_migrate) |
Date: |
Tue, 09 Nov 2010 12:35:48 -0700 |
On Tue, 2010-11-09 at 10:44 -0700, Alex Williamson wrote:
> On Tue, 2010-11-09 at 18:49 +0200, Michael S. Tsirkin wrote:
> > On Tue, Nov 09, 2010 at 09:30:45AM -0700, Alex Williamson wrote:
> > > On Tue, 2010-11-09 at 18:15 +0200, Michael S. Tsirkin wrote:
> > > > On Tue, Nov 09, 2010 at 08:47:00AM -0700, Alex Williamson wrote:
> > > > > > > But it could. What if ivshmem is acting in a peer role, but has
> > > > > > > no
> > > > > > > clients, could it migrate? What if ivshmem is migratable when the
> > > > > > > migration begins, but while the migration continues, a connection
> > > > > > > is
> > > > > > > setup and it becomes unmigratable.
> > > > > >
> > > > > > Sounds like something we should work to prevent, not support :)
> > > > >
> > > > > s/:)/:(/ why?
> > > >
> > > > It will just confuse everyone. Also if it happens after sending
> > > > all of memory, it's pretty painful.
> > >
> > > It happens after sending all of memory with no_migrate, and I think
> > > pushing that earlier might introduce some races around when
> > > register_device_unmigratable() can be called.
> >
> > Good point. I guess we could check it twice just to speed things up.
> >
> > > > > > > Using this series, ivshmem would
> > > > > > > have multiple options how to support this. It could a) NAK the
> > > > > > > migration, b) drop connections and prevent new connections until
> > > > > > > the
> > > > > > > migration finishes, c) detect that new connections have happened
> > > > > > > since
> > > > > > > the migration started and cancel. And probably more. no_migrate
> > > > > > > can
> > > > > > > only do a). And in fact, we can only test no_migrate after the
> > > > > > > VM is
> > > > > > > stopped (after all memory is migrated) because otherwise it could
> > > > > > > race
> > > > > > > with devices setting no_migrate during migration.
> > > > > >
> > > > > > We really want no_migrate to be static. changing it is abusing
> > > > > > the infrastructure.
> > > > >
> > > > > You call it abusing, I call it making use of the infrastructure. Why
> > > > > unnecessarily restrict ourselves? Is return 0/-1 really that scary,
> > > > > unmaintainable, undebuggable? I don't understand the resistance.
> > > > >
> > > > > Alex
> > > >
> > > > management really does not know how to handle unexpected
> > > > migration failures. They must be avoided.
> > > >
> > > > There are some very special cases that fail migration. They are
> > > > currently easy to find with grep register_device_unmigratable.
> > > > I prefer to keep it that way.
> > >
> > > How can management tools be improved to better handle unexpected
> > > migration failures when the only way for qemu to fail is an abort?
> > > We need the infrastructure to at least return an error first. Do we just
> > > need to add some fprintfs to the save core to print the id string of the
> > > device that failed to save? I just can't buy the "code is easier to
> > > grep" as an argument against adding better error handling to the save
> > > code path.
> >
> > I just don't buy the 'we'll return meaningless error codes at random
> > point in time and management will figure it out' as an argument :)
>
> Why is the error code meaningless? The error code stops the migration
> in qemu and hopefully prints an error message (we could easily add an
> fprintf to the core save code to ensure we know the device responsible
> for the NAK). From there we can figure out how to return the error to
> monitors and management tools, but we need to have a way to know there's
> an error first.
>
> > > Anyone else want to chime in?
> > >
> > > Alex
> >
> > Maybe try coding up some user using the new infrastructure to do
> > something useful, that register_device_unmigratable can't do.
>
> With the number of people I hear complaining about how qemu has too many
> aborts, no error checking, and no way to return errors, I'm a little
> dumbfounded that there's such a roadblock to actually add some simple
> error handling. Is it the error handling you're opposed to, or the way
> I'm using it to NAK a migration?
Would something like this in place of 6/6 ease your grep'ability and
debug'ability concerns? Unfortunately it adds an assert, but if we
stomp on the vmsd, then the save state entry can't be unregistered.
Signed-off-by: Alex Williamson <address@hidden>
---
diff --git a/savevm.c b/savevm.c
index 521edc8..b0aaa46 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1039,7 +1039,6 @@ typedef struct SaveStateEntry {
const VMStateDescription *vmsd;
void *opaque;
CompatEntry *compat;
- int no_migrate;
} SaveStateEntry;
@@ -1103,7 +1102,6 @@ int register_savevm_live(DeviceState *dev,
se->load_state = load_state;
se->opaque = opaque;
se->vmsd = NULL;
- se->no_migrate = 0;
if (dev && dev->parent_bus && dev->parent_bus->info->get_dev_path) {
char *id = dev->parent_bus->info->get_dev_path(dev);
@@ -1170,6 +1168,22 @@ void unregister_savevm(DeviceState *dev, const char
*idstr, void *opaque)
}
}
+static int nomigrate_set_params(int blk_enable, int shared, void *opaque)
+{
+ return -EFAULT;
+}
+
+static int nomigrate_save_live_state(Monitor *mon, QEMUFile *f, int stage,
+ void *opaque)
+{
+ return -EFAULT;
+}
+
+static int nomigrate_save_state(QEMUFile *f, void *opaque)
+{
+ return -EFAULT;
+}
+
/* mark a device as not to be migrated, that is the device should be
unplugged before migration */
void register_device_unmigratable(DeviceState *dev, const char *idstr,
@@ -1190,7 +1204,10 @@ void register_device_unmigratable(DeviceState *dev,
const char *idstr,
QTAILQ_FOREACH(se, &savevm_handlers, entry) {
if (strcmp(se->idstr, id) == 0 && se->opaque == opaque) {
- se->no_migrate = 1;
+ se->set_params = nomigrate_set_params;
+ se->save_live_state = nomigrate_save_live_state;
+ se->save_state = nomigrate_save_state;
+ assert(!se->vmsd);
}
}
}
@@ -1410,10 +1427,6 @@ static int vmstate_load(QEMUFile *f, SaveStateEntry *se,
int version_id)
static int vmstate_save(QEMUFile *f, SaveStateEntry *se)
{
- if (se->no_migrate) {
- return -1;
- }
-
if (!se->vmsd) { /* Old style */
return se->save_state(f, se->opaque);
}
@@ -1443,6 +1456,9 @@ int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f,
int blk_enable,
}
ret = se->set_params(blk_enable, shared, se->opaque);
if (ret < 0) {
+ monitor_printf(mon,
+ "Save state begin blocked by device '%s', error:"
+ " %s\n", se->idstr, strerror(-ret));
return ret;
}
}
@@ -1470,6 +1486,9 @@ int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f,
int blk_enable,
ret = se->save_live_state(mon, f, QEMU_VM_SECTION_START, se->opaque);
if (ret < 0) {
+ monitor_printf(mon,
+ "Save state begin blocked by device '%s', error:"
+ " %s\n", se->idstr, strerror(-ret));
return ret;
}
}
@@ -1503,6 +1522,9 @@ int qemu_savevm_state_iterate(Monitor *mon, QEMUFile *f)
synchronized over and over again. */
break;
} else if (ret < 0) {
+ monitor_printf(mon,
+ "Save state iterate blocked by device '%s', error:"
+ " %s\n", se->idstr, strerror(-ret));
return ret;
}
}
@@ -1535,6 +1557,9 @@ int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f)
r = se->save_live_state(mon, f, QEMU_VM_SECTION_END, se->opaque);
if (r < 0) {
+ monitor_printf(mon,
+ "Save state complete blocked by device '%s', error:"
+ " %s\n", se->idstr, strerror(-r));
return r;
}
}
@@ -1559,7 +1584,9 @@ int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f)
r = vmstate_save(f, se);
if (r < 0) {
- monitor_printf(mon, "cannot migrate with device '%s'\n",
se->idstr);
+ monitor_printf(mon,
+ "Save state complete blocked by device '%s', error:"
+ " %s\n", se->idstr, strerror(-r));
return r;
}
}
- [Qemu-devel] Re: [PATCH 0/6] Save state error handling (kill off no_migrate), (continued)
- [Qemu-devel] Re: [PATCH 0/6] Save state error handling (kill off no_migrate), Michael S. Tsirkin, 2010/11/09
- [Qemu-devel] Re: [PATCH 0/6] Save state error handling (kill off no_migrate), Alex Williamson, 2010/11/09
- [Qemu-devel] Re: [PATCH 0/6] Save state error handling (kill off no_migrate), Michael S. Tsirkin, 2010/11/09
- [Qemu-devel] Re: [PATCH 0/6] Save state error handling (kill off no_migrate), Alex Williamson, 2010/11/09
- [Qemu-devel] Re: [PATCH 0/6] Save state error handling (kill off no_migrate), Michael S. Tsirkin, 2010/11/09
- [Qemu-devel] Re: [PATCH 0/6] Save state error handling (kill off no_migrate), Alex Williamson, 2010/11/09
- [Qemu-devel] Re: [PATCH 0/6] Save state error handling (kill off no_migrate), Michael S. Tsirkin, 2010/11/09
- [Qemu-devel] Re: [PATCH 0/6] Save state error handling (kill off no_migrate), Alex Williamson, 2010/11/09
- [Qemu-devel] Re: [PATCH 0/6] Save state error handling (kill off no_migrate), Michael S. Tsirkin, 2010/11/09
- [Qemu-devel] Re: [PATCH 0/6] Save state error handling (kill off no_migrate), Alex Williamson, 2010/11/09
- [Qemu-devel] Re: [PATCH 0/6] Save state error handling (kill off no_migrate),
Alex Williamson <=
[Qemu-devel] Re: [PATCH 0/6] Save state error handling (kill off no_migrate), Juan Quintela, 2010/11/16