qemu-devel

Re: [Qemu-devel] [RFC] More robust migration


From: Anthony Liguori
Subject: Re: [Qemu-devel] [RFC] More robust migration
Date: Fri, 20 Feb 2009 12:27:30 -0600
User-agent: Thunderbird 2.0.0.19 (X11/20090105)

Jamie Lokier wrote:
Anthony Liguori wrote:
2. Introduce a length field to the header of each device.
IMHO, this would reduce robustness. It's also difficult because of the way savevm registration works. You don't know how large a section is until it's written and migration streams are not seekable.

The way HTTP deals with not knowing the size in advance is to split
the data into chunks, each the size of a small write buffer, and to
write a chunk size in front of each one.  This allows storing
sections of binary data whose size isn't known in advance, while still
being able to skip them safely.
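
As an illustration of the chunking idea described above, here is a minimal sketch in Python. It is not QEMU's savevm format: the 4-byte big-endian length prefix, the zero-length terminator chunk, and the names `write_chunked`, `read_chunked`, `skip_chunked` and `CHUNK_MAX` are all assumptions made for this example. It only shows the mechanics of framing data of unknown total size on a non-seekable stream, not whether skipping a section is a good idea.

```python
import io
import struct

CHUNK_MAX = 4096  # assumed size of the small write buffer


def write_chunked(out, parts, chunk_max=CHUNK_MAX):
    """Write parts as length-prefixed chunks, ended by a zero-length chunk.

    The total size is never needed up front, so this works on a stream
    that cannot seek back to patch in a length field.
    """
    buf = bytearray()
    for part in parts:
        buf += part
        # Flush every full buffer as one chunk.
        while len(buf) >= chunk_max:
            out.write(struct.pack(">I", chunk_max))
            out.write(buf[:chunk_max])
            del buf[:chunk_max]
    if buf:
        out.write(struct.pack(">I", len(buf)))
        out.write(buf)
    out.write(struct.pack(">I", 0))  # terminator chunk


def _read_exact(inp, n):
    """Read exactly n bytes or raise EOFError on a truncated stream."""
    data = bytearray()
    while len(data) < n:
        piece = inp.read(n - len(data))
        if not piece:
            raise EOFError("truncated stream")
        data += piece
    return bytes(data)


def read_chunked(inp):
    """Reassemble one chunked section and return its payload."""
    parts = []
    while True:
        (size,) = struct.unpack(">I", _read_exact(inp, 4))
        if size == 0:
            return b"".join(parts)
        parts.append(_read_exact(inp, size))


def skip_chunked(inp):
    """Consume one chunked section without interpreting its payload."""
    skipped = 0
    while True:
        (size,) = struct.unpack(">I", _read_exact(inp, 4))
        if size == 0:
            return skipped
        _read_exact(inp, size)
        skipped += size
```

A reader can call `skip_chunked` to step over one section and then continue with `read_chunked` on the next, without either side ever knowing a section's total length in advance.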

This would allow skipping unknown (or unwanted) devices.
No good can come from this. If you have an unknown section, you must throw an error and stop the migration. What if this is for a device that the guest is interacting with? The device just disappears after migration? All savevm state is state that affects the functionality of a guest. Throwing away this state will change the functionality of the VM, and migration should not affect guest functionality.

What if you're migrating from a snapshot made on a host with some
pass-through USB device to another host which cannot provide the same
device.  In that case I'd like the option for the guest to see the
device has disappeared.  Maybe it's stopped working (HPET), or maybe
it's unplugged (anything hot unpluggable).

Having a device stop working is IMHO unacceptable. For devices that support hot plugging, you can hot unplug and *then* perform the migration.

In general, hot unplugging requires guest cooperation FWIW. Bad things will often happen if you just yank a USB cable out of your computer.

That's preferable to not being able to use the snapshot at all,
effectively having to trash it.

I disagree. Something that is broken in an unknown way is not better than having something gracefully fail. If you do hardware pass through, forget about snapshotting/migration/etc.

What are the use cases where you think this would be beneficial? I really see the change in semantics from the old way (throwing away unknown sections) to the new way (requiring strict versioning and validating all sections) as being a huge step toward robustness.
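
The strict semantics described above (validate every section, refuse anything unknown) can be sketched roughly as follows. This is a hedged illustration in Python, not QEMU's actual loader: the `SUPPORTED` registry, its device names and version numbers, `MigrationError` and `validate_sections` are all hypothetical names invented for this example.

```python
class MigrationError(Exception):
    """Raised when the incoming stream cannot be loaded safely."""


# Hypothetical registry: section name -> versions this build can load.
SUPPORTED = {
    "cpu": {3, 4},
    "ide": {2},
}


def validate_sections(sections, supported=SUPPORTED):
    """Check every (name, version) pair before any state is applied.

    Nothing is skipped: an unknown section or an unsupported version
    aborts the whole load instead of silently dropping device state.
    """
    for name, version in sections:
        if name not in supported:
            raise MigrationError(f"unknown section {name!r}")
        if version not in supported[name]:
            raise MigrationError(
                f"section {name!r}: unsupported version {version}"
            )
    return True
```

With this registry, `validate_sections([("cpu", 4), ("ide", 2)])` succeeds, while a stream containing a section named `"usb-passthrough"` raises `MigrationError` before any guest-visible state changes.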

I've been upset at a "savevm" which I wrote with some past version of
QEMU that I couldn't load in a later version.  It wasn't obvious why,
just that it refused. And I didn't have the old version, or even know
which the old version was.  And even if I could have reconstructed the
old QEMU - I wanted to migrate to a newer version.  It's no fun having
to reconstruct a carefully primed guest snapshot test state from its
reboot, if that can be avoided.

Device configuration files will go a long way toward making upgrades work. Sometimes you have to blacklist older versions of devices because there were bugs in the save/restore functions. In that case, there's really nothing we can do. Your snapshot was invalid.

My primary goal for migration is robustness. I do not think it's a good idea to support any circumstances that could introduce changes in guest visible state during a live migration.

What about safe hotpluggable devices?

Make your changes in the guest to allow safe unplug, then unplug, then migrate.

Regards,

Anthony Liguori



