On Mon, Sep 13, 2010 at 3:13 PM, Kevin Wolf <address@hidden> wrote:
On 13.09.2010 15:42, Anthony Liguori wrote:
On 09/13/2010 08:39 AM, Kevin Wolf wrote:
Yeah, one of the key design points of live migration is to minimize the
number of failure scenarios where you lose a VM. If someone typed the
wrong command line or shared storage hasn't been mounted yet and we
delay failure until live migration is in the critical path, that would
be terribly unfortunate.
We would catch most of them if we try to open the image when migration
starts and immediately close it again until migration is (almost)
completed, so that no other code can possibly use it before the source
has really closed it.
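
The early open-and-close probe suggested above could be sketched roughly as follows. This is only an illustration in Python, not QEMU code: `probe_image` is a hypothetical helper, and it opens the raw file read-only at the OS level, so it would catch a wrong path or unmounted storage but says nothing about what a format driver's own open might do.

```python
import os


def probe_image(path):
    """Hypothetical helper: open the image read-only and close it at once.

    Returns (True, None) if the file could be opened, otherwise
    (False, <error message>). Nothing is written to the image.
    """
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        # Typical causes: wrong command line, shared storage not mounted.
        return False, exc.strerror
    # Close immediately so the destination holds no handle on the image
    # until the source has really closed it.
    os.close(fd)
    return True, None
```

Calling such a probe when migration starts would report the failure early, before migration reaches the critical path.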
I think the only real advantage is that we fix NFS migration, right?
That's the one that we know about, yes.
The rest is not a specific scenario, but a strong feeling that having an
image open twice at the same time is dangerous. As soon as an
open/close sequence writes to the image for some format, we probably
have a bug. For example, what about this mounted flag that you were
discussing for QED?
There is some room left to work in, even if we can't check in open().
One idea would be to do the check asynchronously once I/O begins. It
is actually easy to check L1/L2 tables as they are loaded.
The only barrier relationship between I/O and checking is that an
allocating write (which will need to update L1/L2 tables) is only
allowed after the check completes. Otherwise, reads and non-allocating
writes may proceed while the image is not yet fully checked. We can
detect when a table element holds an invalid offset and discard it.
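
A toy model of that barrier, written in Python purely for illustration: the class, the single flat table (standing in for the L1/L2 hierarchy), the cluster size, and the validity rule are all assumptions of this sketch, not QED's actual format or code. Reads and non-allocating writes go through right away, entries are validated as they are loaded, and allocating writes are queued until the check has covered the whole table.

```python
from collections import deque

CLUSTER_SIZE = 64 * 1024  # assumed cluster size, for illustration only


class LazyCheckedImage:
    """Toy image whose table entries are validated as they are loaded."""

    def __init__(self, table, image_size):
        # table[i] is the file offset of cluster i; 0 means unallocated.
        self.table = list(table)
        self.image_size = image_size
        self.next_to_check = 0            # progress of the background check
        self.pending = deque()            # allocating writes waiting on it

    @property
    def check_complete(self):
        return self.next_to_check >= len(self.table)

    def _load_entry(self, index):
        """Load one entry; an invalid offset is discarded (set to 0)."""
        offset = self.table[index]
        valid = offset == 0 or (
            offset % CLUSTER_SIZE == 0 and offset < self.image_size
        )
        if not valid:
            self.table[index] = offset = 0
        return offset

    def _allocate(self, index):
        # Append a new cluster at the end of the file.
        if self.table[index] == 0:
            self.table[index] = self.image_size
            self.image_size += CLUSTER_SIZE

    def check_step(self):
        """Check one more entry; release queued writes once finished."""
        if not self.check_complete:
            self._load_entry(self.next_to_check)
            self.next_to_check += 1
        if self.check_complete:
            while self.pending:
                self._allocate(self.pending.popleft())

    def read(self, index):
        # Reads never wait for the check.
        return self._load_entry(index)

    def write(self, index):
        if self._load_entry(index) != 0:
            return "written"              # non-allocating: proceeds at once
        if not self.check_complete:
            self.pending.append(index)    # barrier: wait for the check
            return "queued"
        self._allocate(index)
        return "allocated"
```

In this model the only operation that ever waits is a write to an unallocated cluster, which matches the barrier described above.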