On Fri, Sep 10, 2010 at 10:22 PM, Jamie Lokier<address@hidden> wrote:
Stefan Hajnoczi wrote:
Since there is no ordering imposed between the data write and metadata
update, the following scenarios may occur on crash:
1. Neither data write nor metadata update reach the disk. This is
fine, qed metadata has not been corrupted.
2. Data reaches disk but metadata update does not. We have leaked a
cluster but not corrupted metadata. Leaked clusters can be detected
with qemu-img check.
3. Metadata update reaches disk but data does not. The interesting
case! The L2 table now points to a cluster which is beyond the last
cluster in the image file. Remember that the file size is rounded down
to a multiple of the cluster size, so a partial data write at the end
of the file is discarded and this case applies.
Better add:
4. File size is extended fully, but the data didn't all reach the disk.
This case is okay.
If a data cluster does not reach the disk but the file size is
increased there are two outcomes:
1. A leaked cluster if the L2 table update did not reach the disk.
2. A cluster with junk data, which is fine since the guest has no
promise that the data safely landed on disk before a flush has completed.
A flush is performed after allocating new L2 tables and before linking
them into the L1 table. Therefore clusters can be leaked but an
invalid L2 table can never be linked into the L1 table.
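The ordering in that last paragraph can be sketched with a toy model
(all names below are hypothetical and purely for illustration, not the
actual qed code): writes sit in a volatile cache until a flush, and the
new L2 table is flushed before the L1 table is allowed to reference it.

```python
class Disk:
    """Toy disk: writes are cached until flush(); the cache is lost on crash."""
    def __init__(self):
        self.stable = {}    # survives a crash
        self.pending = {}   # lost on a crash

    def write(self, key, value):
        self.pending[key] = value

    def flush(self):
        self.stable.update(self.pending)
        self.pending.clear()

def allocating_write(disk, cluster, l2, data):
    # The data write and the new L2 table may land in any order...
    disk.write(("cluster", cluster), data)
    disk.write(("l2_table", l2), {"entry": cluster})
    # ...but a flush is issued before the L2 table is linked into L1.
    disk.flush()
    disk.write(("l1_entry", 0), l2)

disk = Disk()
allocating_write(disk, cluster=0x30000, l2=0x20000, data=b"guest data")

# Simulate a crash before the L1 update reaches the disk: the cluster
# and L2 table are merely leaked; L1 never points at an unwritten L2 table.
survivors = disk.stable
assert ("l2_table", 0x20000) in survivors
assert ("l1_entry", 0) not in survivors
```

The point of the barrier is visible in the final two assertions: the
worst crash outcome is a leak, never a dangling L1 reference.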
5. Metadata is partially updated.
6. (Nasty) A partial metadata write has clobbered neighbouring
metadata which wasn't meant to be changed. (This may happen up
to a sector size on normal hard disks - data is hard to come by.
It happens to a much larger file range on flash and RAIDs
sometimes - I call it the "radius of destruction".)
Case 6 can also happen when doing the L1 update mentioned earlier, in
which case you might lose a much larger part of the guest image.
These two cases are problematic.
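To get a feel for how bad case 6 can be, here is a back-of-the-envelope
calculation (the sizes are assumptions for illustration: qed table
entries are 64-bit offsets, a classic hard-disk sector is 512 bytes, and
64 KiB is one plausible flash erase-block size - real devices vary):

```python
ENTRY_SIZE = 8          # bytes per L1/L2 table entry (64-bit offset)
SECTOR = 512            # classic hard-disk sector size
ERASE_BLOCK = 64 * 1024 # example flash erase-block size (device-dependent)

# Updating a single entry still rewrites a whole sector, so a torn
# write can clobber every neighbouring entry sharing that sector:
entries_per_sector = SECTOR // ENTRY_SIZE
entries_per_erase_block = ERASE_BLOCK // ENTRY_SIZE

print(entries_per_sector)       # 64 entries at risk per torn sector
print(entries_per_erase_block)  # 8192 entries at risk per erase block
```

So even the hard-disk case puts dozens of unrelated table entries inside
the radius of destruction, and flash can multiply that by two orders of
magnitude.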