qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC] qed: Add QEMU Enhanced Disk format


From: Avi Kivity
Subject: Re: [Qemu-devel] [RFC] qed: Add QEMU Enhanced Disk format
Date: Thu, 09 Sep 2010 09:53:05 +0300
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.8) Gecko/20100806 Fedora/3.1.2-1.fc13 Thunderbird/3.1.2

 On 09/08/2010 02:15 PM, Stefan Hajnoczi wrote:
3. Metadata update reaches disk but data does not.  The interesting
case!  The L2 table now points to a cluster which is beyond the last
cluster in the image file.  Remember that file size is rounded down by
cluster size, so partial data writes are discarded and this case
applies.

Now we're in trouble.  The image cannot be accessed without some
sanity checking because not only do table entries point to invalid
clusters, but new allocating writes might make previously invalid
cluster offsets valid again (then there would be two or more table
entries pointing to the same cluster)!

Anthony's suggestion is to use a "mounted" or "dirty" bit in the qed
header to detect a crashed image when opening the image file.  If no
crash has occurred, then the mounted bit is unset and normal operation
is safe.  If the mounted bit is set, then an check of the L1/L2 tables
must be performed and any invalid cluster offsets must be cleared to
zero.  When an invalid cluster is cleared to zero, we arrive back at
case 1 above: neither data write nor metadata update reached the disk,
and we are in a safe state.

While fsck has a lovely ext2 retro feel, there's a reason it's shunned - it can take quite a while to run. A fully loaded L1 with 32K entries will require 32K random I/Os, which can take over 5 minutes on a disk that provides 100 IOPS. On a large shared disk, you'll have a lot more IOPS, but likely much fewer IOPS per guest, so if you have a power loss, fsck time per guest will likely be longer (irrespective of guest size).

Preallocation, on the other hand, is amortized, or you can piggy-back its fsync on a guest flush. Note its equally applicable to qcow2 and qed.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]