Re: [Qemu-devel] [RFC] qed: Add QEMU Enhanced Disk format

qemu-devel

From:	Avi Kivity
Subject:	Re: [Qemu-devel] [RFC] qed: Add QEMU Enhanced Disk format
Date:	Thu, 09 Sep 2010 09:53:05 +0300
User-agent:	Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.8) Gecko/20100806 Fedora/3.1.2-1.fc13 Thunderbird/3.1.2

 On 09/08/2010 02:15 PM, Stefan Hajnoczi wrote:

3. Metadata update reaches disk but data does not.  The interesting
case!  The L2 table now points to a cluster which is beyond the last
cluster in the image file.  Remember that file size is rounded down by
cluster size, so partial data writes are discarded and this case
applies.

Now we're in trouble.  The image cannot be accessed without some
sanity checking because not only do table entries point to invalid
clusters, but new allocating writes might make previously invalid
cluster offsets valid again (then there would be two or more table
entries pointing to the same cluster)!

Anthony's suggestion is to use a "mounted" or "dirty" bit in the qed
header to detect a crashed image when opening the image file.  If no
crash has occurred, then the mounted bit is unset and normal operation
is safe.  If the mounted bit is set, then a check of the L1/L2 tables
must be performed and any invalid cluster offsets must be cleared to
zero.  When an invalid cluster is cleared to zero, we arrive back at
case 1 above: neither data write nor metadata update reached the disk,
and we are in a safe state.
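
To make the proposed check concrete, here is a minimal sketch in Python,
not the actual qed code.  It assumes a table is just a list of byte
offsets, that 0 means "unallocated", and that the cluster size is 64 KB;
the names repair_table and CLUSTER_SIZE are made up for illustration.
It only clears entries that are misaligned or point past the end of the
file (rounded down to whole clusters); it does not detect two entries
that share a cluster.

```python
CLUSTER_SIZE = 64 * 1024  # assumed cluster size, for illustration only


def repair_table(table, file_size, cluster_size=CLUSTER_SIZE):
    """Clear invalid cluster offsets in an L1 or L2 table (a list of ints).

    An entry is kept if it is 0 (unallocated) or if it is cluster-aligned
    and the whole cluster lies inside the file, where the file size is
    first rounded down to a multiple of the cluster size.
    Returns the number of entries that were cleared to zero.
    """
    usable_end = (file_size // cluster_size) * cluster_size
    cleared = 0
    for i, offset in enumerate(table):
        if offset == 0:
            continue  # unallocated, nothing to check
        misaligned = offset % cluster_size != 0
        past_end = offset + cluster_size > usable_end
        if misaligned or past_end:
            table[i] = 0  # back to "case 1": neither data nor metadata
            cleared += 1
    return cleared


def open_image(header, tables, file_size):
    """Run the check only if the previous user did not close cleanly."""
    cleared = 0
    if header["dirty"]:
        for table in tables:
            cleared += repair_table(table, file_size)
    header["dirty"] = True  # set the "mounted" bit while the image is open
    return cleared
```

With a file of three full clusters plus a partial one, an entry pointing
at the fourth cluster is cleared, because the partial cluster is
discarded by the rounding.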

While fsck has a lovely ext2 retro feel, there's a reason it's shunned -
it can take quite a while to run.  A fully loaded L1 with 32K entries
will require 32K random I/Os, which can take over 5 minutes on a disk
that provides 100 IOPS.  On a large shared disk, you'll have a lot more
IOPS, but likely far fewer IOPS per guest, so if you have a power loss,
fsck time per guest will likely be longer (irrespective of guest size).
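
For reference, the arithmetic behind that estimate can be sketched as
below.  It assumes "32K" means 32 * 1024 entries and one random I/O per
L1 entry (one L2 table read each); fsck_seconds is just an illustrative
helper name.

```python
def fsck_seconds(l1_entries, iops):
    """Rough fsck time: one random I/O per L1 entry, served at `iops`."""
    return l1_entries / iops


L1_ENTRIES = 32 * 1024  # a fully loaded L1, assuming 32K = 32768

secs = fsck_seconds(L1_ENTRIES, 100)
print(f"{secs:.0f} s = {secs / 60:.1f} min")  # prints "328 s = 5.5 min"
```

The time scales inversely with the IOPS a guest actually gets, so a
guest left with a fifth of that rate on a shared disk would wait five
times as long.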

Preallocation, on the other hand, is amortized, or you can piggy-back
its fsync on a guest flush.  Note it's equally applicable to qcow2 and
qed.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
