
Re: [Qemu-devel] [RFC] qed: Add QEMU Enhanced Disk format


From: Anthony Liguori
Subject: Re: [Qemu-devel] [RFC] qed: Add QEMU Enhanced Disk format
Date: Fri, 10 Sep 2010 10:02:35 -0500
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.11) Gecko/20100713 Lightning/1.0b1 Thunderbird/3.0.6

On 09/10/2010 08:48 AM, Christoph Hellwig wrote:
> On Fri, Sep 10, 2010 at 08:22:14AM -0500, Anthony Liguori wrote:
>> fsck will always be fast on qed because the metadata is small.  For a
>> 1PB image, there's 128MB worth of L2s if it's fully allocated (keeping
>> in mind, that once you're fully allocated, you'll never fsck again).  If
>> you've got 1PB worth of storage, I'm fairly sure you're going to be able
>> to do 128MB of reads in a short period of time.  Even if it's a few
>> seconds, it only occurs on power failure so it's pretty reasonable.
>
> I don't think it is.  Even if the metadata is small it can still be
> spread all over the disks and seek latencies might kill you.  I think
> if we want to make qed future proof it needs to provide transactional
> integrity for metadata updates, just like a journaling filesystem.

I think the biggest challenge with an image format is finding the balance between host FS features and image format features and deciding where to solve problems.

Down the road, fsync() might not actually suck on file systems, and recovery in the face of failure might be trivial because we can just fsync() after every metadata write. So going to great lengths to deal with metadata transactions may be a lot of work for little gain.
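As a rough illustration of the "fsync() after every metadata write" idea, here is a minimal POSIX sketch. The helper names write_metadata_durable() and demo_roundtrip(), the 4096-byte offset, and the fake entry contents are all made up for this example and are not taken from the QED patch; it only shows the write-then-flush ordering, not how QED lays out or updates its tables.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the QED patch): write one metadata
 * update at `offset`, then fsync() so the update is on stable storage
 * before the caller proceeds. Returns 0 on success, -1 on error. */
int write_metadata_durable(int fd, const void *buf, size_t len, off_t offset)
{
    const char *p = buf;

    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, offset);
        if (n < 0) {
            if (errno == EINTR) {
                continue;           /* interrupted before writing; retry */
            }
            return -1;
        }
        if (n == 0) {
            return -1;              /* no progress; give up rather than spin */
        }
        p += n;
        len -= (size_t)n;
        offset += n;
    }
    /* Ask the kernel to flush the file's data and metadata to disk. */
    return fsync(fd) == 0 ? 0 : -1;
}

/* Round-trip demo: write a fake table entry durably and read it back. */
int demo_roundtrip(const char *path)
{
    const char entry[] = "fake-l2-entry";
    char back[sizeof(entry)];
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    int ok;

    if (fd < 0) {
        return -1;
    }
    ok = write_metadata_durable(fd, entry, sizeof(entry), 4096) == 0
        && pread(fd, back, sizeof(back), 4096) == (ssize_t)sizeof(back)
        && memcmp(back, entry, sizeof(entry)) == 0;
    close(fd);
    unlink(path);
    return ok ? 0 : -1;
}
```

The cost of this approach is one fsync() per metadata update, which is why it only becomes attractive if fsync() gets cheap on the host file system.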

What makes us future proof is having good feature support. qcow2 doesn't have this. We have a good way of making purely informational changes and also of making changes that break the format. Those features are independent, so they can be backported in a compatible way too.
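One common way to get this kind of split is a pair of feature bitmasks in the image header. The sketch below assumes that scheme; the struct, field names, and bit assignments are hypothetical and are not QED's actual header layout. A reader refuses an image only when it sees a format-breaking bit it does not know, and ignores unknown informational bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define FEAT_INCOMPAT_EXAMPLE (UINT64_C(1) << 0)  /* changes on-disk layout */
#define FEAT_COMPAT_EXAMPLE   (UINT64_C(1) << 0)  /* purely informational */

/* Format-breaking bits this (imaginary) implementation understands. */
#define KNOWN_INCOMPAT (FEAT_INCOMPAT_EXAMPLE)

struct image_features {
    uint64_t incompat;  /* reader must understand every bit that is set */
    uint64_t compat;    /* reader may safely ignore bits it does not know */
};

/* An image can be opened if it uses no unknown format-breaking feature.
 * Informational bits never block opening. */
bool can_open_image(const struct image_features *f)
{
    return (f->incompat & ~KNOWN_INCOMPAT) == 0;
}
```

Because the two masks are independent, each bit can be introduced or backported on its own without affecting how older readers treat the others.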

Regards,

Anthony Liguori

> Given that small amount of metadata and less different kinds it will
> still be a lot simpler than a full filesystem of course.




