qemu-devel

Re: [Qemu-devel] [RFC] qed: Add QEMU Enhanced Disk format


From: Avi Kivity
Subject: Re: [Qemu-devel] [RFC] qed: Add QEMU Enhanced Disk format
Date: Fri, 10 Sep 2010 17:02:48 +0300
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.8) Gecko/20100806 Fedora/3.1.2-1.fc13 Thunderbird/3.1.2

 On 09/10/2010 04:22 PM, Anthony Liguori wrote:
Looks like it depends on fsck, which is not a good idea for large images.


fsck will always be fast on qed because the metadata is small. For a 1PB image, there's 128MB worth of L2s if it's fully allocated

It's 32,000 seeks.

(keeping in mind that once you're fully allocated, you'll never fsck again).

Why? Fully populated L1 (so all L2s are allocated) doesn't mean a fully allocated image. You're still allocating and linking into L2s.

If you've got 1PB worth of storage, I'm fairly sure you're going to be able to do 128MB of reads in a short period of time. Even if it's a few seconds, it only occurs on power failure so it's pretty reasonable.

Consider a cloud recovering from power loss. Even if you're fscking thousands of 100GB images, you'll create a horrible seek storm on your storage (to be followed by a seek storm from all the guests booting).

No, fsck is not a good idea.
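
The metadata and seek figures traded above can be rechecked with a few lines of arithmetic. This is only a rough sketch: the cluster size, L2 table size, 8-byte table entry and 4 KiB-per-seek read size are all assumptions picked for illustration, since none of them is stated in this message. Substitute whatever the format actually uses.

```python
KiB, MiB, GiB, TiB = 2**10, 2**20, 2**30, 2**40


def l2_metadata_bytes(image_bytes, cluster_bytes, l2_table_bytes, entry_bytes=8):
    """Return (number of L2 tables, total L2 bytes) for a fully allocated image.

    All parameters are caller-supplied assumptions; entry_bytes=8 is a guess
    at the size of one table entry.
    """
    entries_per_table = l2_table_bytes // entry_bytes
    # Amount of guest data that a single L2 table can map.
    data_per_table = entries_per_table * cluster_bytes
    n_tables = -(-image_bytes // data_per_table)  # ceiling division
    return n_tables, n_tables * l2_table_bytes


def seeks_needed(metadata_bytes, bytes_per_seek):
    """Seeks required if every seek reads bytes_per_seek of contiguous metadata."""
    return -(-metadata_bytes // bytes_per_seek)


# Placeholder parameters: 64 KiB clusters, 256 KiB L2 tables, 1 TiB image.
tables, total = l2_metadata_bytes(1 * TiB, 64 * KiB, 256 * KiB)
print(tables, total // MiB, seeks_needed(total, 4 * KiB))
```

With these placeholder values the metadata total scales linearly with image size, so the result is very sensitive to which image size and table geometry are plugged in. The seek count depends just as strongly on how much metadata each seek is assumed to fetch.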


I need to look at the actual ATA and SCSI specs for how this will
work.  The issue I am concerned with is sub-cluster trim operations.
If the trim region is less than a cluster, then neither qed nor qcow2
really has a way to handle it.  Perhaps we could punch a hole
in the file, given a userspace interface to do this, but that isn't
ideal because we're losing sparseness again.

To deal with a sub-cluster TRIM, look at the surrounding sectors. If they're zero, free the cluster. If not, write zeros to, or use sys_punch() on, the range specified by TRIM.
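
The sub-cluster rule just described can be sketched roughly as follows. This is an in-memory model only, not qed or qcow2 code: the function name and the "free"/"zeroed" return values are made up for illustration, and the zero-fill branch stands in for either writing zeros or punching a hole.

```python
def handle_subcluster_trim(cluster: bytearray, start: int, length: int) -> str:
    """Apply a TRIM covering cluster[start:start+length].

    Returns "free" when everything outside the trimmed range is already zero,
    meaning the caller could release the whole cluster. Otherwise zeroes the
    trimmed range in place and returns "zeroed".
    """
    end = start + length
    if not (0 <= start <= end <= len(cluster)):
        raise ValueError("trim range falls outside the cluster")

    # Look at the sectors surrounding the trimmed range.
    surrounding_is_zero = not any(cluster[:start]) and not any(cluster[end:])
    if surrounding_is_zero:
        return "free"

    # Live data remains elsewhere in the cluster: only zero the trimmed range.
    cluster[start:end] = bytes(length)
    return "zeroed"
```

For example, trimming the only non-zero 1 KiB of a 4 KiB cluster returns "free", while trimming the first 512 bytes of a cluster full of data returns "zeroed" and leaves the rest untouched.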

Better yet, if you can't trim a full cluster, just write out zeros and have a separate background process that punches out zero clusters.


That can work as well, or a combination perhaps.

That approach is a bit more generic and will help compact images independently of guest trims.

You still need a freelist.
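
To make the freelist point concrete, here is a minimal sketch of the background pass discussed above, again over an in-memory model. The names (reclaim_zero_clusters, allocated, freelist) are hypothetical and the image is just a byte buffer. A real implementation would also have to update the tables and cope with concurrent writes, which this ignores.

```python
def reclaim_zero_clusters(image, cluster_size, allocated, freelist):
    """Move allocated clusters that contain only zeros onto the freelist.

    image        bytes-like buffer holding the cluster data
    allocated    set of allocated cluster indexes (modified in place)
    freelist     list of reusable cluster indexes (appended to in place)

    Returns the number of clusters reclaimed.
    """
    zero_cluster = bytes(cluster_size)
    reclaimed = 0
    # sorted() makes a copy, so removing from the set inside the loop is safe.
    for index in sorted(allocated):
        offset = index * cluster_size
        if image[offset:offset + cluster_size] == zero_cluster:
            allocated.discard(index)
            # Without this record the space is unmapped but can never be reused.
            freelist.append(index)
            reclaimed += 1
    return reclaimed
```

The scan works regardless of whether the zeros came from a guest trim or from ordinary writes, which is what makes it usable for compacting images in general.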

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



