[Qemu-devel] Comparing New Image Formats: FVD vs. QED
From: Chunqiang Tang
Subject: [Qemu-devel] Comparing New Image Formats: FVD vs. QED
Date: Fri, 28 Jan 2011 17:15:01 -0500
Hi Anthony,
As you requested, I set up a wiki page for FVD at
http://wiki.qemu.org/Features/FVD . It includes a summary of FVD, a
detailed specification of FVD, and a comparison of the design and
performance of FVD and QED. I copied the comparison part below for easy
reference.
=================================================
Design Comparison
By design, FVD has the following advantages over QED:
1. Like other existing image formats, QED does storage allocation twice,
first by the image format and then by a host file system. This is a
fundamental problem that FVD was designed to address.
a) Most importantly, regardless of the underlying platform, QED
insists on getting in the way and doing storage allocation in its naive,
one-size-fits-all manner, which is unlikely to perform well in many cases,
because of the diversity of the platforms supported by QEMU. Storage
systems have different characteristics (solid-state drive/Flash, DAS, NAS,
SAN, etc), and host file systems (GFS, NTFS, FFS, LFS, ext2/ext3/ext4,
reiserFS, Reiser4, XFS, JFS, VMFS, ZFS, etc) provide many different
features and are optimized for different objectives (flash wear leveling,
seek distance, reliability, etc). An image format should piggyback on the
success of the diverse solutions developed by the storage and file systems
community through 40 years of hard work, rather than insisting on
reinventing a naive, one-size-fits-all wheel to redo storage allocation.
b) The interference between an image format and a host file system
may make neither of them work well, even if either of them is optimal by
itself. For example, what would happen when the image format and the host
file system both perform online defragmentation simultaneously? If the
image format's storage allocation algorithm really works well, the image
should be stored on a logical volume and the host file system should be
disabled. If the host file system's algorithm works well, the image
format's algorithm should be disabled. Running both at the same time only
lets them interfere with each other.
c) Obviously, doing storage allocation twice doubles the overhead:
updating on-disk metadata twice, causing fragmentation twice, and caching
metadata twice.
By contrast, FVD's simple design makes all the following
configurations possible: 1) perform storage allocation only in a host file
system; 2) perform storage allocation only in FVD (directly on a logical
volume without a host file system); 3) do storage allocation twice, as
existing image formats do; or 4) let FVD perform copy-on-write,
copy-on-read, and adaptive prefetching, but delegate storage allocation to
any other QEMU image format (assuming it does that better). This
flexibility allows FVD to support any use case, even if some unanticipated,
bizarre storage technology becomes mainstream in the future (flash, nano,
or whatever).
2. QED relies on a defragmentation algorithm to solve the fragmentation
problem it introduces. Image-level defragmentation is an uncharted area
without prior research or open-source implementation. How does QED's
defragmentation interact with a host file system's defragmentation? How
contiguous will a partially full image be? Under a dynamic workload, how
long will it take for defragmentation to settle down and what is the
quantified defragmentation overhead or benefit? Most importantly, why
first artificially introduce fragmentation at the image layer and then try
hard to defragment it? If users are empowered with choices, say, QEMU
providing a fun option called
"--fabricate-fragmentation-then-try-unproven-defrag", how many users would
want to enjoy this roller coaster ride and turn on the option? QED
mandates turning this option on, whereas FVD allows turning it off. When
FVD's table is disabled, FVD uses a RAW-image-like data layout.
3. A QED image relies on a host file system and cannot be stored on a
logical volume directly. Using a logical volume is a valid use case and is
supported by libvirt. It is better to empower users with choices rather
than restricting what they can do. With a proper design, supporting
logical volumes is actually quite simple, as shown by FVD. Results below
show that using a logical volume improves PostMark's file creation
throughput by 45-53%.
4. QED needs more memory to cache on-disk metadata than FVD does, and
introduces more disk I/Os to read on-disk metadata. For a 1TB image, QED's
metadata is 128MB, vs. FVD's 6MB metadata (2MB for the bitmap and 4MB for
the lookup table). If the lookup table is disabled, which is a preferred
high-performance configuration, then it is only 2MB for FVD.
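The arithmetic behind these numbers can be reproduced with a short sketch. The parameters here are assumptions chosen because they yield the sizes quoted above, not values stated in this email: 64 KiB QED clusters with 8-byte table entries, one FVD bitmap bit per 64 KiB block, and one 4-byte FVD lookup-table entry per 1 MiB chunk. Only per-cluster and per-chunk entries are counted.

```python
# Back-of-the-envelope check of the metadata sizes for a 1 TB image.
# ASSUMED parameters (not stated in the email): 64 KiB QED clusters with
# 8-byte entries; FVD with one bitmap bit per 64 KiB block and one 4-byte
# lookup-table entry per 1 MiB chunk.
KIB, MIB, TIB = 2**10, 2**20, 2**40


def table_bytes(image_size, unit_size, entry_bytes):
    """Size of a flat table with one fixed-size entry per unit of the image."""
    return (image_size // unit_size) * entry_bytes


def bitmap_bytes(image_size, block_size):
    """Size of a bitmap with one bit per block of the image."""
    return (image_size // block_size) // 8


image = 1 * TIB
qed_tables = table_bytes(image, 64 * KIB, 8)   # per-cluster entries
fvd_bitmap = bitmap_bytes(image, 64 * KIB)     # one bit per block
fvd_table = table_bytes(image, 1 * MIB, 4)     # one entry per chunk

print("QED tables:", qed_tables // MIB, "MiB")
print("FVD bitmap:", fvd_bitmap // MIB, "MiB")
print("FVD table: ", fvd_table // MIB, "MiB")
print("FVD total: ", (fvd_bitmap + fvd_table) // MIB, "MiB")
```

Under these assumptions the gap comes from two factors: a bit per block is much smaller than an 8-byte entry per cluster, and a larger chunk means far fewer table entries.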
5. QED introduces more disk I/O overhead to update on-disk metadata than
FVD does, for several reasons. First, FVD's journal converts multiple
concurrent updates to sequential writes in the journal, which can be
merged into a single write by the host Linux kernel. Second, FVD's table
can be (preferably) disabled and hence it incurs no update overhead. Even
if the table is enabled, FVD's chunk is much larger than QED's cluster,
and hence needs fewer updates. Finally, although QED and FVD use the same
block/cluster size, FVD can be optimized to eliminate most bitmap updates
with several techniques: A) Use resize2fs to reduce the base image to its
minimum size (which is what a Cloud can do) so that most writes occur at
locations beyond the size of the base image, without the need to update
the bitmap; B) 'qemu-img create' can find zero-filled sectors in the base
image and preset the corresponding bits of the bitmap (see need_zero_init in
the Features/FVD/Specification); and C) copy-on-read and prefetching do
not update the on-disk bitmap and once prefetching finishes, there is no
need for FVD to read or write the bitmap. See the paper for the details of
these bitmap optimizations. In summary, when an FVD image is fully
optimized (e.g., the table is disabled and the base image is reduced to its
minimum size), FVD has almost zero overhead in metadata update and the
data layout is just like a RAW image.
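Techniques A and B can be illustrated with a minimal sketch of the decision a write path might make. This is not FVD's actual code: the function name, its parameters, and the bitmap semantics (a set bit means the block no longer needs to be read from the base image) are assumptions made for illustration only.

```python
def write_needs_bitmap_update(offset, length, base_size, block_size, bitmap):
    """Return True if a write must set at least one bit in the on-disk bitmap.

    Hypothetical helper. `bitmap` is a list of booleans, one per block of
    the base image; a set bit is ASSUMED to mean the block is already
    served by the FVD image rather than the base image.
    """
    # Technique A: the part of a write beyond the end of the base image
    # is not covered by the bitmap at all.
    covered_end = min(offset + length, base_size)
    if offset >= covered_end:
        return False
    first = offset // block_size
    last = (covered_end - 1) // block_size
    # Technique B: bits preset at image creation time (e.g. for
    # zero-filled blocks) need no further update.
    return any(not bitmap[b] for b in range(first, last + 1))


BLOCK = 64 * 1024
base_size = 4 * BLOCK                    # base image shrunk to 4 blocks
bitmap = [False, True, False, False]     # block 1 was preset at creation

print(write_needs_bitmap_update(10 * BLOCK, BLOCK, base_size, BLOCK, bitmap))
print(write_needs_bitmap_update(1 * BLOCK, 4096, base_size, BLOCK, bitmap))
print(write_needs_bitmap_update(0, 4096, base_size, BLOCK, bitmap))
```

The smaller the base image and the more bits that are preset, the more often the first two cases apply, which is the point of the optimizations described above.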
6. FVD parallelizes I/Os to the maximum degree possible. For example, if
processing a VM-generated read request needs to read data from the base
image as well as several non-contiguous chunks in the FVD image, FVD
issues all I/O requests in parallel rather than sequentially.
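The idea of issuing all pieces of one request at once can be sketched as follows. This is only an illustration of the pattern using a thread pool and os.pread on a POSIX system; it is not how FVD or QEMU implements asynchronous I/O, and the function name and the (fd, offset, length) piece format are assumptions.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_pieces_parallel(pieces, max_workers=8):
    """Read all pieces of one guest request concurrently.

    Hypothetical helper. `pieces` is a list of (fd, file_offset, length)
    tuples in guest-address order; some may refer to the base image and
    others to non-contiguous chunks of the FVD image. Returns the
    concatenated data in the same order.
    """
    if not pieces:
        return b""
    workers = min(max_workers, len(pieces))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit every read before waiting on any of them, so no piece
        # waits for an earlier one to complete.
        futures = [pool.submit(os.pread, fd, length, off)
                   for fd, off, length in pieces]
        return b"".join(f.result() for f in futures)
```

A caller would first split the guest request into pieces according to the bitmap and lookup table, then pass all of them in one call instead of reading them one after another.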
Performance Comparison
See the figure at http://wiki.qemu.org/Features/FVD/Compare . This figure
shows that the file creation throughput of NetApp's PostMark benchmark
under FVD is 74.9% to 215% higher than that under QED.
=================================================
Regards,
ChunQiang (CQ) Tang
Homepage: http://www.research.ibm.com/people/c/ctang