From: Shahar Frank
Subject: [Qemu-devel] [Patch 0/4] [RFC] Zero Cluster Dedup, Offline dedup, qemu-img extentions
Date: Mon, 6 Oct 2008 10:23:47 -0700
Hi All,
This is a rewrite of the previous "zero dedup" patch I sent.
As described in the first patch, the problem this series attempts to solve is
the "inflation" of the COW image. In a system that uses templates with COW
images on top of them (linked clones in VMware terms), the COW layer deviates
from the base image over time. Some of that growth is justified (data), but
some of it is noise (leftover data of deleted files in NTFS, temporary files,
swap, etc.).
The following patches introduce two mechanisms to handle this problem:
1. Zero dedup - an extension to the qcow2 format that identifies writes of
all-zero clusters and dedups them to a shared zero cluster (similar to the
shared zero page in the Linux kernel). Alternatively, when possible, the
cluster is simply de-allocated (Laurent Vivier's suggestion). In either case,
the original cluster is de-allocated and its space can be reused for new
clusters. To benefit from this feature, some kind of cleaning utility is
expected to run inside the guest OS or offline. This utility should wipe
unused and/or temporary space by writing zeros to it, which leads to space
deallocation via the zero dedup mechanism.
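To make the write-path decision concrete, here is a minimal Python sketch (the
real code is C inside the qcow2 driver and is not reproduced here). The names
is_zero_cluster and handle_cluster_write, the returned labels and the cluster
size are all assumptions made for illustration. The sketch relies on the fact
that an unallocated cluster reads as zeros only when there is no backing file;
with a backing file the read would fall through to the base image, so the
cluster has to point at a shared zero cluster instead.

```python
CLUSTER_SIZE = 64 * 1024  # assumed cluster size, for illustration only


def is_zero_cluster(buf: bytes) -> bool:
    """Return True if every byte in the buffer is zero."""
    return buf.count(0) == len(buf)


def handle_cluster_write(buf: bytes, has_backing_file: bool) -> str:
    """Decide what to do with a guest write (hypothetical helper).

    Only full-cluster, all-zero writes qualify for the optimization.
    """
    if len(buf) != CLUSTER_SIZE or not is_zero_cluster(buf):
        return "allocate"            # ordinary data write
    if has_backing_file:
        return "share-zero-cluster"  # a hole would expose backing-file data
    return "deallocate"              # a hole already reads back as zeros
```

For example, handle_cluster_write(bytes(CLUSTER_SIZE), False) yields
"deallocate", while the same write on an image with a backing file yields
"share-zero-cluster".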
2. A general dedup infrastructure that can be used to compress the image. The
current version supports only intra-image dedup, but future versions will be
extended to support dedup across images. This can mitigate the effect of most
OS updates, patches, antivirus updates, application installations and updates,
etc. For example, an application installed in an image that is part of a
deduped repository uses additional space only if no other image contains that
application. In mass deployment the benefit is huge. In my experience, even
the current intra-image version can save about 30% of the space.
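One rough way to estimate what intra-image dedup could save is to hash every
cluster and count how many repeat an earlier one. The Python sketch below does
that on an in-memory buffer; dedup_savings and the 4096-byte cluster size are
assumptions for illustration, and the sketch works on raw data rather than on
the qcow2 metadata that the patches actually operate on.

```python
import hashlib


def dedup_savings(data: bytes, cluster_size: int = 4096) -> float:
    """Fraction of clusters whose content duplicates an earlier cluster."""
    seen = set()
    total = 0
    duplicates = 0
    for off in range(0, len(data), cluster_size):
        digest = hashlib.md5(data[off:off + cluster_size]).digest()
        total += 1
        if digest in seen:
            duplicates += 1   # this cluster could share an existing copy
        else:
            seen.add(digest)
    return duplicates / total if total else 0.0
```

A matching md5 is only a strong hint; a careful dedup tool would compare the
cluster contents byte for byte before sharing them.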
In addition, a set of new verbs and flags is implemented to extend qemu-img:
1. Info: an -r flag is added to show the image tree.
2. Map [-r] [-s] shows the logical-to-physical mapping (optionally for the
entire image tree, and/or with an md5sum of the data). This can be used to
check, verify and collect statistics about the internal image layout and its
attributes (how fragmented it is, how long the cluster sequences are, etc.).
It can also be used to validate data integrity, for example before and after
image manipulations.
3. Check performs an internal image check. Right now it is implemented only
for qcow2. In my experience, this check can validate image integrity or
identify image corruption long before it becomes visible to the user.
4. Dedup performs a (simple, naïve) dedup of cluster X to cluster Y. Even at
this early stage it can be used to compress the image and save space.
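As an illustration of the layout statistics mentioned for the map verb, the
Python sketch below computes the lengths of cluster sequences from a
logical-to-physical mapping. The input format (a list of offset pairs) and the
name cluster_runs are assumptions; the sketch does not parse the real map
output.

```python
def cluster_runs(mapping, cluster_size):
    """Return run lengths, in clusters, of the given mapping.

    mapping: iterable of (logical_offset, physical_offset) pairs for
    allocated clusters. A run continues only while both the logical and
    the physical offset advance by exactly one cluster.
    """
    runs = []
    prev = None
    for logical, physical in sorted(mapping):
        contiguous = (
            prev is not None
            and logical == prev[0] + cluster_size
            and physical == prev[1] + cluster_size
        )
        if contiguous:
            runs[-1] += 1
        else:
            runs.append(1)   # start a new run
        prev = (logical, physical)
    return runs
```

Many short runs indicate a fragmented image; a few long runs indicate mostly
sequential allocation.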
The changes from the first version are:
1. The zero dedup mechanism is improved to handle sequences of clusters
2. If no backing file is used, a "hole" is punched instead of deduping
3. A general dedup verb is added to qemu-img.
4. An md5sum is added to the map verb.
The new dedup mechanism is a fascinating new way to mess up your images ;-) but
it can also be used to reduce the image storage size - I tested it on a pretty
clean XP image and got a 30% dedup rate. Note that dedup between images (not
implemented yet) is expected to be much more efficient (60% and higher).
Many of the issues covered here were raised by Kevin Wolf and Laurent Vivier,
so I want to thank them for their attention and comments.
Patch 1: Basic zero dedup optimization
Patch 2: New image checking verbs and extensions
Patch 3: Md5sum extension to map
Patch 4: offline dedup verb
The patches are not ready for integration yet, but they are very close. I am
sending them for review and comments.
Still missing:
1. A flag to qemu main to control the zero dedup.
2. Cleanup (debugging, other).
3. A proper (efficient) dedup utility.
4. A method to compact/defrag the image after the dedup.
5. ?
The patches were tested as follows:
I created an image and exported it using nbd:
./qemu-img create -b /tmp/data -f qcow2 /tmp/test.qcow2 &&
./qemu-nbd -p 999 -vv /tmp/test.qcow2 # with backing file
# OR
./qemu-img create -f qcow2 /tmp/test.qcow2 &&
./qemu-nbd -p 999 -vv /tmp/test.qcow2 # without a backing file
Then I attached it with nbd-client and ran the check script:
nbd-client localhost 999 /dev/nbd0
./checkzopt.sh /tmp/test.qcow2 /dev/nbd0
# data integrity check
./qemu-img map -s /tmp/test.qcow2
# image integrity check
./qemu-img check /tmp/test.qcow2
The general dedup verb was tested as follows:
# copy the original image
cp winXp.qcow2 winXp-d.qcow2
# create a sorted list of cluster hashes. The list is secondary-sorted by
# physical offset so that all duplicates are deduped into the lowest-offset
# cluster.
./qemu-img map -s winXp-d.qcow2 | sort -k 4 -k 3 > winXp-d.sorted
# Create a script of dedup commands:
awk '{ if ($4 == lasthash) { print "./qemu-img dedup", $1, $2, lastoffs; next } else { lasthash = $4; lastoffs = $2; } }' winXp-d.sorted > winXp-d.dedup
# check how many clusters we are going to dedup
wc -l winXp-d.dedup
# perform the dedup - takes a lot of time...
sh winXp-d.dedup
# check the resulting image consistency
./qemu-img check winXp-d.qcow2
# generate md5sums of the data clusters and compare them to the original image
./qemu-img map -s winXp-d.qcow2 > /tmp/1
./qemu-img map -s winXp.qcow2 > /tmp/2
cut -f 2,4 -d ' ' /tmp/1 > /tmp/1.cut
cut -f 2,4 -d ' ' /tmp/2 > /tmp/2.cut
diff /tmp/[12].cut
# check that it can run...
kvm winXp-d.qcow2
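The sort and awk steps above amount to grouping clusters by hash and pointing
every duplicate at the copy with the lowest physical offset. A rough Python
sketch of that planning step follows. The tuple layout (logical offset,
physical offset, md5) and the name plan_dedup are assumptions made for
illustration and do not describe the actual map -s output format. Unlike the
awk one-liner, the sketch skips clusters that already share the target's
physical offset.

```python
def plan_dedup(rows):
    """Plan dedup operations from per-cluster rows (hypothetical layout).

    rows: iterable of (logical_offset, physical_offset, md5_hex).
    Returns a list of (source_logical, target_logical) pairs, where the
    target is the cluster with the lowest physical offset for that hash.
    """
    first = {}   # md5 -> (logical, physical) of the chosen target cluster
    plan = []
    for logical, physical, digest in sorted(rows, key=lambda r: (r[2], r[1])):
        target = first.get(digest)
        if target is None:
            first[digest] = (logical, physical)
        elif physical != target[1]:
            plan.append((logical, target[0]))
        # else: already shares the target's physical cluster, nothing to do
    return plan
```

Each resulting pair would correspond to one generated dedup command in the
script produced by the awk step.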
Signed-off-by: Shahar Frank <address@hidden>
Attachment: checkzopt.sh