From: Shahar Frank
Subject: [Qemu-devel] [Patch 0/4] [RFC] Zero Cluster Dedup, Offline dedup, qemu-img extentions
Date: Mon, 6 Oct 2008 10:23:47 -0700

Hi All,

This is a rewrite of the previous "zero dedup" patch I sent.

As described in the first patch, the problem these patches attempt to solve
is the "inflation" of the COW image. In a system that uses templates with COW
images on top of them (linked clones in VMware terms), the COW layer deviates
from the base image over time. Some of this deviation is justified (data), but
some of it is noise (undeleted files in NTFS, temporary files, swap, etc.).

The following patches introduce two mechanisms to handle the above problem:
1. Zero dedup - an extension to the qcow2 format that identifies zero cluster
writes and dedups them to a shared zero cluster (similar to the shared zero
page in the Linux kernel). Alternatively, when possible, the cluster is simply
de-allocated (Laurent Vivier's suggestion). In either case, the original
cluster is de-allocated and its space can be reused for new clusters. To make
use of this feature, some kind of cleaning utility is expected to run in the OS
context or offline. This utility should wipe unused and/or temporary space by
writing zeros to it, which leads to space deallocation via the zero dedup
mechanism.
2. A general dedup infrastructure to be used to compress the image. The current
version supports only intra-image deduping, but future versions will be
extended to support deduping across images. This can mitigate the effect of
most OS updates, patches, virus updates, application installations and updates,
etc. For example, an application installed within an image that is part of a
deduped repository uses more space only if no other image contains this
application. In mass deployment the benefit is huge. In my experience, even the
current intra-image version can save about 30% of the space.
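
To get a rough idea of how much the zero dedup in item 1 could reclaim, one can count the all-zero clusters in a raw copy of the data. The following is only a sketch, not code from the patches: the function name count_zero_clusters is made up for illustration, it works on a raw file rather than on a qcow2 image, and the default cluster size of 65536 bytes is an assumption that should be adjusted to match the image.

```shell
# Count the clusters of a raw file that contain only zero bytes.
# Usage: count_zero_clusters FILE [CLUSTER_SIZE]
# CLUSTER_SIZE defaults to 65536 (an assumed value, not taken from the patches).
count_zero_clusters() {
    file=$1
    cs=${2:-65536}
    size=$(wc -c < "$file" | tr -d ' ')
    # Round up so that a trailing partial cluster is inspected as well.
    nclusters=$(( (size + cs - 1) / cs ))
    zeros=0
    i=0
    while [ "$i" -lt "$nclusters" ]; do
        # Delete all NUL bytes from the cluster; if nothing is left,
        # the cluster consisted of zeros only.
        nonzero=$(dd if="$file" bs="$cs" skip="$i" count=1 2>/dev/null \
                  | tr -d '\000' | wc -c | tr -d ' ')
        if [ "$nonzero" -eq 0 ]; then
            zeros=$((zeros + 1))
        fi
        i=$((i + 1))
    done
    echo "$zeros"
}
```

For example, `count_zero_clusters /tmp/data 65536` prints the number of 64 KiB clusters in /tmp/data that a zero dedup pass could in principle free. Reading one cluster per dd call is slow on large files; it is meant only to illustrate the check.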

In addition, a set of new verbs is implemented to extend qemu-img:
1. Info: an -r flag is added to show the image tree.
2. Map [-r] [-s] shows the logical-to-physical mapping (optionally for the
entire image tree, and/or with an md5sum of the data). This can be used to
check, verify and collect statistics about the internal image layout and its
attributes (how fragmented it is, how long the cluster sequences are, etc.). It
can also be used to validate data integrity, for example before and after
image manipulations.
3. Check performs an internal image check. Right now it is implemented only for
qcow2. In my experience, this check can validate image integrity or identify
image corruption long before it becomes user visible.
4. Dedup performs a (simple, naïve) dedup of cluster X to cluster Y. Even at
this early stage it can be used to compress the image and save space.

The changes from the first version are:
1. The zero dedup mechanism is improved to handle sequences of clusters
2. If no backing file is used, a "hole" is punched instead of deduping
3. A general dedup verb is added to qemu-img.
4. An md5sum is added to the map verb.

The new dedup mechanism is a fascinating new way to mess up your images ;-) but
it can also be used to reduce the image storage size - I tested it on a pretty
clean XP image and got a 30% dedup rate. Note that dedup between images (not
implemented yet) is expected to be much more efficient (60% and higher).

Many of the issues covered here were raised by Kevin Wolf and Laurent Vivier,
so I want to thank them for their attention and comments.

Patch 1: Basic zero dedup optimization
Patch 2: New image checking verbs and extensions
Patch 3: Md5sum extension to map
Patch 4: offline dedup verb

The patches are not ready for integration, but they are very close. I am
sending them for review and comments.

Still missing:
1. A flag to qemu main to control the zero dedup.
2. Cleanup (debugging, other).
3. A proper (efficient) dedup utility.
4. A method to compact/defrag the image after the dedup.
5. ?

The patches are tested as follows:

I create an image and export it using nbd:
./qemu-img create -b /tmp/data -f qcow2 /tmp/test.qcow2 && ./qemu-nbd -p 999 -vv /tmp/test.qcow2 # with a backing file
# OR
./qemu-img create -f qcow2 /tmp/test.qcow2 && ./qemu-nbd -p 999 -vv /tmp/test.qcow2 # without a backing file

Then I attach it with nbd-client and run the check script:

nbd-client localhost 999 /dev/nbd0

./checkzopt.sh /tmp/test.qcow2 /dev/nbd0

# data integrity check
./qemu-img map -s /tmp/test.qcow2

# image integrity check
./qemu-img check /tmp/test.qcow2

The general dedup verb is tested as follows:

# copy the original image 
cp winXp.qcow2 winXp-d.qcow2

# create a list of cluster hashes, sorted by hash. The list is secondarily
# sorted by physical offset so that all duplicates are deduped into the
# lowest-offset cluster.
./qemu-img map -s winXp-d.qcow2 | sort -k 4 -k 3 > winXp-d.sorted

# Create a script of dedup commands:
awk '{ if ($4 == lasthash) { print "./qemu-img dedup", $1, $2, lastoffs; next } else { lasthash = $4; lastoffs = $2 } }' winXp-d.sorted > winXp-d.dedup

# check how many clusters we are going to dedup
wc -l winXp-d.dedup
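
The count above can also be related to the total number of clusters to get a dedup rate. The sketch below is an illustration only: dedup_stats is a hypothetical helper, and it assumes that the sorted listing has one cluster per line with the hash in the fourth whitespace-separated field, which is inferred from the awk and cut commands in this mail rather than from a documented output format.

```shell
# Print "<duplicates> <total>" for a map listing that is sorted by hash.
# Assumes one cluster per line with the hash in field 4 (an inferred layout).
dedup_stats() {
    awk '
        { total++ }
        # Appending "" forces a string comparison, so a hash that happens to
        # look like a number is not compared numerically.
        ($4 "") == (lasthash "") { dups++; next }
        { lasthash = $4 }
        END { printf "%d %d\n", dups, total }
    ' "$1"
}
```

For example, `dedup_stats winXp-d.sorted` prints two numbers; the first divided by the second is the fraction of clusters that the generated dedup script would remap.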

# perform the dedup - takes a lot of time...
sh winXp-d.dedup

# check the resulting image consistency
./qemu-img check winXp-d.qcow2

# generate md5sums of the data clusters and compare them to the original image
./qemu-img map -s winXp-d.qcow2 > /tmp/1
./qemu-img map -s winXp.qcow2 > /tmp/2
cut -f 2,4 -d ' ' /tmp/1 > /tmp/1.cut
cut -f 2,4 -d ' ' /tmp/2 > /tmp/2.cut
diff /tmp/[12].cut

# check that it can run...
kvm winXp-d.qcow2

Signed-off-by: Shahar Frank <address@hidden>

Attachment: checkzopt.sh
Description: checkzopt.sh

