From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH] Add a disk format named iROW, supporting high-efficiency VM snapshot
Date: Mon, 28 Jan 2013 11:06:48 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On Sat, Jan 26, 2013 at 04:15:37PM +0800, address@hidden wrote:
> diff --git a/block/irow.h b/block/irow.h
> new file mode 100644
> index 0000000..131b741
> --- /dev/null
> +++ b/block/irow.h
> @@ -0,0 +1,135 @@
> +/* iROW (Improved Redirect-on-Write) Disk Format */
> +/*
> + * iROW is a disk format supporting high-efficiency VM disk snapshots.
> + * It uses a bitmap to reduce the amount of metadata, so that both the
> + * performance of the key VM disk snapshot operations and the VM disk
> + * I/O performance are improved at the same time.
> + *
> + * An iROW VM disk image consists of a meta file and several snapshots.
> + *
> + * A snapshot consists of 2 files: a bitmap file (btmp file) and a VM
> + * disk data file (irvd file). The current state of the iROW VM disk is
> + * also represented as a snapshot.
> + *
> + * The meta file consists of the meta header and the snapshots
> + * information. The meta header stores basic information about the VM
> + * disk image. The snapshots information sequentially stores every
> + * snapshot's name, id and other related information.
> + *
> + * The btmp file consists of a bitmap and the VM state data. The bitmap
> + * indicates whether each cluster exists in the corresponding irvd
> + * file: each cluster in the VM disk image is mapped to a bit in the
> + * bitmap.
> + *
> + * The irvd file stores the actual data of the VM disk image. The
> + * smallest unit of storage is the cluster. iROW does not relocate data
> + * clusters; it writes each cluster at the same offset in the irvd file
> + * as the cluster's virtual address in the VM disk image. Because the
> + * host machine's file system supports sparse files, the VM disk image
> + * also grows gradually with the actual disk usage.
> + */
> +#define IROW_MAGIC (('I' << 24) | ('R' << 16) | ('O' << 8) | 'W')
> +#define IROW_VERSION 1
> +
> +#define IROW_SNAPHEADER_MAGIC (('S' << 24) | ('N' << 16) | ('A' << 8) | 'P')
> +
> +#define MIN_CLUSTER_BITS 9
> +#define MAX_CLUSTER_BITS 21
> +#define MAX_FILE_NAME_LENGTH 256
> +
> +#define IROW_READ 1
> +#define IROW_WRITE 2
> +#define IROW_AIO_READ 3
> +#define IROW_AIO_WRITE 4
> +
> +
> +typedef struct __attribute__((packed)) IRowMeta {
> +    uint32_t magic;
> +    uint32_t version;
> +    uint32_t copy_on_demand;
> +    uint32_t nb_snapshots;
> +    uint32_t cluster_size;
> +    uint32_t cluster_bits;
> +    uint32_t sectors_per_cluster;
> +    uint64_t total_clusters;
> +    uint64_t disk_size;
> +    char current_btmp[MAX_FILE_NAME_LENGTH];
> +    char backing_file[MAX_FILE_NAME_LENGTH];
> +} IRowMeta;
> +
> +typedef struct __attribute__((packed)) IRowSnapshotHeader {
> +    uint32_t snap_magic;
> +    char id_str[128];
> +    char name[256];
> +    char btmp_file[MAX_FILE_NAME_LENGTH];
> +    char irvd_file[MAX_FILE_NAME_LENGTH];
> +    char father_btmp_file[MAX_FILE_NAME_LENGTH];
> +    uint32_t vm_state_size;
> +    uint32_t date_sec;
> +    uint32_t date_nsec;
> +    uint64_t vm_clock_nsec;
> +    uint32_t nb_children;
> +    uint32_t is_deleted;
> +} IRowSnapshotHeader;

Hi,
This seems to be the code for the following paper:
http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6413673&url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F6412851%2F6413550%2F06413673.pdf%3Farnumber%3D6413673

I don't have an IEEE subscription so I can't read the paper.

Here is my understanding of the iROW file format:

1. The metafile

The metafile is the main file which contains the basic disk image
information (virtual size, cluster size) in a header structure.

After the header is a list of snapshots.  Each snapshot has an
allocation bitmap file and a data cluster file.  The snapshot also
points to its parent snapshot so that the snapshot chain can be
traversed.

Snapshots are not deleted from disk, instead they are simply marked as
deleted.  This is necessary so that child snapshots continue to function
after a parent is deleted.

The header contains a "copy_on_demand" bit which is similar to QEMU's
built-in copy-on-read feature.  It can be used to populate the leaf
snapshot with image data from its parents.
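
To make sure I'm reading the layout correctly, here is a minimal
standalone sketch of how I'd walk the metafile.  It assumes the
IRowMeta/IRowSnapshotHeader definitions from irow.h above; the plain
pread() loop and the use of host byte order are my assumptions, not
something the patch states:

#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include "block/irow.h"   /* the header from the patch above */

/* Sketch: metafile = IRowMeta header followed by meta.nb_snapshots
 * IRowSnapshotHeader records back to back.
 */
static int irow_dump_meta(const char *filename)
{
    IRowMeta meta;
    int fd = open(filename, O_RDONLY);

    if (fd < 0) {
        return -1;
    }
    if (pread(fd, &meta, sizeof(meta), 0) != sizeof(meta) ||
        meta.magic != IROW_MAGIC || meta.version != IROW_VERSION) {
        close(fd);
        return -1;
    }
    for (uint32_t i = 0; i < meta.nb_snapshots; i++) {
        IRowSnapshotHeader snap;
        off_t off = sizeof(meta) + (off_t)i * sizeof(snap);

        if (pread(fd, &snap, sizeof(snap), off) != sizeof(snap)) {
            break;
        }
        /* father_btmp_file links a snapshot to its parent, so the whole
         * snapshot chain can be rebuilt from this list alone. */
        printf("%s: btmp=%s irvd=%s parent=%s%s\n", snap.name,
               snap.btmp_file, snap.irvd_file, snap.father_btmp_file,
               snap.is_deleted ? " (deleted)" : "");
    }
    close(fd);
    return 0;
}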

2. The bitmap file

The bitmap file contains the allocation bitmap of the snapshot.  A bit
is set if the cluster has been written to and clear otherwise.
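
In other words, something along these lines (a sketch; LSB-first bit
ordering within a byte is my assumption, the patch doesn't spell it out):

#include <stdbool.h>
#include <stdint.h>

/* One bit per cluster: cluster N is present in this snapshot's irvd file
 * iff bit N of the btmp bitmap is set.
 */
static inline bool irow_cluster_allocated(const uint8_t *bitmap,
                                          uint64_t cluster)
{
    return bitmap[cluster >> 3] & (1u << (cluster & 7));
}

static inline void irow_set_cluster(uint8_t *bitmap, uint64_t cluster)
{
    bitmap[cluster >> 3] |= 1u << (cluster & 7);
}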

3. The data file

This is a sparse raw file containing data clusters written to this
snapshot.
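
Putting 1-3 together, I'd expect a read to resolve roughly like this (my
sketch of the semantics, not code from the patch; snapshot_t and its
fields are made-up names):

#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Walk from the leaf snapshot towards the root and read the cluster from
 * the first snapshot whose bitmap bit is set.  The offset inside each
 * irvd file equals the guest offset, because clusters are never
 * relocated.
 */
typedef struct snapshot {
    uint8_t *bitmap;          /* loaded from the btmp file */
    int irvd_fd;              /* sparse raw data file */
    struct snapshot *parent;  /* via father_btmp_file, NULL at the root */
} snapshot_t;

static ssize_t irow_read_cluster(snapshot_t *leaf, uint64_t cluster,
                                 uint32_t cluster_size, void *buf)
{
    for (snapshot_t *s = leaf; s; s = s->parent) {
        if (irow_cluster_allocated(s->bitmap, cluster)) {
            return pread(s->irvd_fd, buf, cluster_size,
                         (off_t)cluster * cluster_size);
        }
    }
    memset(buf, 0, cluster_size);   /* never written anywhere in the chain */
    return cluster_size;
}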

---

If this is a new file format (not used in existing hypervisors or tools)
then I'm afraid it is mostly duplicated code:

The core of iROW is a raw image file with an allocation bitmap, which is
the same as Dong Xu Wang's add-cow file format:
http://comments.gmane.org/gmane.comp.emulators.qemu/183210

iROW implements a fake internal snapshots interface.  In other words, it
reimplements backing files inside the block driver and presents them as
internal snapshots.  However, we get none of the benefits of real
internal snapshots:

 * Deleting snapshots is not possible if child snapshots still exist, so
   you don't reclaim space used by inaccessible data clusters.
 * Reads must traverse the backing file chain instead of O(1) access.
 * Relies on inefficient data copy to bring parent data up into the
   leaf snapshot (QEMU's copy-on-read/block_stream/block_commit, aka
   "copy_on_demand" in iROW); see the sketch after this list.

Implementing the fake internal snapshots interface wasn't necessary
since QEMU already supports external snapshots as well as copy-on-read.

Please let me know if I missed something or if this summary seems unfair.

I think the next step should be for Dong Xu and Jingsheng to put
together a final "raw with allocation bitmap" format that can be merged
into qemu.git.  Perhaps run benchmarks to see which implementation,
add-cow or iROW, is faster.  If iROW is faster, please strip out the
fake internal snapshots functionality and just submit the core bitmap
functionality using bs->backing_hd instead of reimplementing backing
files.

Stefan


