qemu-devel

Re: [Qemu-devel] [RFC] qed: Add QEMU Enhanced Disk format


From: Blue Swirl
Subject: Re: [Qemu-devel] [RFC] qed: Add QEMU Enhanced Disk format
Date: Tue, 7 Sep 2010 19:25:54 +0000

On Mon, Sep 6, 2010 at 10:04 AM, Stefan Hajnoczi
<address@hidden> wrote:
> QEMU Enhanced Disk format is a disk image format that forgoes features
> found in qcow2 in favor of better levels of performance and data
> integrity.  Due to its simpler on-disk layout, it is possible to safely
> perform metadata updates more efficiently.
>
> Installations, suspend-to-disk, and other allocation-heavy I/O workloads
> will see increased performance due to fewer I/Os and syncs.  Workloads
> that do not cause new clusters to be allocated will perform similarly to
> raw images due to in-memory metadata caching.
>
> The format supports sparse disk images.  It does not rely on the host
> filesystem holes feature, making it a good choice for sparse disk images
> that need to be transferred over channels where holes are not supported.
>
> Backing files are supported so only deltas against a base image can be
> stored.
>
> The file format is extensible so that additional features can be added
> later with graceful compatibility handling.
>
> Internal snapshots are not supported.  This eliminates the need for
> additional metadata to track copy-on-write clusters.

It would be nice to support external snapshots, where a file separate
from the disk image stores the snapshots. Snapshotting would then be
available even with raw or QED disk images. This is of course not
QED-specific.

> + *
> + * +--------+----------+----------+----------+-----+
> + * | header | L1 table | cluster0 | cluster1 | ... |
> + * +--------+----------+----------+----------+-----+
> + *
> + * There is a 2-level pagetable for cluster allocation:
> + *
> + *                     +----------+
> + *                     | L1 table |
> + *                     +----------+
> + *                ,------'  |  '------.
> + *           +----------+   |    +----------+
> + *           | L2 table |  ...   | L2 table |
> + *           +----------+        +----------+
> + *       ,------'  |  '------.
> + *  +----------+   |    +----------+
> + *  |   Data   |  ...   |   Data   |
> + *  +----------+        +----------+
> + *
> + * The L1 table is fixed size and always present.  L2 tables are allocated on
> + * demand.  The L1 table size determines the maximum possible image size; it
> + * can be influenced using the cluster_size and table_size values.

It would be nice to document the formula for the maximum image size.
Is image_size the limit? How many clusters can there be? What happens
if image_size is not a multiple of the cluster size? Wouldn't
image_size be redundant if cluster_size and table_size determine the
image size?

> + *
> + * All fields are little-endian on disk.
> + */
> +
> +typedef struct {
> +    uint32_t magic;                 /* QED */
> +
> +    uint32_t cluster_size;          /* in bytes */

Doesn't cluster_size need to be a power of two?

> +    uint32_t table_size;            /* table size, in clusters */
> +    uint32_t first_cluster;         /* first usable cluster */

This limits where the first cluster can be located: with 4k clusters
and a 32-bit first_cluster, it must reside within the first 16TB
(2^32 clusters * 4k). I guess it doesn't matter.

> +
> +    uint64_t features;              /* format feature bits */
> +    uint64_t compat_features;       /* compatible feature bits */
> +    uint64_t l1_table_offset;       /* L1 table offset, in bytes */
> +    uint64_t image_size;            /* total image size, in bytes */
> +
> +    uint32_t backing_file_offset;   /* in bytes from start of header */
> +    uint32_t backing_file_size;     /* in bytes */
> +    uint32_t backing_fmt_offset;    /* in bytes from start of header */
> +    uint32_t backing_fmt_size;      /* in bytes */
> +} QEDHeader;
> +
> +typedef struct {
> +    uint64_t offsets[0];            /* in bytes */
> +} QEDTable;

Is this for both L1 and L2 tables?


