[Qemu-devel] Re: [PATCH v4 1/5] docs: Add QED image format specification
From: Kevin Wolf
Subject: [Qemu-devel] Re: [PATCH v4 1/5] docs: Add QED image format specification
Date: Fri, 12 Nov 2010 14:58:12 +0100
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.15) Gecko/20101027 Fedora/3.0.10-1.fc12 Thunderbird/3.0.10
Am 28.10.2010 13:01, schrieb Stefan Hajnoczi:
> Signed-off-by: Stefan Hajnoczi <address@hidden>
> ---
> docs/specs/qed_spec.txt | 128
> +++++++++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 128 insertions(+), 0 deletions(-)
> create mode 100644 docs/specs/qed_spec.txt
>
> diff --git a/docs/specs/qed_spec.txt b/docs/specs/qed_spec.txt
> new file mode 100644
> index 0000000..e4425c8
> --- /dev/null
> +++ b/docs/specs/qed_spec.txt
> @@ -0,0 +1,128 @@
> +=Specification=
> +
> +The file format looks like this:
> +
> + +----------+----------+----------+-----+
> + | cluster0 | cluster1 | cluster2 | ... |
> + +----------+----------+----------+-----+
> +
> +The first cluster begins with the '''header'''. The header contains
> information about where regular clusters start; this allows the header to be
> extensible and store extra information about the image file. A regular
> cluster may be a '''data cluster''', an '''L2 table''', or an '''L1 table'''. L1
> and L2 tables are composed of one or more contiguous clusters.
> +
> +Normally the file size will be a multiple of the cluster size. If the file
> size is not a multiple, extra information after the last cluster may not be
> preserved if data is written. Legitimate extra information should use space
> between the header and the first regular cluster.
> +
> +All fields are little-endian.
> +
> +==Header==
> + Header {
> + uint32_t magic; /* QED\0 */
> +
> + uint32_t cluster_size; /* in bytes */
> + uint32_t table_size; /* for L1 and L2 tables, in clusters */
> + uint32_t header_size; /* in clusters */
> +
> + uint64_t features; /* format feature bits */
> + uint64_t compat_features; /* compat feature bits */
> + uint64_t l1_table_offset; /* in bytes */
> + uint64_t image_size; /* total logical image size, in bytes */
> +
> + /* if (features & QED_F_BACKING_FILE) */
> + uint32_t backing_filename_offset; /* in bytes from start of header */
> + uint32_t backing_filename_size; /* in bytes */
> + }
> +
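As a rough illustration, the fixed part of this header could be decoded in Python with the standard `struct` module. This sketch assumes the fields are packed back to back with no padding, exactly as listed above, which gives 56 bytes; `parse_header` and the dictionary keys are hypothetical names.

```python
import struct

# Little-endian ("<", no padding): magic, three uint32, four uint64, two uint32.
# Mirrors the Header struct above as written (it has no autoclear_features).
QED_HEADER_FMT = "<4sIIIQQQQII"
QED_HEADER_SIZE = struct.calcsize(QED_HEADER_FMT)  # 56 bytes
QED_MAGIC = b"QED\0"

HEADER_FIELDS = ("magic", "cluster_size", "table_size", "header_size",
                 "features", "compat_features", "l1_table_offset",
                 "image_size", "backing_filename_offset",
                 "backing_filename_size")

def parse_header(buf):
    """Decode the fixed-size QED header from the first bytes of an image."""
    if len(buf) < QED_HEADER_SIZE:
        raise ValueError("buffer too short for a QED header")
    header = dict(zip(HEADER_FIELDS, struct.unpack_from(QED_HEADER_FMT, buf, 0)))
    if header["magic"] != QED_MAGIC:
        raise ValueError("not a QED image (bad magic)")
    return header
```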
> +Field descriptions:
> +* ''cluster_size'' must be a power of 2 in range [2^12, 2^26].
> +* ''table_size'' must be a power of 2 in range [1, 16].
> +* ''header_size'' is the number of clusters used by the header and any
> additional information stored before regular clusters.
> +* ''features'', ''compat_features'', and ''autoclear_features'' are file
> format extension bitmaps. They work as follows:
> +** An image with unknown ''features'' bits enabled must not be opened. File
> format changes that are not backwards-compatible must use ''features'' bits.
> +** An image with unknown ''compat_features'' bits enabled can be opened
> safely. The unknown features are simply ignored and represent
> backwards-compatible changes to the file format.
> +** An image with unknown ''autoclear_features'' bits enabled can be opened
> safely after clearing the unknown bits. This allows for backwards-compatible
> changes to the file format which degrade gracefully and can be re-enabled
> again by a new program later.
autoclear features aren't even part of the header in the spec.
> +* ''l1_table_offset'' is the offset of the first byte of the L1 table in the
> image file and must be a multiple of ''cluster_size''.
> +* ''image_size'' is the block device size seen by the guest and must be a
> multiple of 512 bytes.
> +* ''backing_filename'' is a string in (byte offset, byte size) form. It is
> not NUL-terminated and has no alignment constraints.
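The field constraints above translate into a few simple checks. A minimal sketch, assuming the header has already been decoded into a dict keyed by the struct field names and that `header_buf` holds the header clusters (`header_size * cluster_size` bytes); `validate_header` and `backing_filename` are hypothetical helpers. The backing filename is returned as raw bytes because the text does not state an encoding.

```python
from typing import Optional

QED_F_BACKING_FILE = 0x01

def is_power_of_2(n):
    return n > 0 and (n & (n - 1)) == 0

def validate_header(h):
    """Raise ValueError if a decoded header violates the field constraints."""
    cs = h["cluster_size"]
    if not is_power_of_2(cs) or not (2**12 <= cs <= 2**26):
        raise ValueError("cluster_size must be a power of 2 in [2^12, 2^26]")
    ts = h["table_size"]
    if not is_power_of_2(ts) or not (1 <= ts <= 16):
        raise ValueError("table_size must be a power of 2 in [1, 16]")
    if h["l1_table_offset"] % cs != 0:
        raise ValueError("l1_table_offset must be a multiple of cluster_size")
    if h["image_size"] % 512 != 0:
        raise ValueError("image_size must be a multiple of 512")

def backing_filename(h, header_buf) -> Optional[bytes]:
    """Return the backing filename bytes, or None if there is no backing file."""
    if not (h["features"] & QED_F_BACKING_FILE):
        return None  # fields still occupy header space but are unused
    start = h["backing_filename_offset"]  # in bytes from start of header
    end = start + h["backing_filename_size"]
    if end > len(header_buf):
        raise ValueError("backing filename extends past the header")
    return bytes(header_buf[start:end])  # not NUL-terminated, no alignment
```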
> +
> +Feature bits:
> +* QED_F_BACKING_FILE = 0x01. The image uses a backing file. The backing
> filename string is given in the ''backing_filename_{offset,size}'' fields and
> may be an absolute path or relative to the image file.
> +* QED_F_NEED_CHECK = 0x02. The image needs a consistency check before use.
> +* QED_F_BACKING_FORMAT_NO_PROBE = 0x04. The backing file is a raw disk
> image and no file format autodetection should be attempted. This should be
> used to ensure that raw backing images are never detected as an image format
> if they happen to contain magic constants.
> +
> +There are currently no defined ''compat_features'' or ''autoclear_features''
> bits.
> +
> +Fields predicated on a feature bit are only used when that feature is set.
> The fields always take up header space, regardless of whether or not the
> feature bit is set.
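One possible way to apply the ''features''/''compat_features'' rules when opening an image, assuming a program that understands only the three feature bits defined here. ''autoclear_features'' handling is left out because it is not a field of the Header struct as written; `check_features` is a hypothetical name.

```python
QED_F_BACKING_FILE = 0x01
QED_F_NEED_CHECK = 0x02
QED_F_BACKING_FORMAT_NO_PROBE = 0x04

# Bits this (hypothetical) implementation understands.
KNOWN_FEATURES = QED_F_BACKING_FILE | QED_F_NEED_CHECK | QED_F_BACKING_FORMAT_NO_PROBE
KNOWN_COMPAT_FEATURES = 0  # no compat_features bits are defined yet

def check_features(features, compat_features):
    """Return (needs_check, ignored_compat_bits), or raise if unopenable."""
    unknown = features & ~KNOWN_FEATURES
    if unknown:
        # Unknown incompatible change: the image must not be opened.
        raise ValueError("unsupported feature bits: 0x%x" % unknown)
    # Unknown compat_features bits are backwards-compatible and just ignored.
    ignored = compat_features & ~KNOWN_COMPAT_FEATURES
    needs_check = bool(features & QED_F_NEED_CHECK)
    return needs_check, ignored
```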
> +
> +==Tables==
> +
> +Tables provide the translation from logical offsets in the block device to
> cluster offsets in the file.
> +
> + #define TABLE_NOFFSETS (table_size * cluster_size / sizeof(uint64_t))
> +
> + Table {
> + uint64_t offsets[TABLE_NOFFSETS];
> + }
> +
> +The tables are organized as follows:
> +
> + +----------+
> + | L1 table |
> + +----------+
> + ,------' | '------.
> + +----------+ | +----------+
> + | L2 table | ... | L2 table |
> + +----------+ +----------+
> + ,------' | '------.
> + +----------+ | +----------+
> + | Data | ... | Data |
> + +----------+ +----------+
> +
> +A table is made up of one or more contiguous clusters. The table_size
> header field determines table size for an image file. For example,
> cluster_size=64 KB and table_size=4 results in 256 KB tables.
> +
> +The logical image size must be less than or equal to the maximum possible
> size of clusters rooted by the L1 table:
> + header.image_size <= TABLE_NOFFSETS * TABLE_NOFFSETS * header.cluster_size
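To make the sizing rules concrete: each table entry is a uint64_t, so with cluster_size=64 KB and table_size=4 a table is 256 KB and holds 32768 entries, and the bound on image_size works out to 32768 * 32768 * 64 KB = 2^46 bytes (64 TB). A small sketch of the same arithmetic, with hypothetical helper names:

```python
def table_noffsets(cluster_size, table_size):
    """TABLE_NOFFSETS: number of uint64_t entries in one L1 or L2 table."""
    return table_size * cluster_size // 8  # sizeof(uint64_t) == 8

def max_image_size(cluster_size, table_size):
    """Largest image_size addressable through a single L1 table."""
    n = table_noffsets(cluster_size, table_size)
    return n * n * cluster_size  # L1 entries * L2 entries * bytes per cluster
```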
> +
> +All offsets in L1 and L2 tables are cluster-aligned. The least significant
> bits up to ''cluster_size'' are reserved and must be zero.
I know what you mean here, but the text leaves things a bit unclear.
First I would expect a bit number instead of a byte offset for "bits up
to x". Second, cluster_size is the first bit not reserved, whereas your
description sounds to me as if it included cluster_size.
> This may be used in future format extensions to store per-offset information.
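Reading "the least significant bits up to cluster_size" as bits 0 through log2(cluster_size) - 1 (for 64 KB clusters, bits 0-15), a table entry could be split like this. This is a sketch under that interpretation; `split_table_entry` and `entry_offset` are hypothetical names.

```python
def split_table_entry(entry, cluster_size):
    """Split a 64-bit L1/L2 entry into (cluster_offset, reserved_bits)."""
    reserved_mask = cluster_size - 1  # bits 0 .. log2(cluster_size) - 1
    offset_mask = ~reserved_mask & 0xFFFFFFFFFFFFFFFF  # keep a 64-bit value
    return entry & offset_mask, entry & reserved_mask

def entry_offset(entry, cluster_size):
    """Return the cluster offset, rejecting entries with reserved bits set."""
    offset, reserved = split_table_entry(entry, cluster_size)
    if reserved:
        raise ValueError("reserved bits 0x%x set in table entry" % reserved)
    return offset
```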
> +
> +The following offsets have special meanings:
> +
> +===L2 table offsets===
> +* 0 - unallocated. The L2 table is not yet allocated.
> +
> +===Data cluster offsets===
> +* 0 - unallocated. The data cluster is not yet allocated.
> +
> +===Unallocated L2 tables and data clusters===
> +Reads from an unallocated area of the image file access the backing file. If
> there is no backing file, then zeroes are produced. The backing file may be
> smaller than the image file and reads of unallocated areas beyond the end of
> the backing file produce zeroes.
> +
> +Writes to an unallocated area cause a new data cluster to be allocated, and
> a new L2 table if that is also unallocated. The new data cluster is
> populated with data from the backing image (or zeroes if no backing image)
> and the data being written.
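The read and copy-on-write behaviour can be modelled in a few lines. This toy sketch keeps the image as a dict from cluster index to contents and the backing file as a `bytes` object, ignores L2 table allocation and on-disk placement, and uses a hypothetical 4 KB cluster size; all names are illustrative.

```python
CLUSTER_SIZE = 4096  # hypothetical cluster size for illustration

def read_cluster(image, backing, cluster_index):
    """Return the guest-visible contents of one cluster."""
    if cluster_index in image:  # allocated in this image
        return bytes(image[cluster_index])
    if backing is None:  # no backing file: zeroes
        return bytes(CLUSTER_SIZE)
    start = cluster_index * CLUSTER_SIZE
    data = backing[start:start + CLUSTER_SIZE]
    # Reads beyond the end of a shorter backing file produce zeroes.
    return data + bytes(CLUSTER_SIZE - len(data))

def write(image, backing, logical_offset, data):
    """Write data that fits within one cluster, allocating on first write."""
    cluster_index, byte_offset = divmod(logical_offset, CLUSTER_SIZE)
    if byte_offset + len(data) > CLUSTER_SIZE:
        raise ValueError("sketch only handles writes within a single cluster")
    if cluster_index not in image:
        # New cluster is populated from the backing file (or zeroes) first.
        image[cluster_index] = bytearray(read_cluster(image, backing, cluster_index))
    image[cluster_index][byte_offset:byte_offset + len(data)] = data
```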
> +
> +===Logical offset translation===
> +Logical offsets are translated into cluster offsets as follows:
> +
> + table_bits table_bits cluster_bits
> + <--------> <--------> <--------------->
> + +----------+----------+-----------------+
> + | L1 index | L2 index | byte offset |
> + +----------+----------+-----------------+
> +
> + Structure of a logical offset
> +
> + offset_mask = ~(cluster_size - 1) # mask for the image file byte offset
> +
> + def logical_to_cluster_offset(l1_index, l2_index, byte_offset):
> + l2_offset = l1_table[l1_index]
> + l2_table = load_table(l2_offset)
> + cluster_offset = l2_table[l2_index] & offset_mask
> + return cluster_offset + byte_offset
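The pseudocode takes the three indices as given. A fuller sketch, assuming the image is available in memory as `bytes`, derives them from a logical offset using the bit layout shown above and treats a zero entry as unallocated. Masking the L1 entry as well as the L2 entry is an assumption based on all table offsets being cluster-aligned; the function names are hypothetical.

```python
import struct

def split_logical_offset(offset, cluster_size, table_size):
    """Split a guest byte offset into (l1_index, l2_index, byte_offset)."""
    cluster_bits = cluster_size.bit_length() - 1  # log2(cluster_size)
    noffsets = table_size * cluster_size // 8  # TABLE_NOFFSETS
    table_bits = noffsets.bit_length() - 1  # log2(TABLE_NOFFSETS)
    byte_offset = offset & (cluster_size - 1)
    l2_index = (offset >> cluster_bits) & (noffsets - 1)
    l1_index = offset >> (cluster_bits + table_bits)
    return l1_index, l2_index, byte_offset

def load_table(image, table_offset, cluster_size, table_size):
    """Read one table of little-endian uint64_t entries from the image."""
    n = table_size * cluster_size // 8
    return struct.unpack_from("<%dQ" % n, image, table_offset)

def logical_to_file_offset(image, l1_table_offset, cluster_size, table_size, offset):
    """Return the image file offset for a guest offset, or None if unallocated."""
    l1_index, l2_index, byte_offset = split_logical_offset(offset, cluster_size, table_size)
    offset_mask = ~(cluster_size - 1)
    l1 = load_table(image, l1_table_offset, cluster_size, table_size)
    l2_offset = l1[l1_index] & offset_mask
    if l2_offset == 0:
        return None  # L2 table not allocated: read backing file or zeroes
    l2 = load_table(image, l2_offset, cluster_size, table_size)
    cluster_offset = l2[l2_index] & offset_mask
    if cluster_offset == 0:
        return None  # data cluster not allocated
    return cluster_offset + byte_offset
```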
> +
> +==Consistency checking==
> +
> +This section is informational and included to provide background on the use
> of the QED_F_NEED_CHECK ''features'' bit.
> +
> +The QED_F_NEED_CHECK bit is used to mark an image as dirty before starting
> an operation that could leave the image in an inconsistent state if
> interrupted by a crash or power failure. A dirty image must be checked on
> open because its metadata may not be consistent.
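The ordering that matters is that the dirty bit reaches the disk before the risky update begins and is cleared only after it finishes. A sketch with a toy in-memory `Image` class standing in for a real image (hypothetical; a real implementation would write and flush the header to disk):

```python
QED_F_NEED_CHECK = 0x02

class Image:
    """Toy stand-in for an open QED image (hypothetical)."""
    def __init__(self):
        self.features = 0
        self.flushes = []  # header feature values "written" to disk, in order

    def write_header(self):
        # A real implementation would write the header and flush it here.
        self.flushes.append(self.features)

def with_need_check(image, operation):
    """Mark the image dirty around an update that could leave it inconsistent."""
    image.features |= QED_F_NEED_CHECK
    image.write_header()  # dirty bit must be on disk before starting
    operation()  # if this crashes, the bit stays set and forces a check
    image.features &= ~QED_F_NEED_CHECK
    image.write_header()  # cleared only after the operation completed
```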
> +
> +The consistency check verifies the following invariants:
> +# Each cluster is referenced once and only once. It is an inconsistency to
> have a cluster referenced more than once by L1 or L2 tables. A cluster has
> been leaked if it has no references.
> +# Offsets must be within the image file size and must be ''cluster_size''
> aligned.
> +# Table offsets must be at least ''table_size'' * ''cluster_size'' bytes from
> the end of the image file so that there is space for the entire table.
> +
> +The consistency check process starts from ''l1_table_offset'' and scans
> all L2 tables. After the check completes with no other errors besides leaks,
> the QED_F_NEED_CHECK bit can be cleared and the image can be accessed.
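The invariants and the scan order can be combined into a single pass. This sketch assumes a hypothetical `read_table(offset)` helper that returns a table's entries as integers with reserved bits already zero; it counts references per cluster, flags misaligned or out-of-range offsets, and reports unreferenced regular clusters as leaks.

```python
def check_image(read_table, l1_table_offset, header_size,
                cluster_size, table_size, file_size):
    """Return (errors, leaked) for a QED image described by the arguments."""
    refs = {}  # cluster offset -> number of references
    errors = []
    scanned = set()  # L2 tables already walked

    def ref(offset, nclusters, what):
        # Offsets must be cluster aligned, and the whole table or cluster
        # must lie inside the file.
        if offset % cluster_size:
            errors.append("%s at 0x%x is not cluster aligned" % (what, offset))
            return False
        if offset + nclusters * cluster_size > file_size:
            errors.append("%s at 0x%x runs past end of file" % (what, offset))
            return False
        for i in range(nclusters):
            c = offset + i * cluster_size
            refs[c] = refs.get(c, 0) + 1
        return True

    if ref(l1_table_offset, table_size, "L1 table"):
        for l2_offset in read_table(l1_table_offset):
            if l2_offset == 0 or not ref(l2_offset, table_size, "L2 table"):
                continue  # unallocated or invalid L2 table
            if l2_offset in scanned:
                continue  # duplicate already counted; do not rescan
            scanned.add(l2_offset)
            for data_offset in read_table(l2_offset):
                if data_offset:
                    ref(data_offset, 1, "data cluster")

    # Each cluster may be referenced at most once...
    for c, n in sorted(refs.items()):
        if n > 1:
            errors.append("cluster at 0x%x referenced %d times" % (c, n))
    # ...and regular clusters with no references have leaked.
    end = file_size - file_size % cluster_size
    leaked = [c for c in range(header_size * cluster_size, end, cluster_size)
              if c not in refs]
    return errors, leaked
```

Following the text above, a caller would clear QED_F_NEED_CHECK only when `errors` is empty, whatever `leaked` contains.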
Looks okay otherwise.
Kevin