[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v2 01/12] qcow2: Add new overlap check functions
From: |
Eric Blake |
Subject: |
Re: [Qemu-devel] [PATCH v2 01/12] qcow2: Add new overlap check functions |
Date: |
Tue, 03 Feb 2015 16:08:26 -0700 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0 |
On 11/24/2014 08:56 AM, Max Reitz wrote:
> The existing qcow2 metadata overlap detection function used existing
> structures to determine the location of the image metadata, from plain
> fields such as l1_table_offset and l1_size in the BDRVQcowState, over
> image structures in memory such as the L1 table for the L2 tables'
> positions, or it even read the required data directly from disk for
> every requested check, such as the snapshot L1 tables for the inactive
> L2 tables' positions.
>
> These new functions instead keep a dedicated structure for keeping track
> of the metadata positions in memory. It consists of two parts: First,
> there is one structure which is basically a list of all metadata
> structures. Each entry has a bitmask of types (because some metadata
> structures may actually overlap, such as active and inactive L2 tables),
> a number of clusters occupied and the offset from the previous entry in
> clusters. This structure requires relatively few memory, but checking a
s/few/little/
> certain point may take relatively long. Each entry is called a
> "fragment".
>
> Therefore, there is another representation which is a bitmap, or rather
> a bytemap, of metadata types. The previously described list is split
> into multiple windows with each describing a constant number of clusters
> (WINDOW_SIZE). If the list is to be queried or changed, the respective
> window is selected in constant time and the bitmap is generated from the
> fragments belonging to the window. This bitmap can then be queried in
> constant time and easily be changed.
>
> Because the bitmap representation requires more memory, it is only used
> as a cache. Whenever a window is removed from the cache, the fragment
> list will be rebuilt from the bitmap if the latter has been modified.
> Therefore, the fragment list is only used as the background
> representation to save memory, whereas the bitmap is used whenever
> possible.
>
> Regarding the size of the fragment list in memory: As one refcount block
> can handle cluster_size / 2 entries and one L2 table can handle
> cluster_size / 8 entries, for a qcow2 image with the standard cluster
> size of 64 kB, there is a ratio of data to metadata of about 1/6000
> (1/32768 refblocks and 1/8192 L2 tables) if one ignores the fact that
> not every cluster requires an L2 table entry. The refcount table and the
> L1 table is generally negligible. At the worst, each metadata cluster
> requires its own entry in the fragment list; each entry takes up four
> bytes, therefore, at the worst, the fragment list should take up (for an
> image with 64 kB clusters) (4 B) / (64 kB * 6000) of the image size,
> which is about 1.e-8 (i.e., 11 kB for a 1 TB image, or 11 MB for a 1 PB
> image).
>
> Signed-off-by: Max Reitz <address@hidden>
> ---
> block/Makefile.objs | 3 +-
> block/qcow2-overlap.c | 404
> ++++++++++++++++++++++++++++++++++++++++++++++++++
> block/qcow2.h | 13 ++
> 3 files changed, 419 insertions(+), 1 deletion(-)
> create mode 100644 block/qcow2-overlap.c
Are you still hoping to get this in 2.3?
> +++ b/block/qcow2-overlap.c
> @@ -0,0 +1,404 @@
> +/*
> + * QCOW2 runtime metadata overlap detection
> + *
> + * Copyright (c) 2014 Max Reitz <address@hidden>
Slow review means it is now 2015.
> +/* Number of clusters which are covered by each metadata window;
> + * note that this may not exceed 2^16 as long as
> + * Qcow2MetadataFragment::relative_start is a uint16_t */
> +#define WINDOW_SIZE 4096
So this says that for every 4096 clusters, we have one bytemap
representation, as well as a chain of up to 4096 fragment descriptors?
> +
> +/* Describes a fragment of a or a whole metadata range; does not necessarily
s/of a or/of or/
> + * describe the whole range because it needs to be split on window
> boundaries */
> +typedef struct Qcow2MetadataFragment {
> + /* Bitmask of QCow2MetadataOverlap values */
> + uint8_t types;
> + uint8_t nb_clusters;
> + /* Number of clusters between the start of the window and this range */
> + uint16_t relative_start;
So even I have a file with 4096 consecutive sectors all tied up in the
same purpose within a given window, I have to represent it as 16
Qcow2MetadataFragments rather than 1 fragment, because the uint8_t size
of nb_clusters limits me to at most 256 clusters per Fragment entry?
And worst case, a window will have a list of 4096 fragments if every
cluster alternates between some other type.
> +} QEMU_PACKED Qcow2MetadataFragment;
Is QEMU_PACKED really essential here? If I'm not mistaken, this struct
is only ever kept in memory and not written out to disk. On the other
hand, I understand that you are trying to ensure that the compiler
packed this into 32 bits rather than injecting any padding. Would a
BUG_ON(sizeof(Qcow2MetadataFragment) == 4) be any better at representing
that fact?
> +
> +typedef struct Qcow2MetadataWindow {
> + Qcow2MetadataFragment *fragments;
> + int nb_fragments, fragments_array_size;
> +
> + /* If not NULL, this is an expanded version of the "RLE" version given by
> + * the fragments array; there are WINDOW_SIZE entries */
> + uint8_t *bitmap;
> + bool bitmap_modified;
> +
> + /* Time of last access */
> + unsigned age;
> +
> + /* Index in Qcow2MetadataList::cached_windows */
> + int cached_windows_index;
> +} Qcow2MetadataWindow;
> +
> +struct Qcow2MetadataList {
> + Qcow2MetadataWindow *windows;
> + uint64_t nb_windows;
> +
> + unsigned current_age;
> +
> + /* Index into the windows array */
> + int *cached_windows;
> + size_t nb_cached_windows;
> +};
Is there a maximum size for nb_cached_windows before you start evicting
cached windows?
> +
> +/**
> + * Destroys the cached window bitmap. If it has been modified, the fragment
> list
> + * will be rebuilt accordingly.
> + */
> +static void destroy_window_bitmap(Qcow2MetadataList *mdl,
> + Qcow2MetadataWindow *window)
> +{
> + if (!window->bitmap) {
> + return;
> + }
> +
> + if (window->bitmap_modified) {
> + int bitmap_i, fragment_i = 0;
> + QCow2MetadataOverlap current_types = 0;
> + int current_nb_clusters = 0;
> +
> + /* Rebuild the fragment list; the case bitmap_i == WINDOW_SIZE is for
> + * entering the last fragment at the bitmap end */
> +
> + for (bitmap_i = 0; bitmap_i <= WINDOW_SIZE; bitmap_i++) {
> + /* Qcow2MetadataFragment::nb_clusters is a uint8_t, so
> + * current_nb_clusters may not exceed 255 */
Wait. Why 255 and not 256? Can't you use nb_clusters==0 as a modulo for
256 consecutive clusters, as the fragments should never encode a
0-length run? That way, you can represent 4096 consecutive clusters in
16 fragments of 256 each, instead of 17 (16 of 255, and 1 of 16).
> + if (bitmap_i < WINDOW_SIZE &&
> + current_types == window->bitmap[bitmap_i] &&
> + current_nb_clusters < 255)
> + {
> + current_nb_clusters++;
> + } else {
> + if (current_types && current_nb_clusters) {
> + if (fragment_i >= window->fragments_array_size) {
> + window->fragments_array_size =
> + 3 * window->fragments_array_size / 2 + 1;
> +
> + /* new_nb_fragments should be small enough, and
> there is
> + * nothing we can do on failure anyway, so do not use
> + * g_try_renew() here */
> + window->fragments =
> + g_renew(Qcow2MetadataFragment, window->fragments,
> + window->fragments_array_size);
> + }
> +
> + window->fragments[fragment_i++] =
> (Qcow2MetadataFragment){
> + .types = current_types,
> + .nb_clusters = current_nb_clusters,
> + .relative_start = bitmap_i - current_nb_clusters,
> + };
> + }
> +
> + current_nb_clusters = 0;
> + if (bitmap_i < WINDOW_SIZE) {
> + current_types = window->bitmap[bitmap_i];
> + }
> + }
> + }
> +
> + window->nb_fragments = fragment_i;
Any need to clear window->bitmap_modified at this point? Or are you
careful to only rely on it when window->bitmap is non-NULL.
> + }
> +
> + g_free(window->bitmap);
> + window->bitmap = NULL;
> +}
> +
> +/**
> + * Creates a bitmap from the fragment list.
> + */
> +static void build_window_bitmap(Qcow2MetadataList *mdl,
> + Qcow2MetadataWindow *window)
> +{
> + int cache_i, oldest_cache_i = -1, i;
> + unsigned oldest_cache_age = 0;
> +
> + for (cache_i = 0; cache_i < mdl->nb_cached_windows; cache_i++) {
> + unsigned age;
> +
> + if (mdl->cached_windows[cache_i] < 0) {
> + break;
> + }
> +
> + age = mdl->current_age -
> mdl->windows[mdl->cached_windows[cache_i]].age;
> + if (age > oldest_cache_age) {
> + oldest_cache_age = age;
> + oldest_cache_i = cache_i;
> + }
> + }
> +
> + if (cache_i >= mdl->nb_cached_windows) {
> + destroy_window_bitmap(mdl,
> + &mdl->windows[mdl->cached_windows[oldest_cache_i]]);
> + cache_i = oldest_cache_i;
> + }
> +
> + assert(cache_i >= 0);
> + mdl->cached_windows[cache_i] = window - mdl->windows;
> + window->cached_windows_index = cache_i;
> +
> + window->age = mdl->current_age++;
> +
> + window->bitmap = g_new0(uint8_t, WINDOW_SIZE);
> +
> + for (i = 0; i < window->nb_fragments; i++) {
> + Qcow2MetadataFragment *fragment = &window->fragments[i];
> +
> + memset(&window->bitmap[fragment->relative_start], fragment->types,
> + fragment->nb_clusters);
Hmm. If you do use my idea of nb_clusters==0 for 256, this needs a
special case. Another option would be storing number of clusters - 1 (so
a value of 0 is 1 cluster, a value of 255 is 256 clusters).
> + }
> +
> + window->bitmap_modified = false;
> +}
> +
> +/**
> + * Enters a new range into the metadata list.
> + */
> +void qcow2_metadata_list_enter(BlockDriverState *bs, uint64_t offset,
> + int nb_clusters, QCow2MetadataOverlap types)
> +{
> + BDRVQcowState *s = bs->opaque;
> + uint64_t start_cluster = offset >> s->cluster_bits;
> + uint64_t end_cluster = start_cluster + nb_clusters;
> + uint64_t current_cluster = start_cluster;
> +
> + types &= s->overlap_check;
> + if (!types) {
> + return;
> + }
> +
> + if (offset_into_cluster(s, offset)) {
> + /* Do not enter apparently broken metadata ranges */
> + return;
> + }
> +
> + while (current_cluster < end_cluster) {
> + int bitmap_i;
> + int bitmap_i_start = current_cluster % WINDOW_SIZE;
> + int bitmap_i_end = MIN(WINDOW_SIZE,
> + end_cluster - current_cluster +
> bitmap_i_start);
> + uint64_t window_i = current_cluster / WINDOW_SIZE;
> + Qcow2MetadataWindow *window;
> +
> + if (window_i >= s->metadata_list->nb_windows) {
> + /* This should not be happening too often, so it is fine to
> resize
> + * the array to exactly the required size */
> + Qcow2MetadataWindow *new_windows;
> +
> + new_windows = g_try_renew(Qcow2MetadataWindow,
> + s->metadata_list->windows,
> + window_i + 1);
> + if (!new_windows) {
> + return;
> + }
> +
> + memset(new_windows + s->metadata_list->nb_windows, 0,
> + (window_i + 1 - s->metadata_list->nb_windows) *
> + sizeof(Qcow2MetadataWindow));
> +
> + s->metadata_list->windows = new_windows;
> + s->metadata_list->nb_windows = window_i + 1;
> + }
> +
> + window = &s->metadata_list->windows[window_i];
> + if (!window->bitmap) {
> + build_window_bitmap(s->metadata_list, window);
> + }
> +
> + for (bitmap_i = bitmap_i_start; bitmap_i < bitmap_i_end; bitmap_i++)
> {
> + window->bitmap[bitmap_i] |= types;
This adds in new types but keeps existing types listed. Is that okay?
> + }
> +
> + window->age = s->metadata_list->current_age++;
> + window->bitmap_modified = true;
> +
> + /* Go to the next window */
> + current_cluster += WINDOW_SIZE - bitmap_i_start;
> + }
> +}
...
> +++ b/block/qcow2.h
> @@ -159,6 +159,9 @@ typedef struct QCowSnapshot {
> struct Qcow2Cache;
> typedef struct Qcow2Cache Qcow2Cache;
>
> +struct Qcow2MetadataList;
> +typedef struct Qcow2MetadataList Qcow2MetadataList;
> +
> typedef struct Qcow2UnknownHeaderExtension {
> uint32_t magic;
> uint32_t len;
> @@ -261,6 +264,7 @@ typedef struct BDRVQcowState {
>
> bool discard_passthrough[QCOW2_DISCARD_MAX];
>
> + Qcow2MetadataList *metadata_list;
> int overlap_check; /* bitmask of Qcow2MetadataOverlap values */
> bool signaled_corruption;
>
> @@ -576,4 +580,13 @@ int qcow2_cache_get_empty(BlockDriverState *bs,
> Qcow2Cache *c, uint64_t offset,
> void **table);
> int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
>
> +/* qcow2-overlap.c functions */
> +int qcow2_create_empty_metadata_list(BlockDriverState *bs, size_t cache_size,
> + Error **errp);
> +void qcow2_metadata_list_destroy(BlockDriverState *bs);
> +void qcow2_metadata_list_enter(BlockDriverState *bs, uint64_t offset,
> + int nb_clusters, QCow2MetadataOverlap type);
> +void qcow2_metadata_list_remove(BlockDriverState *bs, uint64_t offset,
> + int nb_clusters, QCow2MetadataOverlap type);
> +
> #endif
>
Looks reasonable in general.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
signature.asc
Description: OpenPGP digital signature
- Re: [Qemu-devel] [PATCH v2 01/12] qcow2: Add new overlap check functions,
Eric Blake <=