qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v6 07/11] qcow2: Do not perform potentially dama


From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH v6 07/11] qcow2: Do not perform potentially damaging repairs
Date: Tue, 21 Oct 2014 09:52:30 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

Am 20.10.2014 um 16:35 hat Max Reitz geschrieben:
> If a referenced cluster has a refcount of 0, increasing its refcount may
> result in clusters being allocated for the refcount structures. This may
> overwrite the referenced cluster, therefore we cannot simply increase
> the refcount then.
> 
> In such cases, we can either try to replicate all the refcount
> operations solely for the check operation, basing the allocations on the
> in-memory refcount table; or we can simply rebuild the whole refcount
> structure based on the in-memory refcount table. Since the latter will
> be much easier, do that.
> 
> To prepare for this, introduce a "rebuild" boolean which should be set
> to true whenever a fix is rather dangerous or too complicated using the
> current refcount structures. Another example for this is refcount blocks
> being referenced more than once.
> 
> Signed-off-by: Max Reitz <address@hidden>
> Reviewed-by: BenoƮt Canet <address@hidden>

> @@ -1557,6 +1442,7 @@ static int check_refblocks(BlockDriverState *bs, 
> BdrvCheckResult *res,
>              fprintf(stderr, "ERROR refcount block %" PRId64 " is not "
>                  "cluster aligned; refcount table entry corrupted\n", i);
>              res->corruptions++;
> +            *rebuild = true;
>              continue;
>          }
>  
> @@ -1612,6 +1498,7 @@ static int check_refblocks(BlockDriverState *bs, 
> BdrvCheckResult *res,
>  
>  resize_fail:
>                  res->corruptions++;
> +                *rebuild = true;
>                  fprintf(stderr, "ERROR could not resize image: %s\n",
>                          strerror(-ret));
>              } else {

Increasing res->corruptions in this patch looks right because you don't
actually rebuild anything yet. However, at the end of the series this
code still looks the same - shouldn't we somehow convert those the
corruptions caused by wrong refcounts into res->corruptions_fixed?

Hm... It seems that you do this by storing intermediate results in
qcow2_check_refcounts(), which assumes that compare_refcounts() finds
only refcount problems, and no other place can find any. This may be
true, but wouldn't it be more elegant to have a separate counter for
corruptions that will be fixed with a rebuild? We can use some
Qcow2CheckResult instead of BdrvCheckResult internally and add more
fields there.

Also, I don't think all "ERROR" messages correctly say "Repairing" again
at the end of the series.

> @@ -1624,40 +1511,10 @@ resize_fail:
>              inc_refcounts(bs, res, *refcount_table, *nb_clusters,
>                  offset, s->cluster_size);
>              if ((*refcount_table)[cluster] != 1) {
> -                fprintf(stderr, "%s refcount block %" PRId64
> -                    " refcount=%d\n",
> -                    fix & BDRV_FIX_ERRORS ? "Repairing" :
> -                                            "ERROR",
> -                    i, (*refcount_table)[cluster]);
> -
> -                if (fix & BDRV_FIX_ERRORS) {
> -                    int64_t new_offset;
> -
> -                    new_offset = realloc_refcount_block(bs, i, offset);
> -                    if (new_offset < 0) {
> -                        res->corruptions++;
> -                        continue;
> -                    }
> -
> -                    /* update refcounts */
> -                    if ((new_offset >> s->cluster_bits) >= *nb_clusters) {
> -                        /* increase refcount_table size if necessary */
> -                        int old_nb_clusters = *nb_clusters;
> -                        *nb_clusters = (new_offset >> s->cluster_bits) + 1;
> -                        *refcount_table = g_renew(uint16_t, *refcount_table,
> -                                                  *nb_clusters);
> -                        memset(&(*refcount_table)[old_nb_clusters], 0,
> -                               (*nb_clusters - old_nb_clusters) *
> -                               sizeof(uint16_t));
> -                    }
> -                    (*refcount_table)[cluster]--;
> -                    inc_refcounts(bs, res, *refcount_table, *nb_clusters,
> -                            new_offset, s->cluster_size);
> -
> -                    res->corruptions_fixed++;
> -                } else {
> -                    res->corruptions++;
> -                }
> +                fprintf(stderr, "ERROR refcount block %" PRId64
> +                        " refcount=%d\n", i, (*refcount_table)[cluster]);
> +                res->corruptions++;
> +                *rebuild = true;
>              }
>          }
>      }
> @@ -1669,8 +1526,8 @@ resize_fail:
>   * Calculates an in-memory refcount table.
>   */
>  static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
> -                               BdrvCheckMode fix, uint16_t **refcount_table,
> -                               int64_t *nb_clusters)
> +                               BdrvCheckMode fix, bool *rebuild,
> +                               uint16_t **refcount_table, int64_t 
> *nb_clusters)
>  {
>      BDRVQcowState *s = bs->opaque;
>      int64_t i;
> @@ -1713,7 +1570,7 @@ static int calculate_refcounts(BlockDriverState *bs, 
> BdrvCheckResult *res,
>          s->refcount_table_offset,
>          s->refcount_table_size * sizeof(uint64_t));
>  
> -    return check_refblocks(bs, res, fix, refcount_table, nb_clusters);
> +    return check_refblocks(bs, res, fix, rebuild, refcount_table, 
> nb_clusters);
>  }
>  
>  /*
> @@ -1721,7 +1578,8 @@ static int calculate_refcounts(BlockDriverState *bs, 
> BdrvCheckResult *res,
>   * refcount as reported by the refcount structures on-disk.
>   */
>  static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
> -                              BdrvCheckMode fix, int64_t *highest_cluster,
> +                              BdrvCheckMode fix, bool *rebuild,
> +                              int64_t *highest_cluster,
>                                uint16_t *refcount_table, int64_t nb_clusters)
>  {
>      BDRVQcowState *s = bs->opaque;
> @@ -1748,10 +1606,15 @@ static void compare_refcounts(BlockDriverState *bs, 
> BdrvCheckResult *res,
>              int *num_fixed = NULL;
>              if (refcount1 > refcount2 && (fix & BDRV_FIX_LEAKS)) {
>                  num_fixed = &res->leaks_fixed;
> -            } else if (refcount1 < refcount2 && (fix & BDRV_FIX_ERRORS)) {
> +            } else if (refcount1 > 0 && refcount1 < refcount2 &&
> +                       (fix & BDRV_FIX_ERRORS)) {
>                  num_fixed = &res->corruptions_fixed;
>              }
>  
> +            if (refcount1 == 0) {
> +                *rebuild = true;
> +            }

Why isn't this just another else if branch? Possibly even as the first
or second condition, so that you don't have to add 'refcount1 > 0' for
the other branch.

> +
>              fprintf(stderr, "%s cluster %" PRId64 " refcount=%d 
> reference=%d\n",
>                     num_fixed != NULL     ? "Repairing" :
>                     refcount1 < refcount2 ? "ERROR" :
> @@ -1790,6 +1653,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, 
> BdrvCheckResult *res,
>      BDRVQcowState *s = bs->opaque;
>      int64_t size, highest_cluster, nb_clusters;
>      uint16_t *refcount_table = NULL;
> +    bool rebuild = false;
>      int ret;
>  
>      size = bdrv_getlength(bs->file);
> @@ -1807,14 +1671,19 @@ int qcow2_check_refcounts(BlockDriverState *bs, 
> BdrvCheckResult *res,
>      res->bfi.total_clusters =
>          size_to_clusters(s, bs->total_sectors * BDRV_SECTOR_SIZE);
>  
> -    ret = calculate_refcounts(bs, res, fix, &refcount_table, &nb_clusters);
> +    ret = calculate_refcounts(bs, res, fix, &rebuild, &refcount_table,
> +                              &nb_clusters);
>      if (ret < 0) {
>          goto fail;
>      }
>  
> -    compare_refcounts(bs, res, fix, &highest_cluster, refcount_table,
> +    compare_refcounts(bs, res, fix, &rebuild, &highest_cluster, 
> refcount_table,
>                        nb_clusters);
>  
> +    if (rebuild) {
> +        fprintf(stderr, "ERROR need to rebuild refcount structures\n");
> +    }

Should this increate res->check_errors? (It doesn't survive until the
end of the series anyway, but more places seem to be added later that
print an error message without increasing the error count.)

Kevin



reply via email to

[Prev in Thread] Current Thread [Next in Thread]