Re: [PATCH v3 3/6] block/qcow2: introduce inflight writes counters: fix

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 3/6] block/qcow2: introduce inflight writes counters: fix

From:	Max Reitz
Subject:	Re: [PATCH v3 3/6] block/qcow2: introduce inflight writes counters: fix discard
Date:	Thu, 11 Mar 2021 20:58:44 +0100
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.0

On 05.03.21 18:35, Vladimir Sementsov-Ogievskiy wrote:

There is a bug in qcow2: host cluster can be discarded (refcount
becomes 0) and reused during data write. In this case data write may
pollute another cluster (recently allocated) or even metadata.

I was about to ask whether we couldn’t somehow let discard requestscheck overlapping I/O with the tracked_request infrastructure fromblock/io.c (or perhaps even just let them be serializing).

But I suppose there may be other reasons for clusters being discardedother than explicit discard requests, so we have to have something inqcow2...

To fix the issue let's track inflight writes to host cluster in the
hash table and consider new counter when discarding and reusing host
clusters.

This didn’t really explain anything that would help me understand what’sgoing on in this patch. The patch itself doesn’t really help me withcomments either. It’s possible that I’m just daft, but honestly I havea hard time really understanding what’s supposed to be going on.

Coming back from jumping all over this patch, I also wonder why thisisn’t split in multiple patches, where the first introduces theinfrastructure and explains exactly what’s going on, and the nextpatches make use of it so I could understand what each check etc. issupposed to represent and do.

To serve as an example, after reading through the patch, I still have noidea how it prevents that discard problem you’re trying to fix. MaybeI’m lazy, but I would have liked exactly that to be explained by thiscommit message. Actually, I don’t even understand the problem just fromthis commit message alone, but I could resort to HEAD^ and somethinking. Not sure if that’s ideal, though.

So the commit message says that “host cluster can be discarded andreused during data write”, which to me just seems like the exactbehavior we want. The second sentence describes a problem, but doesn’tsay how reclaiming discarded clusters leads there.

I suppose what leads to the problem is that there first needs to be abackground write even before the discard that is settled only after there-claiming write (which isn’t mentioned).

As far as I understand, this patch addresses that problem by preventingthe discarded clusters from being allocated while said background writeis not settled yet. This is done by keeping track of all host clustersthat are currently the target of some write operation, and preventingthem from being allocated. OK, makes sense.(This latter part does roughly correspond to the second paragraph ofthis commit message, but I couldn’t make sense of it withoutunderstanding why we’d do it. What’s notably missing from myexplanation is the part that you hinted at with “when discarding”, butmy problem is that that I just don’t understand what that means at all.Perhaps it has to do with how I don’t understand the change toupdate_refcount(), and consequently the whole “slow path” inupdate_inflight_write_cnt().)

Side note: Intuitively, this hash table feels like an unnecessarily bighammer to me, but I can’t think of a better solution either, so. (Imainly don’t like how every write request will result in one allocationand hash table entry per cluster it touches, even when no data clusteris ever discarded.)

Enable qcow2-discard-during-rewrite as it is fixed.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
  block/qcow2.h                                 |   9 ++
  block/qcow2-refcount.c                        | 149 +++++++++++++++++-
  block/qcow2.c                                 |  26 ++-
  .../tests/qcow2-discard-during-rewrite        |   2 +-
  4 files changed, 181 insertions(+), 5 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 0678073b74..e18d8dfbae 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -420,6 +420,8 @@ typedef struct BDRVQcow2State {
       * is to convert the image with the desired compression type set.
       */
      Qcow2CompressionType compression_type;
+
+    GHashTable *inflight_writes_counters;
  } BDRVQcow2State;

typedef struct Qcow2COWRegion {

@@ -896,6 +898,13 @@ int qcow2_shrink_reftable(BlockDriverState *bs);
  int64_t qcow2_get_last_cluster(BlockDriverState *bs, int64_t size);
  int qcow2_detect_metadata_preallocation(BlockDriverState *bs);

+int qcow2_inflight_writes_inc(BlockDriverState *bs, int64_t offset,

+                              int64_t length);
+int qcow2_inflight_writes_dec(BlockDriverState *bs, int64_t offset,
+                              int64_t length);
+int qcow2_inflight_writes_dec_locked(BlockDriverState *bs, int64_t offset,
+                                     int64_t length);
+
  /* qcow2-cluster.c functions */
  int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
                          bool exact_size);
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 8e649b008e..464d133368 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -799,6 +799,140 @@ found:
      }
  }

+/*

+ * Qcow2InFlightRefcount is a type for values of s->inflight_writes_counters
+ * hasm map. And it's keys are cluster indices.


*hash, *its

I’d rather document this for s->inflight_writes_counters, though (like“/* cluster index (int64_t) -> Qcow2InFlightRefcount */”)

+ */
+typedef struct Qcow2InFlightRefcount {
+    /*
+     * Number of in-flight writes to the cluster, always > 0, as when becomes


s/when becomes/when it becomes/

+     * 0 the entry is removed from s->inflight_writes_counters.
+     */
+    uint64_t inflight_writes_cnt;
+
+    /* Cluster refcount is known to be zero */
+    bool refcount_zero;
+
+    /* Cluster refcount was made zero with this discard-type */
+    enum qcow2_discard_type type;
+} Qcow2InFlightRefcount;
+
+static Qcow2InFlightRefcount *find_infl_wr(BDRVQcow2State *s,
+                                           int64_t cluster_index)
+{
+    Qcow2InFlightRefcount *infl;
+
+    if (!s->inflight_writes_counters) {
+        return NULL;
+    }
+
+    infl = g_hash_table_lookup(s->inflight_writes_counters, &cluster_index);
+
+    if (infl) {
+        assert(infl->inflight_writes_cnt > 0);
+    }
+
+    return infl;
+}
+
+/*
+ * Returns true if there are any in-flight writes to the cluster blocking
+ * its reallocation.
+ */
+static bool has_infl_wr(BDRVQcow2State *s, int64_t cluster_index)
+{
+    return !!find_infl_wr(s, cluster_index);


I wonder if g_hash_table_contains() would be quicker. *shrug*

+}
+
+static int update_inflight_write_cnt(BlockDriverState *bs,
+                                     int64_t offset, int64_t length,
+                                     bool decrease, bool locked)
+{
+    BDRVQcow2State *s = bs->opaque;
+    int64_t start, last, cluster_offset;
+
+    if (locked) {
+        qemu_co_mutex_assert_locked(&s->lock);
+    }
+
+    start = start_of_cluster(s, offset);
+    last = start_of_cluster(s, offset + length - 1);
+    for (cluster_offset = start; cluster_offset <= last;
+         cluster_offset += s->cluster_size)
+    {
+        int64_t cluster_index = cluster_offset >> s->cluster_bits;
+        Qcow2InFlightRefcount *infl = find_infl_wr(s, cluster_index);
+
+        if (!infl) {
+            infl = g_new0(Qcow2InFlightRefcount, 1);
+            g_hash_table_insert(s->inflight_writes_counters,
+                                g_memdup(&cluster_index, 
sizeof(cluster_index)),
+                                infl);

I suppose we could save one allocation by putting the cluster index intoQcow2InFlightRefcount and then giving the key as &infl->cluster_index.Not sure if that’s too nasty, though.

+        }
+
+        if (decrease) {
+            assert(infl->inflight_writes_cnt >= 1);
+            infl->inflight_writes_cnt--;
+        } else {
+            infl->inflight_writes_cnt++;
+        }
+
+        if (infl->inflight_writes_cnt == 0) {
+            bool refcount_zero = infl->refcount_zero;
+            enum qcow2_discard_type type = infl->type;
+
+            g_hash_table_remove(s->inflight_writes_counters, &cluster_index);
+
+            if (refcount_zero) {
+                /*
+                 * Slow path. We must reset normal refcount to actually release

I don’t understand what “slow path” means in this context. (Nor do Ireally understand the rest, but more on that below.)

+                 * the cluster.
+                 */
+                int ret;
+
+                if (!locked) {
+                    qemu_co_mutex_lock(&s->lock);
+                }
+                ret = qcow2_update_cluster_refcount(bs, cluster_index, 0,
+                                                    true, type);

I don’t understand this, but then again I’m writing this very early inmy review, and at this point I don’t understand anything, really. (Thecomment doesn’t explain to me what’s happening here.)

Revisiting later, refcount_zero is set by update_refcount() when therefcount drops to zero, so invoking this function here will finalize thechange. I don’t quite understand what’s going on in update_refcount(),though.

(And even after finding out why, I don’t understand why the commentdoesn’t explain why we must invoke qcow2_update_cluster_refcount() withaddend=0.)

+                if (!locked) {
+                    qemu_co_mutex_unlock(&s->lock);
+                }
+
+                if (ret < 0) {
+                    return ret;
+                }
+            }
+        }
+
+    }
+
+    return 0;
+}
+
+int qcow2_inflight_writes_inc(BlockDriverState *bs, int64_t offset,
+                              int64_t length)
+{
+    return update_inflight_write_cnt(bs, offset, length, false, false);
+}
+
+/*
+ * Called with s->lock not locked by caller. Will take s->lock only if need to
+ * release the cluster (refcount is 0 and inflight-write-cnt becomes zero).
+ */


Why doesn’t qcow2_inflight_writes_inc() get the same comment?

...Oh, because @locked doesn’t have an influence there. Why isn’t itworth a comment that *_inc() works both with a locked and an unlocked mutex?

+int qcow2_inflight_writes_dec(BlockDriverState *bs, int64_t offset,
+                              int64_t length)
+{
+    return update_inflight_write_cnt(bs, offset, length, true, false);
+}
+
+/* Called with s->lock locked. */
+int qcow2_inflight_writes_dec_locked(BlockDriverState *bs, int64_t offset,
+                                     int64_t length)
+{
+    return update_inflight_write_cnt(bs, offset, length, true, true);
+}
+
  /* XXX: cache several refcount block clusters ? */
  /* @addend is the absolute value of the addend; if @decrease is set, @addend
   * will be subtracted from the current refcount, otherwise it will be added */
@@ -885,6 +1019,13 @@ static int QEMU_WARN_UNUSED_RESULT 
update_refcount(BlockDriverState *bs,

if (refcount == 0) {

              void *table;
+            Qcow2InFlightRefcount *infl = find_infl_wr(s, cluster_index);
+
+            if (infl) {
+                infl->refcount_zero = true; 
+                infl->type = type;
+                continue;
+            }

I don’t understand what this is supposed to do exactly. It seems likeit wants to keep metadata structures in the cache that are still in use(because dropping them from the caches is what happens next), but usersof metadata structures won’t set in-flight counters for those metadatastructures, will they?

(I also wondered why the continue wasn’t put before thes->set_refcount() call, but I suppose the same effect is had by thehas_infl_wr() check in alloc_clusters_noref() andqcow2_alloc_clusters_at().)

table = qcow2_cache_is_table_offset(s->refcount_block_cache,

                                                  offset);
@@ -983,7 +1124,7 @@ retry:

if (ret < 0) {

              return ret;
-        } else if (refcount != 0) {
+        } else if (refcount != 0 || has_infl_wr(s, next_cluster_index)) {
              goto retry;
          }
      }
@@ -1046,7 +1187,7 @@ int64_t qcow2_alloc_clusters_at(BlockDriverState *bs, 
uint64_t offset,
              ret = qcow2_get_refcount(bs, cluster_index++, &refcount);
              if (ret < 0) {
                  return ret;
-            } else if (refcount != 0) {
+            } else if (refcount != 0 || has_infl_wr(s, cluster_index)) {
                  break;
              }
          }
@@ -2294,7 +2435,9 @@ static int64_t alloc_clusters_imrt(BlockDriverState *bs,
           contiguous_free_clusters < cluster_count;
           cluster++)
      {
-        if (!s->get_refcount(*refcount_table, cluster)) {
+        if (!s->get_refcount(*refcount_table, cluster) &&
+            !has_infl_wr(s, cluster))


Is this really necessary?

+        {
              contiguous_free_clusters++;
              if (first_gap) {
                  /* If this is the first free cluster found, update
diff --git a/block/qcow2.c b/block/qcow2.c
index d9f49a52e7..6ee94421e0 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c


[...]

@@ -4536,13 +4553,20 @@ qcow2_co_pwritev_compressed_task(BlockDriverState *bs,
      }
ret = qcow2_pre_write_overlap_check(bs, 0, cluster_offset, out_len, true);
-    qemu_co_mutex_unlock(&s->lock);
      if (ret < 0) {
+        qemu_co_mutex_unlock(&s->lock);
          goto fail;
      }
+ qcow2_inflight_writes_inc(bs, cluster_offset, out_len);

It was my impression that this function could be called either with orwithout a lock, that it wouldn’t really matter. Well, at least that’swhat I figured out for lack of a comment regarding the contract how itis to be invoked.

Max

+
+    qemu_co_mutex_unlock(&s->lock);
+
      BLKDBG_EVENT(s->data_file, BLKDBG_WRITE_COMPRESSED);
      ret = bdrv_co_pwrite(s->data_file, cluster_offset, out_len, out_buf, 0);
+
+    qcow2_inflight_writes_dec(bs, cluster_offset, out_len);
+
      if (ret < 0) {
          goto fail;
      }

[Prev in Thread]

Current Thread

[Next in Thread]

[PATCH v3 0/6] qcow2: compressed write cache, Vladimir Sementsov-Ogievskiy, 2021/03/05
- [PATCH v3 1/6] block-jobs: flush target at the end of .run(), Vladimir Sementsov-Ogievskiy, 2021/03/05
  - Re: [PATCH v3 1/6] block-jobs: flush target at the end of .run(), Max Reitz, 2021/03/11
- [PATCH v3 6/6] block/qcow2: use seqcache for compressed writes, Vladimir Sementsov-Ogievskiy, 2021/03/05
- [PATCH v3 2/6] iotests: add qcow2-discard-during-rewrite, Vladimir Sementsov-Ogievskiy, 2021/03/05
- [PATCH v3 3/6] block/qcow2: introduce inflight writes counters: fix discard, Vladimir Sementsov-Ogievskiy, 2021/03/05
  - Re: [PATCH v3 3/6] block/qcow2: introduce inflight writes counters: fix discard, Max Reitz <=
    - Re: [PATCH v3 3/6] block/qcow2: introduce inflight writes counters: fix discard, Vladimir Sementsov-Ogievskiy, 2021/03/12
    - Re: [PATCH v3 3/6] block/qcow2: introduce inflight writes counters: fix discard, Max Reitz, 2021/03/12
    - Re: [PATCH v3 3/6] block/qcow2: introduce inflight writes counters: fix discard, Vladimir Sementsov-Ogievskiy, 2021/03/12
    - Re: [PATCH v3 3/6] block/qcow2: introduce inflight writes counters: fix discard, Vladimir Sementsov-Ogievskiy, 2021/03/12
    - Re: [PATCH v3 3/6] block/qcow2: introduce inflight writes counters: fix discard, Max Reitz, 2021/03/12
    - Re: [PATCH v3 3/6] block/qcow2: introduce inflight writes counters: fix discard, Vladimir Sementsov-Ogievskiy, 2021/03/12
    - Re: [PATCH v3 3/6] block/qcow2: introduce inflight writes counters: fix discard, Max Reitz, 2021/03/12
    - Re: [PATCH v3 3/6] block/qcow2: introduce inflight writes counters: fix discard, Vladimir Sementsov-Ogievskiy, 2021/03/12
    - Re: [PATCH v3 3/6] block/qcow2: introduce inflight writes counters: fix discard, Max Reitz, 2021/03/12
    - Re: [PATCH v3 3/6] block/qcow2: introduce inflight writes counters: fix discard, Vladimir Sementsov-Ogievskiy, 2021/03/12

Prev by Date: [PATCH v6 2/2] target/s390x: Store r1/r2 for page-translation exceptions during MVPG
Next by Date: Re: [PATCH v3 0/4] semihosting/next (SYS_HEAPINFO)
Previous by thread: [PATCH v3 3/6] block/qcow2: introduce inflight writes counters: fix discard
Next by thread: Re: [PATCH v3 3/6] block/qcow2: introduce inflight writes counters: fix discard
Index(es):
- Date
- Thread