qemu-devel

Re: [PATCH v3 3/6] block/qcow2: introduce inflight writes counters: fix discard


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [PATCH v3 3/6] block/qcow2: introduce inflight writes counters: fix discard
Date: Fri, 12 Mar 2021 19:03:54 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.1

12.03.2021 18:52, Max Reitz wrote:
On 12.03.21 16:24, Vladimir Sementsov-Ogievskiy wrote:
12.03.2021 18:10, Max Reitz wrote:
On 12.03.21 13:46, Vladimir Sementsov-Ogievskiy wrote:
12.03.2021 15:32, Vladimir Sementsov-Ogievskiy wrote:
12.03.2021 14:17, Max Reitz wrote:
On 12.03.21 10:09, Vladimir Sementsov-Ogievskiy wrote:
11.03.2021 22:58, Max Reitz wrote:
On 05.03.21 18:35, Vladimir Sementsov-Ogievskiy wrote:
There is a bug in qcow2: host cluster can be discarded (refcount
becomes 0) and reused during data write. In this case data write may

[..]

@@ -885,6 +1019,13 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
          if (refcount == 0) {
              void *table;
+            Qcow2InFlightRefcount *infl = find_infl_wr(s, cluster_index);
+
+            if (infl) {
+                infl->refcount_zero = true;
+                infl->type = type;
+                continue;
+            }

I don’t understand what this is supposed to do exactly.  It seems like it wants 
to keep metadata structures in the cache that are still in use (because 
dropping them from the caches is what happens next), but users of metadata 
structures won’t set in-flight counters for those metadata structures, will 
they?

Don't follow.

We want the code in "if (refcount == 0)" to be triggered only when the full reference count 
of the host cluster becomes 0, including inflight-write-cnt. So, if at this point 
inflight-write-cnt is not 0, we postpone freeing the host cluster; it will be done later from the 
"slow path" in update_inflight_write_cnt().

But the code under “if (refcount == 0)” doesn’t free anything, does it?  All I 
can see is code to remove metadata structures from the metadata caches (if the 
discarded cluster was an L2 table or a refblock), and finally the discard on 
the underlying file.  I don’t see how that protocol-level discard has anything 
to do with our problem, though.

Hmm. Still, if we do this discard, and then our in-flight write, we'll have 
data instead of a hole. Not a big deal, but seems better to postpone discard.

On the other hand, clearing caches is OK, as it's related only to the 
qcow2 refcount, not to inflight-write-cnt.


As far as I understand, the freeing happens immediately above the “if (refcount == 
0)” block by s->set_refcount() setting the refcount to 0. (including updating 
s->free_cluster_index if the refcount is 0).

Hmm.. And that (setting s->free_cluster_index) is what I should actually prevent 
until the total reference count becomes zero.

And about s->set_refcount(): it only updates the refcount itself and doesn't free 
anything.



So, it is more correct like this:

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 464d133368..1da282446d 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1012,21 +1012,12 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
          } else {
              refcount += addend;
          }
-        if (refcount == 0 && cluster_index < s->free_cluster_index) {
-            s->free_cluster_index = cluster_index;
-        }
          s->set_refcount(refcount_block, block_index, refcount);

          if (refcount == 0) {
              void *table;
              Qcow2InFlightRefcount *infl = find_infl_wr(s, cluster_index);

-            if (infl) {
-                infl->refcount_zero = true;
-                infl->type = type;
-                continue;
-            }
-
              table = qcow2_cache_is_table_offset(s->refcount_block_cache,
                                                  offset);
              if (table != NULL) {
@@ -1040,6 +1031,16 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
                  qcow2_cache_discard(s->l2_table_cache, table);
              }

+            if (infl) {
+                infl->refcount_zero = true;
+                infl->type = type;
+                continue;
+            }
+
+            if (cluster_index < s->free_cluster_index) {
+                s->free_cluster_index = cluster_index;
+            }
+
              if (s->discard_passthrough[type]) {
                  update_refcount_discard(bs, cluster_offset, s->cluster_size);
              }

I don’t think I like using s->free_cluster_index as a protection against 
allocating something before it.

Hmm, I just propose not to update it, if refcount reached 0 but we still have 
inflight writes.



First, it brings back the problem I just described in my mail from 15:58 GMT+1, 
which is that you’re changing the definition of what a free cluster is.  With 
this proposal, you’re proposing yet a new definition: A free cluster is 
anything with refcount == 0 after free_cluster_index.

I think that free cluster is anything with refcount = 0 and inflight-write-cnt 
= 0.

Then, as I said in my other mail, update_refcount() just cannot free any 
cluster.  So changes to that function can’t be justified by preventing it from 
freeing clusters.

You need to clearly define what it is that update_refcount() should or 
shouldn’t do, and then we have to think about whether when all writes have 
settled, we really have to invoke qcow2_update_cluster_refcount() or whether we 
should do the small outstanding changes just directly in 
update_inflight_write_cnt().

I think this needs to be more formalized, or it doesn’t make sense.

For example, say we do define a free cluster to be refcount (RC) = 0 and 
inflight-write-cnt (IFWC) = 0.  Then everything that is done to a cluster because 
it is considered being freed right now because its RC drops to 0 must probably be 
changed to only be done if also its IFWC is 0.  For example, we should only 
discard host clusters on the protocol layer if a cluster becomes free.  
update_refcount() will no longer be able to free clusters with IFWC > 0, so it 
must never issue a protocol-level discard for them.  And, yes, it also shouldn’t 
adjust first_free_cluster_index, as you propose here.  (But you didn’t explain 
why, and it seems like it was just intuition to you instead of looking at it more 
formally.)

Instead, for clusters with RC = 0 and IFWC > 0, update_inflight_write_cnt() 
will take on the role of freeing them.  So now that function must adjust 
first_free_cluster_index and issue the protocol-level discard for such clusters.

Yes, agree.


I suppose in practice we could invoke qcow2_update_cluster_refcount() with -0, 
as you do, because now the cluster has RC = 0 and IFWC = 0, so now that 
function will be capable of freeing it.  But to me, that just looks like a bit 
of abuse.

agree



I suppose we could create a new function qcow2_cluster_freed() where we collect 
everything that needs to be done once a cluster is considered freed (which so 
far was whenever its RC dropped to 0, which only happens in update_refcount(); 
and then will be whenever its RC and its IFWC drop to 0, which can happen in 
either update_refcount() or update_inflight_write_cnt()).  What would belong in 
there is discarding the cluster on the protocol level, and adjusting 
first_free_cluster_index.  (Perhaps more, I don’t know.)  With such a function, 
it would seem clear to me that there is no need to invoke 
qcow2_update_cluster_refcount() just to get precisely that effect.

yes
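
To make the qcow2_cluster_freed() idea above concrete, here is a minimal toy model in C. It is not QEMU code: every Toy*/toy_* name is hypothetical, the arrays stand in for the real refcount structures, and a counter stands in for the protocol-level discard. The only thing taken from the discussion is the rule itself: a cluster counts as freed once both RC and IFWC are 0, and one helper does the "freed" work no matter which counter reaches 0 last.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NB_CLUSTERS 64

/* Hypothetical toy state; these are not QEMU's real structures. */
typedef struct ToyState {
    uint64_t refcount[TOY_NB_CLUSTERS]; /* RC per host cluster */
    uint64_t inflight[TOY_NB_CLUSTERS]; /* IFWC per host cluster */
    uint64_t free_cluster_index;        /* hint: where a search may start */
    unsigned discards_issued;           /* stands in for the protocol-level discard */
} ToyState;

/* A cluster is free only if nothing references it and nothing writes to it. */
static bool toy_cluster_is_free(const ToyState *s, uint64_t idx)
{
    return s->refcount[idx] == 0 && s->inflight[idx] == 0;
}

/* Everything that has to happen once a cluster is considered freed. */
static void toy_cluster_freed(ToyState *s, uint64_t idx)
{
    if (idx < s->free_cluster_index) {
        s->free_cluster_index = idx;
    }
    s->discards_issued++;
}

/* Stand-in for the decrement path of update_refcount(). */
static void toy_refcount_dec(ToyState *s, uint64_t idx)
{
    assert(idx < TOY_NB_CLUSTERS && s->refcount[idx] > 0);
    s->refcount[idx]--;
    if (toy_cluster_is_free(s, idx)) {
        toy_cluster_freed(s, idx);
    }
}

/* Stand-in for update_inflight_write_cnt() when a data write completes. */
static void toy_inflight_dec(ToyState *s, uint64_t idx)
{
    assert(idx < TOY_NB_CLUSTERS && s->inflight[idx] > 0);
    s->inflight[idx]--;
    if (toy_cluster_is_free(s, idx)) {
        toy_cluster_freed(s, idx);
    }
}
```

In this sketch neither path needs to call the other with a zero addend: both simply check the same predicate and call the same helper.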



(The alternative would be to keep RC == 0 the definition of a freed cluster.  Then 
we’d have to postpone the s->set_refcount() in update_refcount(), and update 
the refcount again in update_inflight_write_cnt(), by invoking 
qcow2_update_cluster_refcount().  We wouldn’t need to change the allocation 
functions.

I’m not saying that alternative is better – I don’t think it is, I think you’re 
right that the definition of a freed cluster should be changed. I’m just 
presenting it in contrast, to show when it would make sense to call 
qcow2_update_cluster_refcount().)

OK
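
For contrast, the alternative described above (RC == 0 stays the definition of a freed cluster, and the drop to 0 is postponed while writes are in flight) could be sketched like this. Again a toy model with hypothetical names, not QEMU code; the drop_postponed flag is an assumption made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy cluster; not a QEMU structure. */
typedef struct ToyCluster {
    uint64_t refcount;   /* stored RC; RC == 0 alone means "free" here */
    uint64_t inflight;   /* IFWC */
    bool drop_postponed; /* a decrement to 0 is waiting for writes to settle */
} ToyCluster;

/* The allocation side is unchanged: it only looks at the stored refcount. */
static bool toy_can_allocate(const ToyCluster *c)
{
    return c->refcount == 0;
}

/* Stand-in for update_refcount(): do not store 0 while writes are in flight. */
static void toy_refcount_dec(ToyCluster *c)
{
    assert(c->refcount > 0 && !c->drop_postponed);
    if (c->refcount == 1 && c->inflight > 0) {
        c->drop_postponed = true;
        return;
    }
    c->refcount--;
}

/* Stand-in for update_inflight_write_cnt(): apply the postponed drop. */
static void toy_inflight_dec(ToyCluster *c)
{
    assert(c->inflight > 0);
    c->inflight--;
    if (c->inflight == 0 && c->drop_postponed) {
        c->drop_postponed = false;
        /* This is where real code would update the refcount again. */
        c->refcount--;
    }
}
```

Here the second refcount update is genuinely needed once the writes settle, which is the case where calling a refcount-update function from the in-flight path would make sense.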

In the meantime, Kevin dispelled my "big problems" in "[PATCH v2(RFC) 0/3] qcow2: 
fix parallel rewrite and discard", so the next step would probably be to retry the 
CoRwLock-based approach.


And free_cluster_index is a hint for where to start searching for such a cluster.


Now looking only at the allocation functions, it may look like that kind of is 
the definition already.  But I don’t think that was the intention when 
free_cluster_index was introduced, so we’d have to check every place that sets 
free_cluster_index, to see whether it adheres to this definition.

And I think it’s clear that there is a place that won’t adhere to this 
definition, and that is this very place here, in update_refcount(). Say 
free_cluster_index is 42.  Then you free cluster 39, but there is a write to 
it, so free_cluster_index isn’t updated.  Then you free cluster 38, and there 
are no writes to that cluster, so free_cluster_index is updated to 38.  Suddenly, 
39 is free to be allocated, too.

Why? 39 is protected by inflight-cnt, and we do the has_infl_wr() check together 
with the refcount == 0 check when allocating clusters.

I was (wrongly) assuming that with this change you’d drop the check in the 
allocation functions.
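
The 42/39/38 scenario can be replayed in a small toy model. This is a sketch with hypothetical names, not QEMU's allocator: toy_find_free() is an assumed linear search that starts at the hint, and its check_inflight flag stands in for whether the allocation path also does a has_infl_wr()-style check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_NB_CLUSTERS 64

/* Hypothetical toy state; not QEMU's real structures. */
typedef struct ToyState {
    uint64_t refcount[TOY_NB_CLUSTERS];
    uint64_t inflight[TOY_NB_CLUSTERS];
    uint64_t free_cluster_index; /* hint only */
} ToyState;

/* First allocatable cluster at or after the hint, or -1 if there is none. */
static int64_t toy_find_free(const ToyState *s, bool check_inflight)
{
    assert(s->free_cluster_index <= TOY_NB_CLUSTERS);
    for (uint64_t i = s->free_cluster_index; i < TOY_NB_CLUSTERS; i++) {
        if (s->refcount[i] != 0) {
            continue;
        }
        if (check_inflight && s->inflight[i] != 0) {
            continue; /* RC == 0, but a write is still in flight */
        }
        return (int64_t)i;
    }
    return -1;
}

/* Replays the example: hint is 42, then 39 and 38 drop to RC == 0. */
static void toy_scenario(ToyState *s)
{
    memset(s, 0, sizeof(*s));
    for (uint64_t i = 0; i < 42; i++) {
        s->refcount[i] = 1;
    }
    s->free_cluster_index = 42;

    /* Cluster 39: RC drops to 0 during a write, so the hint is not lowered. */
    s->refcount[39] = 0;
    s->inflight[39] = 1;

    /* Cluster 38: RC drops to 0 with no writes, so the hint is lowered. */
    s->refcount[38] = 0;
    s->free_cluster_index = 38;
}
```

With the in-flight check enabled, the search skips 39 even though the hint now lies below it; without the check, 39 would be handed out once 38 has been reallocated. The hint alone protects nothing, so in this model the protection has to come from the check in the allocation path.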

Max

(The precise problem is that with this new definition decreasing 
free_cluster_index suddenly has the power to free any cluster between its new 
and old value.  With the old definition, changing free_cluster_index would 
never free any cluster.  So when you decrease free_cluster_index, you suddenly 
have to be sure that all clusters between the new and old value that have 
refcount 0 are indeed to be considered free.)

Max






--
Best regards,
Vladimir


