Re: [PATCH 1/1] Fix qcow2 corruption on discard
From: Maxim Levitsky
Subject: Re: [PATCH 1/1] Fix qcow2 corruption on discard
Date: Mon, 23 Nov 2020 20:11:51 +0200
User-agent: Evolution 3.36.3 (3.36.3-1.fc32)
On Mon, 2020-11-23 at 18:38 +0100, Kevin Wolf wrote:
> Am 23.11.2020 um 16:49 hat Maxim Levitsky geschrieben:
> > Commit 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()")
> > introduced a subtle change to code in zero_in_l2_slice:
> >
> > It swapped the order of
> >
> > 1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
> > 2. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
> > 3. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
> >
> > To
> >
> > 1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
> > 2. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
> > 3. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
> >
> > It seems harmless; however, the call to qcow2_free_any_clusters
> > can trigger a cache flush, which can mark the L2 table as clean,
> > and, assuming that this was the last write to it,
> > a stale version of it will remain on the disk.
>
> Do you have more details on this last paragraph? I'm trying to come up
> with a reproducer, but I don't see how qcow2_free_any_clusters() could
> flush the L2 table cache. (It's easy to get it to flush the refcount
> block cache, but that's useless for a reproducer.)
>
> The only way I see to flush any cache with it is in update_refcount()
> the qcow2_cache_set_dependency() call. This will always flush the cache
> that the L2 cache depends on - which will never be the L2 cache itself,
> but always either the refcount cache or nothing.
>
> There are more options in alloc_refcount_block() if we're allocating a
> new refcount block, but in the context of freeing clusters we'll never
> need to do that.
>
> Whatever I tried, at the end of zero_in_l2_slice(), I have a dirty L2
> table and a dirty refcount block in the cache, with a dependency that
> makes sure that the L2 table will be written out first.
>
> If you don't have the information yet, can you try to debug your manual
> reproducer a bit more to find out how this happens?
I'll do this tomorrow.
Best regards,
Maxim Levitsky
>
> Kevin
>
> > Now we have a valid L2 entry pointing to a freed cluster. Oops.
> >
> > Fixes: 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()")
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> > block/qcow2-cluster.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> > index 485b4cb92e..267b46a4ca 100644
> > --- a/block/qcow2-cluster.c
> > +++ b/block/qcow2-cluster.c
> > @@ -2010,11 +2010,11 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
> >              continue;
> >          }
> >
> > -        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
> >          if (unmap) {
> >              qcow2_free_any_cluster(bs, old_l2_entry, QCOW2_DISCARD_REQUEST);
> >          }
> >          set_l2_entry(s, l2_slice, l2_index + i, new_l2_entry);
> > +        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
> >          if (has_subclusters(s)) {
> >              set_l2_bitmap(s, l2_slice, l2_index + i, new_l2_bitmap);
> >          }
> > --
> > 2.26.2
> >