[PATCH 0/1] Fix qcow2 corruption after addition of subcluster support
From: Maxim Levitsky
Subject: [PATCH 0/1] Fix qcow2 corruption after addition of subcluster support
Date: Mon, 23 Nov 2020 17:49:28 +0200
This weekend I discovered that one of my VMs had started to act weird.
While investigating, I found that it and most of my other VMs
had developed qcow2 corruption.
After some bisecting, digging through dumps, and debugging,
I think I have found the root cause and a fix.
In addition to that, I would like to raise a few points:
1. I had to use qcow2-dump from (*)
(it is also on github but without source. Weird...)
to examine the L1/L2 tables and refcount tables.
It seems that there were a few attempts (**), (***) to make an official tool that
would dump at least the L1/L2/refcount tables, but nothing has been accepted
so far.
I think that an official tool to dump at least the basic qcow2 structure
would be very helpful for discovering and debugging qcow2 corruption.
I had to study the qcow2 format again for this, so I can help with that.
2. 'qemu-img check -r all' is happy to create clusters that are referenced
from multiple L2 entries.
This isn't technically wrong, since a write through any of these L2 entries
will COW the cluster.
However, I would be happy to know that my images don't have such clusters,
so I would like qemu-img check to at least report them.
Can we add some -check-weird-but-legal flag to it to check this?
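To illustrate both points above, here is a minimal Python sketch of walking the active L1/L2 tables and counting how many L2 entries reference each host cluster. It is not an existing QEMU tool, and the function names are made up for this example. The header offsets and bit masks are taken from the qcow2 specification (docs/interop/qcow2.txt). As simplifying assumptions, it skips compressed clusters, ignores the L1 tables of internal snapshots, and treats host offset 0 as "unallocated", which does not hold for images with an external data file.

```python
import struct
from collections import Counter

QCOW2_MAGIC = b"QFI\xfb"
OFFSET_MASK = 0x00FFFFFFFFFFFE00   # bits 9-55 of an L1/L2 entry
L2_COMPRESSED = 1 << 62            # L2 entry describes a compressed cluster
INCOMPAT_EXTL2 = 1 << 4            # incompatible feature bit: extended L2 entries


def read_header(f):
    """Parse only the header fields needed to walk the L1/L2 tables."""
    f.seek(0)
    raw = f.read(104)
    magic, version = struct.unpack_from(">4sI", raw, 0)
    if magic != QCOW2_MAGIC:
        raise ValueError("not a qcow2 image")
    (cluster_bits,) = struct.unpack_from(">I", raw, 20)
    l1_size, l1_offset = struct.unpack_from(">IQ", raw, 36)
    # Feature bits only exist in version 3 headers.
    incompat = struct.unpack_from(">Q", raw, 72)[0] if version >= 3 else 0
    return {"version": version, "cluster_bits": cluster_bits,
            "l1_size": l1_size, "l1_offset": l1_offset, "incompat": incompat}


def l2_entries(f, hdr):
    """Yield (l1_index, l2_index, raw_entry) for every L2 entry reachable from L1."""
    cluster_size = 1 << hdr["cluster_bits"]
    # With subclusters (extended L2 entries) each entry is 16 bytes; the
    # first 8 bytes still hold the usual cluster descriptor.
    entry_size = 16 if hdr["incompat"] & INCOMPAT_EXTL2 else 8
    f.seek(hdr["l1_offset"])
    l1 = struct.unpack(">%dQ" % hdr["l1_size"], f.read(8 * hdr["l1_size"]))
    for i, l1_entry in enumerate(l1):
        l2_offset = l1_entry & OFFSET_MASK
        if l2_offset == 0:
            continue                      # no L2 table allocated for this range
        f.seek(l2_offset)
        table = f.read(cluster_size)
        for j in range(cluster_size // entry_size):
            (entry,) = struct.unpack_from(">Q", table, j * entry_size)
            yield i, j, entry


def shared_clusters(f):
    """Return {host_offset: count} for clusters referenced by more than one L2 entry."""
    hdr = read_header(f)
    refs = Counter()
    for _, _, entry in l2_entries(f, hdr):
        if entry & L2_COMPRESSED:
            continue                      # compressed clusters use another layout
        host = entry & OFFSET_MASK
        if host:
            refs[host] += 1
    return {off: n for off, n in refs.items() if n > 1}
```

Calling shared_clusters() on an image opened with open(path, "rb") returns a dict of host offsets and reference counts; an empty dict means no cluster is shared within the active L1 table.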
A few notes about the conditions under which this corruption occurs:
I have a bunch of VMs, each running on two qcow2 files:
a base and a snapshot on top of it, which I 'qemu-img commit' once in a while.
Discard is enabled to avoid wasting disk space.
Since discard is enabled, 'qemu-img commit' often discards data on the base
disk.
The corruption happens after such a commit, and manifests as a stale L2
entry that was supposed to be discarded but now points to an unused cluster.
I haven't been able to reproduce this with a small test case so far.
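As a rough illustration of how such a stale entry could be spotted, here is a minimal Python sketch that looks up the refcount of the cluster an L2 entry points to. It assumes that "unused" means a refcount of zero, and it handles only the default 16-bit refcounts (refcount_order 4). The function name cluster_refcount is made up for this example, and the field offsets are taken from the qcow2 specification.

```python
import struct


def cluster_refcount(f, host_offset):
    """Return the refcount of the cluster that contains host_offset."""
    f.seek(0)
    hdr = f.read(104)
    magic, version = struct.unpack_from(">4sI", hdr, 0)
    if magic != b"QFI\xfb":
        raise ValueError("not a qcow2 image")
    (cluster_bits,) = struct.unpack_from(">I", hdr, 20)
    rt_offset, rt_clusters = struct.unpack_from(">QI", hdr, 48)
    # Version 2 images always use 16-bit refcounts (refcount_order 4).
    order = struct.unpack_from(">I", hdr, 96)[0] if version >= 3 else 4
    if order != 4:
        raise NotImplementedError("sketch handles 16-bit refcounts only")

    cluster_size = 1 << cluster_bits
    entries_per_block = cluster_size // 2      # 2 bytes per refcount
    cluster_index = host_offset >> cluster_bits
    table_index, block_index = divmod(cluster_index, entries_per_block)
    if table_index >= rt_clusters * cluster_size // 8:
        return 0                               # beyond the refcount table
    f.seek(rt_offset + 8 * table_index)
    (block_offset,) = struct.unpack(">Q", f.read(8))
    block_offset &= 0xFFFFFFFFFFFFFE00         # bits 0-8 are reserved
    if block_offset == 0:
        return 0                               # refcount block not allocated
    f.seek(block_offset + 2 * block_index)
    (refcount,) = struct.unpack(">H", f.read(2))
    return refcount
```

An L2 entry whose host offset yields cluster_refcount() == 0 would be a candidate for the kind of stale entry described above.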
Best regards,
Maxim Levitsky
(*) https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg02760.html
(**) https://patchwork.kernel.org/project/qemu-devel/patch/20180328133845.20632-1-berto@igalia.com/
(***) https://patchwork.kernel.org/project/qemu-devel/cover/1578990137-308222-1-git-send-email-andrey.shinkevich@virtuozzo.com/
Maxim Levitsky (1):
Fix qcow2 corruption on discard
block/qcow2-cluster.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--
2.26.2