[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 3/3] qcow2: Fix corruption bug in qcow2_detect_metadata_preal
From: |
Vladimir Sementsov-Ogievskiy |
Subject: |
Re: [PATCH 3/3] qcow2: Fix corruption bug in qcow2_detect_metadata_preallocation() |
Date: |
Thu, 24 Oct 2019 12:41:50 +0000 |
24.10.2019 14:17, Kevin Wolf wrote:
> Am 24.10.2019 um 12:46 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 23.10.2019 18:26, Kevin Wolf wrote:
>>> qcow2_detect_metadata_preallocation() calls qcow2_get_refcount() which
>>> requires s->lock to be taken to protect its accesses to the refcount
>>> table and refcount blocks. However, nothing in this code path actually
>>> took the lock. This could cause the same cache entry to be used by two
>>> requests at the same time, for different tables at different offsets,
>>> resulting in image corruption.
>>>
>>> As it would be preferable to base the detection on consistent data (even
>>> though it's just heuristics), let's take the lock not only around the
>>> qcow2_get_refcount() calls, but around the whole function.
>>>
>>> This patch takes the lock in qcow2_co_block_status() earlier and asserts
>>> in qcow2_detect_metadata_preallocation() that we hold the lock.
>>>
>>> Fixes: 69f47505ee66afaa513305de0c1895a224e52c45
>>> Cc: address@hidden
>>> Reported-by: Michael Weiser <address@hidden>
>>> Signed-off-by: Kevin Wolf <address@hidden>
>>
>> Oh, I'm very sorry. I was going to backport this patch, and found that it's
>> fixed in our downstream long ago, even before last upstream version patch
>> sent :(
>
> Seriously? Is your downstream QEMU so divergent from upstream that you
> wouldn't notice something like this? I think you really have to improve
> something there.
How to improve it? Months and years are passed to bring patches into upstream,
of course we have to live with a bunch (now it's nearer to 300 and this is a lot
better then 500+ in the recent past) of patches above Rhel release.
>
> I mean, we have a serious data corruptor in the 4.1 release and I wasted
> days to debug this, and you've been sitting on a fix for months without
> telling anyone?
Of course, it was not my goal to hide the fix, I'll do my best to avoid this
in future. Very sorry for your time wasted.
> This isn't a memory leak or something, this kills
> people's images. It's eating their data.
Possibly, I didn't know about data corruption. For some reason I decided that
lock should be taken here, I don't remember. Still I should have
applied it to upstream version too, of course.
>
> Modifying an image format driver requires utmost care and I think I'll
> try to make sure to allow only few qcow2 changes per release in the
> future. Too many changes were made in too short time, and with too
> little care, and there are even more qcow2 patches pending. Please check
> whether these series actually match your downstream code.
The difference is that I didn't move the lock() but add separate lock()/unlock()
pair around qcow2_detect_metadata_preallocation(). I think there is no serious
difference.
> Anyway, we'll
> tread very carefully now with the pending patches, even if it means
> delaying them for another release or two.
What do you mean? What to do with sent patches during several months?
> Stability is way more
> important for an image format driver than new features and
> optimisations.
Agree. But I think, the main source of stability is testing.
>
> Do whatever you need to fix your downstream process, but seriously, this
> must not ever happen again.
>
I don't see how to improve the process to exclude such problems. But I'll
of course will do my best to avoid them.
--
Best regards,
Vladimir
- Re: [PATCH 2/3] qcow2: Assert that qcow2_cache_get() callers hold s->lock, (continued)
[PATCH 3/3] qcow2: Fix corruption bug in qcow2_detect_metadata_preallocation(), Kevin Wolf, 2019/10/23