Re: qcow2 api not secured by mutex lock
From: Vladimir Sementsov-Ogievskiy
Subject: Re: qcow2 api not secured by mutex lock
Date: Thu, 19 Dec 2019 10:25:30 +0000
19.12.2019 13:02, Kevin Wolf wrote:
> On 18.12.2019 at 11:28, Vladimir Sementsov-Ogievskiy wrote:
>> Hi!
>>
>> Some time ago we ran into (and fixed) the fact that the qcow2 bitmap API
>> didn't call qcow2_co_mutex_lock before accessing qcow2 metadata. This was
>> solved by converting qcow2_co_remove_persistent_dirty_bitmap and
>> qcow2_co_can_store_new_dirty_bitmap to coroutines and calling
>> qcow2_co_mutex_lock there.
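
(For reference, the resulting shape is roughly the following. This is a
minimal sketch, not the exact committed code; I'm assuming the usual
qcow2 convention of taking the CoMutex in BDRVQcow2State through
qemu_co_mutex_lock().)

/* block/qcow2-bitmap.c, sketch */
bool coroutine_fn qcow2_co_can_store_new_dirty_bitmap(BlockDriverState *bs,
                                                      const char *name,
                                                      uint32_t granularity,
                                                      Error **errp)
{
    BDRVQcow2State *s = bs->opaque;
    bool can_store;

    /* We run in coroutine context now, so taking the CoMutex is
     * allowed, and it protects the metadata accesses below. */
    qemu_co_mutex_lock(&s->lock);
    /* ... check the bitmap directory in the qcow2 metadata ... */
    can_store = true; /* placeholder for the real check */
    qemu_co_mutex_unlock(&s->lock);

    return can_store;
}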
>>
>> Now I've decided to look at the big picture (it is attached).
>>
>> Boxes are qcow2 driver API functions; a green border means that the
>> function calls qcow2_co_mutex_lock (which doesn't guarantee that the
>> calls into child nodes actually happen under the lock, but it is
>> something).
>>
>> The picture simply shows all functions that call qcow2_cache_get/put.
>> That is not all the functions that need locking, but again, it is
>> something.
>>
>> So, according to the picture, it seems that the following functions lack
>> locking:
>>
>> qcow2_co_create
>
> This should be easy to fix. It's also relatively harmless because it's
> unlikely that the image that is being created is accessed by someone
> else (the user would have to query the auto-generated node name and
> start something on it - at which point they deserve what they get).
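
Right. The shape I have in mind is something like this (just a sketch;
the exact placement around the metadata writes is my assumption):

/* block/qcow2.c, sketch */
static int coroutine_fn qcow2_co_create(BlockdevCreateOptions *create_options,
                                        Error **errp)
{
    /* ... format the image and open it as blk ... */
    BDRVQcow2State *s = blk_bs(blk)->opaque;

    qemu_co_mutex_lock(&s->lock);
    /* ... the metadata updates that go through qcow2_cache_get/put
     * (refcounts, L1/L2 tables, header extensions) ... */
    qemu_co_mutex_unlock(&s->lock);

    /* ... */
}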
>
>> qcow2_snapshot_*
>> (but these run both drained and under the AioContext lock, so they should
>> be safe, yes?)
>
> If you checked that these conditions are true, it should be safe.
>
>> qcow2_reopen_bitmaps_rw
>> qcow2_store_persistent_dirty_bitmaps
>
> Reopen drains the image, so I think this is safe in practice.
>
> If we want to do something about it anyway (e.g. move it to a coroutine
> so it can take a lock) the question is where to do that. Maybe even for
> .bdrv_reopen_* in general?
>
>> qcow2_amend_options
>
> Only qemu-img so far, so no concurrency. We're about to add
> blockdev-amend in QMP, though, so this looks like something that should
> take the lock.
>
> In fact, is taking the lock enough or should it actually drain the node,
> too?
>
>> qcow2_make_empty
>
> This one should certainly drain. It is used not only in qemu-img, but
> also in HMP commit and apparently also in replication.
>
> This one might be a bug that could become visible in practice. Unlikely
> for HMP commit (because it takes a while and is holding the BQL, so no
> new guest requests will be processed), except maybe for cases where
> there is nothing to commit.
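
For the drain, I imagine a drained section around the call, something
like (a sketch; the actual call sites may look different):

bdrv_drained_begin(bs);
/* Between drained_begin/end no requests are in flight, so the
 * metadata updates done by make_empty can't race with guest I/O. */
ret = bs->drv->bdrv_make_empty(bs);
bdrv_drained_end(bs);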
>
>> ===
>>
>> Checking green nodes:
>>
>> qcow2_co_invalidate_cache actually calls qcow2_close unlocked; that's
>> another reason to fix qcow2_store_persistent_dirty_bitmaps.
>
> Might be. Do we want a .bdrv_co_close?
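
If we do, the declaration would presumably just sit next to .bdrv_close
(purely hypothetical, nothing like this exists today):

/* include/block/block_int.h, hypothetical */
struct BlockDriver {
    /* ... */
    void (*bdrv_close)(BlockDriverState *bs);
    /* Hypothetical coroutine variant, so that a driver can take its
     * CoMutex while flushing metadata (e.g. persistent bitmaps). */
    void coroutine_fn (*bdrv_co_close)(BlockDriverState *bs);
    /* ... */
};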
>
>> qcow2_write_snapshots is actually called unlocked from
>> qcow2_check_fix_snapshot_table. That seems unsafe.
>
> This is curious; I'm not sure why you would drop the lock there. Max?
>
> bdrv_flush() calls would have to be replaced with qcow2_write_caches() to
> avoid a deadlock, but otherwise I don't see why we would want to drop
> the lock.
>
> Of course, this should only be called from qemu-img check, so in
> practice it's probably not a bug.
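
For the record, the substitution in code that already holds s->lock
would look like this (a sketch; qcow2_write_caches() flushes the L2 and
refcount caches without taking s->lock itself):

/* caller holds s->lock */
ret = qcow2_write_caches(bs);
if (ret < 0) {
    /* ... error handling ... */
}
/* NOT bdrv_flush(bs): that re-enters qcow2_co_flush_to_os(), which
 * takes s->lock again and would deadlock here. */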
>
Thanks for the analysis! I'll continue thinking about this and come back
with patches (or new questions).
--
Best regards,
Vladimir