qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: qcow2 api not secured by mutex lock


From: Kevin Wolf
Subject: Re: qcow2 api not secured by mutex lock
Date: Thu, 19 Dec 2019 11:02:30 +0100
User-agent: Mutt/1.12.1 (2019-06-15)

Am 18.12.2019 um 11:28 hat Vladimir Sementsov-Ogievskiy geschrieben:
> Hi!
> 
> Some time ago, we've faced and fixed the fact that qcow2 bitmap api doesn't
> call qcow2_co_mutex_lock, before accessing qcow2 metadata. This was solved by
> moving qcow2_co_remove_persistent_dirty_bitmap and
> qcow2_co_can_store_new_dirty_bitmap to coroutine and call qcow2_co_mutex_lock.
> 
> Now I decided to look at big picture (it is attached).
> 
> Boxes are qcow2 driver api, green border means that function calls 
> qcow2_co_mutex_lock
> (it doesn't guarantee, that exactly child node call is locked, but it is 
> something).
> 
> In the picture there are just all functions, calling qcow2_cache_get/put.. 
> Not all the
> functions, that needs locking, but again, it is something.
> 
> So, accordingly to the picture, it seems that the following functions lacks 
> locking:
> 
> qcow2_co_create

This should be easy to fix. It's also relatively harmless because it's
unlikely that the image that is being created is accessed by someone
else (the user would have to query the auto-generated node name and
start something on it - at which point they deserve what they get).

> qcow2_snapshot_*
>    (but it is both drained and aio context locked, so should be safe, yes?)

If you checked that these conditions are true, it should be safe.

> qcow2_reopen_bitmaps_rw
> qcow2_store_persistent_dirty_bitmaps

Reopen drains the image, so I think this is safe in practice.

If we want to do something about it anyway (e.g. move it to a coroutine
so it can take a lock) the question is where to do that. Maybe even for
.bdrv_reopen_* in general?

> qcow2_amend_options

Only qemu-img so far, so no concurrency. We're about to add
blockdev-amend in QMP, though, so this looks like something that should
take the lock.

In fact, is taking the lock enough or should it actually drain the node,
too?

> qcow2_make_empty

This one should certainly drain. It is used not only in qemu-img, but
also in HMP commit and apparently also in replication.

This one might be a bug that could become visible in practice. Unlikely
for HMP commit (because it takes a while and is holding the BQL, so no
new guest requests will be processed), except maybe for cases where
there is nothing to commit.

> ===
> 
> Checking green nodes:
> 
> qcow2_co_invalidate_cache actually calls qcow2_close unlocked, it's
> another reason to fix qcow2_store_persistent_dirty_bitmaps

Might be. Do we want a .bdrv_co_close?

> qcow2_write_snapshots actually called unlocked from
> qcow2_check_fix_snapshot_table.. It seems unsafe.

This is curious, I'm not sure why you would drop the lock there. Max?

bdrv_flush() calls would have to replaced with qcow2_write_caches() to
avoid a deadlock, but otherwise I don't see why we would want to drop
the lock.

Of course, this should only be called from qemu-img check, so in
practice it's probably not a bug.

Kevin




reply via email to

[Prev in Thread] Current Thread [Next in Thread]