[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH 1/1] Stop reinit of XBZRLE.lock
From: |
Markus Armbruster |
Subject: |
Re: [Qemu-devel] [PATCH 1/1] Stop reinit of XBZRLE.lock |
Date: |
Tue, 18 Mar 2014 18:24:11 +0100 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/24.2 (gnu/linux) |
"Dr. David Alan Gilbert (git)" <address@hidden> writes:
> From: "Dr. David Alan Gilbert" <address@hidden>
>
> Markus Armbruster spotted that the XBZRLE.lock might get initalised
> multiple times in the case of a second attempted migration, and
> that's undefined behaviour for pthread_mutex_init.
>
> This patch adds a flag to stop re-initialisation - finding somewhere
> to cleanly destroy it where we can guarantee isn't happening
> at the same place we might want to take the lock is tricky.
>
> It also adds comments to explain the lock use more.
>
> (I considered rewriting this lockless but can't justify it given
> the simplicity of the fix).
>
> Signed-off-by: Dr. David Alan Gilbert <address@hidden>
> ---
> arch_init.c | 30 ++++++++++++++++++++++++++----
> 1 file changed, 26 insertions(+), 4 deletions(-)
>
> diff --git a/arch_init.c b/arch_init.c
> index 60c975d..16474b5 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -167,10 +167,13 @@ static struct {
> /* Cache for XBZRLE, Protected by lock. */
> PageCache *cache;
> QemuMutex lock;
> + bool lock_init; /* True once we've init'd lock */
> } XBZRLE = {
> .encoded_buf = NULL,
> .current_buf = NULL,
> .cache = NULL,
> + /* .lock is initialised in ram_save_setup */
> + .lock_init = false
> };
Redundant initializers.
> /* buffer used for XBZRLE decoding */
> static uint8_t *xbzrle_decoded_buf;
> @@ -187,6 +190,11 @@ static void XBZRLE_cache_unlock(void)
> qemu_mutex_unlock(&XBZRLE.lock);
> }
>
> +/* called from qmp_migrate_set_cache_size in main thread, possibly while
> + * a migration is in progress.
> + * A running migration maybe using the cache and might finish during this
may be
> + * call, hence changes to the cache are protected by XBZRLE.lock().
> + */
Style nit, since I'm nitpicking spelling already: our winged comments
usually look like
/*
* Text
*/
> int64_t xbzrle_cache_resize(int64_t new_size)
> {
> PageCache *new_cache, *cache_to_free;
> @@ -195,9 +203,12 @@ int64_t xbzrle_cache_resize(int64_t new_size)
> return -1;
> }
>
> - /* no need to lock, the current thread holds qemu big lock */
> + /* The current thread holds qemu big lock, and we hold it while creating
> + * the cache in ram_save_setup, thus it's safe to test if the
> + * cache exists yet without it's own lock (which might not have been
> + * init'd yet)
> + */
> if (XBZRLE.cache != NULL) {
> - /* check XBZRLE.cache again later */
> if (pow2floor(new_size) == migrate_xbzrle_cache_size()) {
> return pow2floor(new_size);
> }
> @@ -209,7 +220,10 @@ int64_t xbzrle_cache_resize(int64_t new_size)
> }
>
> XBZRLE_cache_lock();
> - /* the XBZRLE.cache may have be destroyed, check it again */
> + /* the migration might have finished between the check above and us
> + * taking the lock, causing XBZRLE.cache to be destroyed
> + * check it again
> + */
> if (XBZRLE.cache != NULL) {
> cache_to_free = XBZRLE.cache;
> XBZRLE.cache = new_cache;
> @@ -744,7 +758,15 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
> DPRINTF("Error creating cache\n");
> return -1;
> }
> - qemu_mutex_init(&XBZRLE.lock);
> + /* mutex's can't be reinit'd without destroying them
> + * and we've not got a good place to destroy it that
> + * we can guarantee isn't being called when we might want
> + * to hold the lock.
> + */
> + if (!XBZRLE.lock_init) {
> + XBZRLE.lock_init = true;
> + qemu_mutex_init(&XBZRLE.lock);
> + }
> qemu_mutex_unlock_iothread();
>
> /* We prefer not to abort if there is no memory */
I detest how the locking works in xbzrle_cache_resize().
The first XBZRLE.cache != NULL is not under XBZRLE.lock. As you explain
in the comment, this is required, because we foolishly delay lock
initialization until a migration starts, and it's safe, because cache
creation is also under the BQL.
The second XBZRLE.cache != NULL *is* under XBZRLE.lock. Required,
because cache destruction is *not* inder the BQL, only under
XBZRLE.lock.
I'd very, very much prefer this to be made obviously safe: initialize
XBZRLE.lock sufficiently early, then access XBZRLE.cache only under
XBZRLE.lock.
Confusing and way too subtle for no good reason. But your patch doesn't
add subtlety, it explains it, and fixes a bug. Therefore:
Reviewed-by: Markus Armbruster <address@hidden>