[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v3] XBZRLE: Fix qemu crash when resize the xbzrl
From: |
Juan Quintela |
Subject: |
Re: [Qemu-devel] [PATCH v3] XBZRLE: Fix qemu crash when resize the xbzrle cache |
Date: |
Tue, 25 Feb 2014 16:24:54 +0100 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux) |
"Gonglei (Arei)" <address@hidden> wrote:
> Resizing the xbzrle cache during migration causes qemu-crash,
> because the main-thread and migration-thread modify the xbzrle
> cache size concurrently without lock-protection.
>
> Signed-off-by: ChenLiang <address@hidden>
> Signed-off-by: Gonglei <address@hidden>
> Reviewed-by: Dr. David Alan Gilbert <address@hidden>
Sorry for the late review.
> ---
> Changes against the previous version:
> *Remove the temporary variable cache in the xbzrle_cache_resize.
>
> arch_init.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 55 insertions(+), 3 deletions(-)
>
> diff --git a/arch_init.c b/arch_init.c
> index 80574a0..003640f 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -164,26 +164,69 @@ static struct {
> uint8_t *encoded_buf;
> /* buffer for storing page content */
> uint8_t *current_buf;
> - /* Cache for XBZRLE */
> + /* Cache for XBZRLE, Protected by lock. */
> PageCache *cache;
> + QemuMutex lock;
> } XBZRLE = {
> .encoded_buf = NULL,
> .current_buf = NULL,
> .cache = NULL,
> + /* use the lock carefully */
> + .lock = {PTHREAD_MUTEX_INITIALIZER},
> };
Have you tried to compile for windows? I think this would make it
break. We need to init it with qemu_mutex_init() in ram_save_setup()
after the call to cache_init()?
> /* buffer used for XBZRLE decoding */
> static uint8_t *xbzrle_decoded_buf;
>
> +static void XBZRLE_cache_lock(void)
> +{
> + qemu_mutex_lock(&XBZRLE.lock);
> +}
> +
if we change this to:
if (migrate_use_xbzrle())
qemu_mutex_lock(&XBZRLE.lock);
On both cache operations, we would remove the overhead in the no XBRLE
case, right?
> +static void XBZRLE_cache_unlock(void)
> +{
> + qemu_mutex_unlock(&XBZRLE.lock);
> +}
> +
> int64_t xbzrle_cache_resize(int64_t new_size)
> {
> + PageCache *new_cache, *old_cache;
> +
> if (new_size < TARGET_PAGE_SIZE) {
> return -1;
> }
>
> + XBZRLE_cache_lock();
> if (XBZRLE.cache != NULL) {
> - return cache_resize(XBZRLE.cache, new_size / TARGET_PAGE_SIZE) *
> - TARGET_PAGE_SIZE;
> + /* check XBZRLE.cache again later */
> + XBZRLE_cache_unlock();
> + if (pow2floor(new_size) == migrate_xbzrle_cache_size()) {
> + return pow2floor(new_size);
> + }
> + new_cache = cache_init(new_size / TARGET_PAGE_SIZE,
> + TARGET_PAGE_SIZE);
> + if (!new_cache) {
> + DPRINTF("Error creating cache\n");
> + return -1;
> + }
> +
> + XBZRLE_cache_lock();
> + /* the XBZRLE.cache may have be destroyed, check it again */
> + if (XBZRLE.cache != NULL) {
> + old_cache = XBZRLE.cache;
> + XBZRLE.cache = new_cache;
> + new_cache = NULL;
> + }
> + XBZRLE_cache_unlock();
> +
> + if (NULL == new_cache) {
> + cache_fini(old_cache);
> + } else {
> + cache_fini(new_cache);
> + }
> + } else {
> + XBZRLE_cache_unlock();
> }
Can we change this to:
if (pow2floor(new_size) == migrate_xbzrle_cache_size()) {
return pow2floor(new_size);
}
new_cache = cache_init(new_size / TARGET_PAGE_SIZE,
TARGET_PAGE_SIZE);
if (!new_cache) {
DPRINTF("Error creating cache\n");
return -1;
}
XBZRLE_cache_lock();
/* the XBZRLE.cache may have be destroyed, check it again */
if (XBZRLE.cache != NULL) {
cache_to_free = XBZRLE.cache;
XBZRLE.cache = new_cache;
} else {
cache_to_free = new_cache;
}
XBZRLE_cache_unlock();
cache_fini(cache_to_free);
I think that looking is clearer this way. BTW, we are losing
_everything_ that is on the cache if we resize it. I don't know if that
is what we want. If we have a big cache, and make it bigger because we
are having too many misses, just dropping the whole cache looks like a
bad idea :-(
And my understanding is that we should do a:
migrate_get_current()->xbzrle_cache_size = new_size;
independently of what CACHE is equal or different from NULL?
(this was already wrong before your patch, just asking because you are
looking at the code).
Thanks, Juan.