Re: [Qemu-devel] [PATCH] migration: Fix multi-thread compression bug
From: Juan Quintela
Subject: Re: [Qemu-devel] [PATCH] migration: Fix multi-thread compression bug
Date: Wed, 04 May 2016 11:11:55 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)
Liang Li <address@hidden> wrote:
> Recently, a bug related to the multi-thread compression feature for
> live migration was reported. The destination side will be blocked
> during live migration if there is a heavy workload on the host and
> a memory-intensive workload in the guest; this is most likely to
> happen when there is only one decompression thread.
>
> Some parts of the decompression code are incorrect:
> 1. The main thread, which receives data from the source side, enters
> a busy loop while waiting for a free decompression thread.
> 2. A lock is needed to protect decomp_param[idx]->start, because
> it is checked in the main thread and updated in the decompression
> thread.
>
> Fix these two issues by following the code pattern for compression.
>
> Reported-by: Daniel P. Berrange <address@hidden>
> Signed-off-by: Liang Li <address@hidden>
step in the right direction, so:
Reviewed-by: Juan Quintela <address@hidden>
but I am still not sure that this is
enough. If you have the chance, look at the multiple-fd code that I
posted; it is very similar to this.
> struct DecompressParam {
What protects start, and what protects done?
> bool start;
> + bool done;
> QemuMutex mutex;
> QemuCond cond;
> void *des;
> @@ -287,6 +288,8 @@ static bool quit_comp_thread;
> static bool quit_decomp_thread;
> static DecompressParam *decomp_param;
> static QemuThread *decompress_threads;
> +static QemuMutex decomp_done_lock;
> +static QemuCond decomp_done_cond;
>
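One way to answer that question is to write the ownership down next to each field. The annotations below are my reading of what the patch seems to aim for, not something it states, and pthread types stand in for QEMU's QemuMutex/QemuCond:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Sketch of the locking discipline implied by the patch (field names
 * taken from the patch; the per-field annotations are an assumption):
 *   - start, des, compbuf, len: owned by the per-thread mutex
 *   - done: owned by the global decomp_done_lock
 *   - quit_decomp_thread: still unprotected, as noted in the review
 */
typedef struct {
    bool start;              /* protected by mutex */
    bool done;               /* protected by decomp_done_lock */
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    void *des;               /* protected by mutex */
    unsigned char *compbuf;  /* protected by mutex */
    size_t len;              /* protected by mutex */
} DecompressParamSketch;
```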
> static int do_compress_ram_page(CompressParam *param);
>
> @@ -834,6 +837,7 @@ static inline void start_compression(CompressParam *param)
>
> static inline void start_decompression(DecompressParam *param)
> {
Here, nothing protects done.
> + param->done = false;
> qemu_mutex_lock(&param->mutex);
> param->start = true;
> qemu_cond_signal(&param->cond);
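As a concrete illustration of the point above (a sketch only, with pthread types in place of QEMU's wrappers; not taken from any posted patch): the done store could be moved under the same decomp_done_lock that the receiving thread holds when it reads done.

```c
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    bool start;
    bool done;
    pthread_mutex_t mutex;
    pthread_cond_t cond;
} DecompressParam;

static pthread_mutex_t decomp_done_lock = PTHREAD_MUTEX_INITIALIZER;

/* Corrected ordering: done is cleared while holding decomp_done_lock,
 * the lock the main thread also holds when it reads done.  Only then
 * is start raised under the per-thread mutex. */
void start_decompression_fixed(DecompressParam *param)
{
    pthread_mutex_lock(&decomp_done_lock);
    param->done = false;
    pthread_mutex_unlock(&decomp_done_lock);

    pthread_mutex_lock(&param->mutex);
    param->start = true;
    pthread_cond_signal(&param->cond);
    pthread_mutex_unlock(&param->mutex);
}
```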
> @@ -2193,19 +2197,24 @@ static void *do_data_decompress(void *opaque)
> qemu_mutex_lock(&param->mutex);
We are looking at quit_decomp_thread here, and nothing protects it.
> while (!param->start && !quit_decomp_thread) {
> qemu_cond_wait(&param->cond, &param->mutex);
> + }
> + if (!quit_decomp_thread) {
> pagesize = TARGET_PAGE_SIZE;
> - if (!quit_decomp_thread) {
> - /* uncompress() will return failed in some case, especially
> - * when the page is dirted when doing the compression, it's
> - * not a problem because the dirty page will be retransferred
> - * and uncompress() won't break the data in other pages.
> - */
> - uncompress((Bytef *)param->des, &pagesize,
> - (const Bytef *)param->compbuf, param->len);
> - }
> - param->start = false;
> + /* uncompress() will return failed in some case, especially
> + * when the page is dirted when doing the compression, it's
> + * not a problem because the dirty page will be retransferred
> + * and uncompress() won't break the data in other pages.
> + */
> + uncompress((Bytef *)param->des, &pagesize,
> + (const Bytef *)param->compbuf, param->len);
We are calling uncompress() (a slow operation) with param->mutex taken;
is there any reason why we can't just put the param->* vars in locals?
> }
> + param->start = false;
Why are we setting start to false when we are _not_ decompressing a
page? I think this line should be inside the loop.
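One way to act on that suggestion (a hypothetical sketch, with pthreads and a memcpy() standing in for uncompress(); WorkItem and process_item_unlocked() are illustrative names, not QEMU code): snapshot the inputs while holding the mutex, then drop it for the slow call.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    pthread_mutex_t mutex;
    bool start;
    const unsigned char *compbuf;  /* compressed input */
    unsigned char *des;            /* destination page  */
    size_t len;
} WorkItem;

/* Copy the inputs into locals under the lock, then run the slow
 * operation (uncompress() in the real code, a memcpy stand-in here)
 * without holding the mutex. */
void process_item_unlocked(WorkItem *w)
{
    pthread_mutex_lock(&w->mutex);
    const unsigned char *src = w->compbuf;
    unsigned char *dst = w->des;
    size_t len = w->len;
    w->start = false;              /* cleared only when work was taken */
    pthread_mutex_unlock(&w->mutex);

    memcpy(dst, src, len);         /* slow work runs with the mutex dropped */
}
```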
> qemu_mutex_unlock(&param->mutex);
> +
> + qemu_mutex_lock(&decomp_done_lock);
> + param->done = true;
here param->done is protected by decomp_done_lock.
> + qemu_cond_signal(&decomp_done_cond);
> + qemu_mutex_unlock(&decomp_done_lock);
> }
>
> return NULL;
> @@ -2219,10 +2228,13 @@ void migrate_decompress_threads_create(void)
> decompress_threads = g_new0(QemuThread, thread_count);
> decomp_param = g_new0(DecompressParam, thread_count);
> quit_decomp_thread = false;
> + qemu_mutex_init(&decomp_done_lock);
> + qemu_cond_init(&decomp_done_cond);
> for (i = 0; i < thread_count; i++) {
> qemu_mutex_init(&decomp_param[i].mutex);
> qemu_cond_init(&decomp_param[i].cond);
> decomp_param[i].compbuf = g_malloc0(compressBound(TARGET_PAGE_SIZE));
> + decomp_param[i].done = true;
> qemu_thread_create(decompress_threads + i, "decompress",
> do_data_decompress, decomp_param + i,
> QEMU_THREAD_JOINABLE);
> @@ -2258,9 +2270,10 @@ static void decompress_data_with_multi_threads(QEMUFile *f,
> int idx, thread_count;
>
> thread_count = migrate_decompress_threads();
> + qemu_mutex_lock(&decomp_done_lock);
we took decomp_done_lock
> while (true) {
> for (idx = 0; idx < thread_count; idx++) {
> - if (!decomp_param[idx].start) {
> + if (decomp_param[idx].done) {
and we can protect done with it.
> qemu_get_buffer(f, decomp_param[idx].compbuf, len);
> decomp_param[idx].des = host;
> decomp_param[idx].len = len;
but these ones should be protected by decomp_param[idx].mutex, no?
> @@ -2270,8 +2283,11 @@ static void decompress_data_with_multi_threads(QEMUFile *f,
> }
> if (idx < thread_count) {
> break;
> + } else {
> + qemu_cond_wait(&decomp_done_cond, &decomp_done_lock);
> }
> }
> + qemu_mutex_unlock(&decomp_done_lock);
> }
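The hand-off pattern the patch converges on can be modeled in a few dozen lines. Below is a minimal, self-contained sketch (one worker, plain pthreads standing in for QemuMutex/QemuCond, and an arithmetic stand-in for uncompress()): the dispatcher sleeps on done_cond instead of busy-looping over the workers, which is exactly the busy-wait fix.

```c
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    bool start;          /* protected by mutex */
    int work;            /* input "page"       */
    int result;          /* output "page"      */
} Worker;

static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cond = PTHREAD_COND_INITIALIZER;
static bool done_flag;   /* protected by done_lock */
static bool quit;        /* protected by the worker mutex in this sketch */

static void *worker_thread(void *opaque)
{
    Worker *w = opaque;

    pthread_mutex_lock(&w->mutex);
    while (!quit) {
        while (!w->start && !quit) {
            pthread_cond_wait(&w->cond, &w->mutex);
        }
        if (quit) {
            break;
        }
        w->result = w->work * 2;   /* stand-in for uncompress() */
        w->start = false;

        /* announce completion under done_lock, as in the patch */
        pthread_mutex_unlock(&w->mutex);
        pthread_mutex_lock(&done_lock);
        done_flag = true;
        pthread_cond_signal(&done_cond);
        pthread_mutex_unlock(&done_lock);
        pthread_mutex_lock(&w->mutex);
    }
    pthread_mutex_unlock(&w->mutex);
    return NULL;
}

int run_demo(void)
{
    Worker w = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER,
                 false, 21, 0 };
    pthread_t tid;

    pthread_create(&tid, NULL, worker_thread, &w);

    /* dispatch one unit of work, like start_decompression() */
    pthread_mutex_lock(&w.mutex);
    w.start = true;
    pthread_cond_signal(&w.cond);
    pthread_mutex_unlock(&w.mutex);

    /* sleep on done_cond instead of busy-polling, like the fixed
     * decompress_data_with_multi_threads() */
    pthread_mutex_lock(&done_lock);
    while (!done_flag) {
        pthread_cond_wait(&done_cond, &done_lock);
    }
    pthread_mutex_unlock(&done_lock);

    /* tell the worker to quit and reap it */
    pthread_mutex_lock(&w.mutex);
    quit = true;
    pthread_cond_signal(&w.cond);
    pthread_mutex_unlock(&w.mutex);
    pthread_join(tid, NULL);
    return w.result;
}
```

The real code has one such handshake per decompression thread plus the shared done_lock/done_cond pair; the single-worker version above only demonstrates the wake-up discipline.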
>
> /*
Thanks, Juan.