Re: [Qemu-devel] Hang with migration multi-thread compression under high load

From:    Li, Liang Z
Subject: Re: [Qemu-devel] Hang with migration multi-thread compression under high load
Date:    Thu, 28 Apr 2016 02:41:58 +0000
> On Wed, Apr 27, 2016 at 03:29:30PM +0100, Dr. David Alan Gilbert wrote:
> > ccing in Liang Li
> >
> > * Daniel P. Berrange (address@hidden) wrote:
> > > for some reason it isn't shown in the stack trace for thread
> > > 1 above, when initially connecting GDB it says the main thread is
> > > at:
> > >
> > > decompress_data_with_multi_threads (len=702, host=0x7fd78fe06000, f=0x55901af09950)
> > >     at /home/berrange/src/virt/qemu/migration/ram.c:2254
> > > 2254        for (idx = 0; idx < thread_count; idx++) {
> > >
> > >
> > > Looking at the target QEMU, we see do_data_decompress method is
> > > waiting in a condition var:
> > >
> > >     while (!param->start && !quit_decomp_thread) {
> > >         qemu_cond_wait(&param->cond, &param->mutex);
> > >     }
> > >     ...do stuff...
> > >     param->start = false;
> > >
> > >
> > > Now the decompress_data_with_multi_threads is checking param->start
> > > without holding the param->mutex lock.
> > >
> > > Changing decompress_data_with_multi_threads to acquire the param->mutex
> > > lock makes it work, but isn't ideal, since that now blocks the
> > > decompress_data_with_multi_threads() method on the completion of
> > > each thread, which defeats the point of having multiple threads.
>
> FWIW, the following patch also appears to "fix" the issue, presumably by just
> making the race much less likely to hit:
>
> diff --git a/migration/ram.c b/migration/ram.c
> index 3f05738..be0233f 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2271,6 +2271,7 @@ static void decompress_data_with_multi_threads(QEMUFile *f,
>          if (idx < thread_count) {
>              break;
>          }
> +        sched_yield();
>      }
>  }
>
>
> Incidentally, IIUC, this decompress_data_with_multi_threads is just
> busy-waiting for a thread to become free, which seems pretty wasteful of
> CPU resources. I wonder if there's a more effective way to structure this, so that
> instead of having decompress_data_with_multi_threads()
> choose which thread to pass the decompression job to, it just puts the job
> into a queue, and then let all the threads pull from that shared queue. IOW
> whichever thread the kernel decides to wake up would get the job, without
> us having to explicitly assign a thread to the job.
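The shared-queue shape suggested above could look something like this minimal sketch. `JobQueue`, `job_queue_push()` and `job_queue_pop()` are invented names for illustration, not QEMU APIs; the point is that producers signal a single condition variable and whichever worker the kernel wakes takes the job:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Job {
    struct Job *next;
    int payload;            /* stand-in for the compressed page data */
} Job;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  avail;
    Job *head, *tail;
    bool quit;
} JobQueue;

void job_queue_init(JobQueue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->avail, NULL);
    q->head = q->tail = NULL;
    q->quit = false;
}

/* Producer: enqueue and wake one sleeping worker; the kernel picks
 * which thread runs, so no explicit thread assignment is needed. */
void job_queue_push(JobQueue *q, Job *j)
{
    pthread_mutex_lock(&q->lock);
    j->next = NULL;
    if (q->tail) {
        q->tail->next = j;
    } else {
        q->head = j;
    }
    q->tail = j;
    pthread_cond_signal(&q->avail);
    pthread_mutex_unlock(&q->lock);
}

/* Worker: block (rather than busy-wait) until a job or shutdown
 * arrives.  Returns NULL only when the queue is shutting down. */
Job *job_queue_pop(JobQueue *q)
{
    pthread_mutex_lock(&q->lock);
    while (!q->head && !q->quit) {
        pthread_cond_wait(&q->avail, &q->lock);
    }
    Job *j = q->head;
    if (j) {
        q->head = j->next;
        if (!q->head) {
            q->tail = NULL;
        }
    }
    pthread_mutex_unlock(&q->lock);
    return j;
}
```

Each decompress thread would just loop on `job_queue_pop()`, which also removes the busy-wait on the producer side: the feeder only blocks if it chooses to bound the queue depth.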
>
>
Thanks for reporting this issue, I will take a look and get back to you.
Liang
> Regards,
> Daniel
> --
> |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org -o- http://virt-manager.org :|
> |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|