From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] Hang with migration multi-thread compression under high load
Date: Wed, 27 Apr 2016 15:29:30 +0100
User-agent: Mutt/1.5.24 (2015-08-30)

ccing in Liang Li

* Daniel P. Berrange (address@hidden) wrote:
> I've been testing various features of migration and have hit a problem
> with the multi-thread compression. It works fine when I have 2 or more
> threads, but if I tell it to only use a single thread, then it almost
> always hangs
> 
> I'm doing a migration between 2 guests on the same machine over a
> tcp localhost socket, using this command line to launch them:
> 
>   /home/berrange/src/virt/qemu/x86_64-softmmu/qemu-system-x86_64
>      -chardev socket,id=mon,path=/var/tmp/qemu-src-4644-monitor.sock
>      -mon chardev=mon,mode=control
>      -display none
>      -vga none
>      -machine accel=kvm
>      -kernel /boot/vmlinuz-4.4.7-300.fc23.x86_64
>      -initrd /home/berrange/src/virt/qemu/tests/migration/initrd-stress.img
>      -append "noapic edd=off printk.time=1 noreplace-smp 
> cgroup_disable=memory pci=noearly console=ttyS0 debug ramsize=1"
>      -chardev stdio,id=cdev0
>      -device isa-serial,chardev=cdev0
>      -m 1536
>      -smp 1
> 
> The target VM also gets
> 
>     -incoming tcp:localhost:9000
> 
> 
> When the VM hangs, the source QEMU shows this stack trace:
> 
> 
> Thread 5 (Thread 0x7fdb0aa93700 (LWP 4648)):
> #0  0x00007fdb172d4c59 in syscall () at /lib64/libc.so.6
> #1  0x0000558bf0b9ba15 in qemu_event_wait (val=<optimized out>, ev=<optimized 
> out>) at util/qemu-thread-posix.c:292
> #2  0x0000558bf0b9ba15 in qemu_event_wait (address@hidden 
> <rcu_call_ready_event>) at util/qemu-thread-posix.c:399
> #3  0x0000558bf0ba9e3e in call_rcu_thread (opaque=<optimized out>) at 
> util/rcu.c:250
> #4  0x00007fdb175a060a in start_thread () at /lib64/libpthread.so.0
> #5  0x00007fdb172daa4d in clone () at /lib64/libc.so.6
> 
> Thread 4 (Thread 0x7fdb0a292700 (LWP 4649)):
> #0  0x00007fdb172d0707 in ioctl () at /lib64/libc.so.6
> #1  0x0000558bf089b887 in kvm_vcpu_ioctl (address@hidden, address@hidden) at 
> /home/berrange/src/virt/qemu/kvm-all.c:1984
> #2  0x0000558bf089b9df in kvm_cpu_exec (address@hidden) at 
> /home/berrange/src/virt/qemu/kvm-all.c:1834
> #3  0x0000558bf0889982 in qemu_kvm_cpu_thread_fn (arg=0x558bf268c8c0) at 
> /home/berrange/src/virt/qemu/cpus.c:1069
> #4  0x00007fdb175a060a in start_thread () at /lib64/libpthread.so.0
> #5  0x00007fdb172daa4d in clone () at /lib64/libc.so.6
> 
> Thread 3 (Thread 0x7fdaa3dff700 (LWP 4655)):
> #0  0x00007fdb175a5b10 in pthread_cond_wait@@GLIBC_2.3.2 () at 
> /lib64/libpthread.so.0
> #1  0x0000558bf0b9b709 in qemu_cond_wait (address@hidden, address@hidden) at 
> util/qemu-thread-posix.c:123
> #2  0x0000558bf08ad40b in do_data_compress (opaque=0x558bf42e2870) at 
> /home/berrange/src/virt/qemu/migration/ram.c:306
> #3  0x00007fdb175a060a in start_thread () at /lib64/libpthread.so.0
> #4  0x00007fdb172daa4d in clone () at /lib64/libc.so.6
> 
> Thread 2 (Thread 0x7fdaa35fe700 (LWP 4656)):
> #0  0x00007fdb175a909d in sendmsg () at /lib64/libpthread.so.0
> #1  0x0000558bf0b9ed35 in do_send_recv (address@hidden, address@hidden, 
> iov_cnt=<optimized out>, address@hidden) at util/iov.c:107
> #2  0x0000558bf0b9f2a4 in iov_send_recv (sockfd=15, address@hidden, 
> address@hidden, address@hidden, address@hidden, address@hidden) at 
> util/iov.c:194
> #3  0x0000558bf0aaeacc in socket_writev_buffer (opaque=0x558bf2ebf1e0, 
> iov=0x558bf267f130, iovcnt=1, pos=<optimized out>) at 
> migration/qemu-file-unix.c:48
> #4  0x0000558bf0aad47a in qemu_fflush (address@hidden) at 
> migration/qemu-file.c:123
> #5  0x0000558bf0aad5ed in qemu_put_buffer (f=0x558bf2677100, address@hidden 
> "", address@hidden) at migration/qemu-file.c:337
> #6  0x0000558bf0aae040 in qemu_put_qemu_file (size=4045, buf=0x558bf26ae840 
> "", f=0x558bf2677100) at migration/qemu-file.c:642
> #7  0x0000558bf0aae040 in qemu_put_qemu_file (address@hidden, 
> f_src=0x558bf26ae810) at migration/qemu-file.c:642
> #8  0x0000558bf08adf2d in ram_find_and_save_block 
> (bytes_transferred=0x558bf116b050 <bytes_transferred>, offset=6647840, 
> block=0x558bf26c5460, f=0x558bf2677100)
>     at /home/berrange/src/virt/qemu/migration/ram.c:886
> #9  0x0000558bf08adf2d in ram_find_and_save_block 
> (bytes_transferred=0x558bf116b050 <bytes_transferred>, last_stage=<optimized 
> out>, pss=<synthetic pointer>, f=0x558bf2677100) at /home/berrange/src/virt/qemu/migration/ram.c:972
> #10 0x0000558bf08adf2d in ram_find_and_save_block 
> (bytes_transferred=0x558bf116b050 <bytes_transferred>, ms=<optimized out>, 
> dirty_ram_abs=<optimized out>, last_stage=false, pss=<synthetic pointer>, 
> f=0x558bf2677100) at /home/berrange/src/virt/qemu/migration/ram.c:1247
> #11 0x0000558bf08adf2d in ram_find_and_save_block 
> (bytes_transferred=0x558bf116b050 <bytes_transferred>, 
> dirty_ram_abs=<optimized out>, last_stage=<optimized out>, pss=<synthetic 
> pointer>, f=<optimized out>, ms=<optimized out>) at 
> /home/berrange/src/virt/qemu/migration/ram.c:1299
> #12 0x0000558bf08adf2d in ram_find_and_save_block (address@hidden, 
> address@hidden, bytes_transferred=0x558bf116b050 <bytes_transferred>)
>     at /home/berrange/src/virt/qemu/migration/ram.c:1359
> #13 0x0000558bf08ae211 in ram_save_iterate (f=0x558bf2677100, 
> opaque=<optimized out>) at /home/berrange/src/virt/qemu/migration/ram.c:1993
> #14 0x0000558bf08b40e7 in qemu_savevm_state_iterate (f=0x558bf2677100, 
> address@hidden) at /home/berrange/src/virt/qemu/migration/savevm.c:970
> #15 0x0000558bf0aa9c74 in migration_thread (opaque=0x558bf10fa140 
> <current_migration>) at migration/migration.c:1694
> #16 0x00007fdb175a060a in start_thread () at /lib64/libpthread.so.0
> #17 0x00007fdb172daa4d in clone () at /lib64/libc.so.6
> 
> Thread 1 (Thread 0x7fdb23bedc80 (LWP 4647)):
> #0  0x00007fdb172cf0a1 in ppoll () at /lib64/libc.so.6
> #1  0x0000558bf0b06769 in qemu_poll_ns (__ss=0x0, __timeout=0x7ffe007c71c0, 
> __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
> #2  0x0000558bf0b06769 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized 
> out>, address@hidden) at qemu-timer.c:325
> #3  0x0000558bf0b061da in main_loop_wait (timeout=1000000000) at 
> main-loop.c:252
> #4  0x0000558bf0b061da in main_loop_wait (nonblocking=<optimized out>) at 
> main-loop.c:506
> #5  0x0000558bf0855a34 in main () at vl.c:1934
> #6  0x0000558bf0855a34 in main (argc=<optimized out>, argv=<optimized out>, 
> envp=<optimized out>) at vl.c:4658
> 
> 
> An strace of the source shows that the 'sendmsg' call in the source
> QEMU is not completing, which suggests that the socket buffer is full.
> So I think we can discount the source QEMU as being the problem.
> 
> 
> Now the target QEMU shows this stack trace:
> 
> 
> 
> Thread 4 (Thread 0x7f86de523700 (LWP 4652)):
> #0  0x00007f86ead64c59 in syscall () at /lib64/libc.so.6
> #1  0x0000560ecd8a2a15 in qemu_event_wait (val=<optimized out>, ev=<optimized 
> out>) at util/qemu-thread-posix.c:292
> #2  0x0000560ecd8a2a15 in qemu_event_wait (address@hidden 
> <rcu_call_ready_event>) at util/qemu-thread-posix.c:399
> #3  0x0000560ecd8b0e3e in call_rcu_thread (opaque=<optimized out>) at 
> util/rcu.c:250
> #4  0x00007f86eb03060a in start_thread () at /lib64/libpthread.so.0
> #5  0x00007f86ead6aa4d in clone () at /lib64/libc.so.6
> 
> Thread 3 (Thread 0x7f86ddd22700 (LWP 4653)):
> #0  0x00007f86eb035b10 in pthread_cond_wait@@GLIBC_2.3.2 () at 
> /lib64/libpthread.so.0
> #1  0x0000560ecd8a2709 in qemu_cond_wait (cond=<optimized out>, 
> address@hidden <qemu_global_mutex>)
>     at util/qemu-thread-posix.c:123
> #2  0x0000560ecd590943 in qemu_kvm_cpu_thread_fn (cpu=<optimized out>) at 
> /home/berrange/src/virt/qemu/cpus.c:1035
> #3  0x0000560ecd590943 in qemu_kvm_cpu_thread_fn (arg=0x560ece73b8c0) at 
> /home/berrange/src/virt/qemu/cpus.c:1074
> #4  0x00007f86eb03060a in start_thread () at /lib64/libpthread.so.0
> #5  0x00007f86ead6aa4d in clone () at /lib64/libc.so.6
> 
> Thread 2 (Thread 0x7f8677bed700 (LWP 4657)):
> #0  0x00007f86eb035b10 in pthread_cond_wait@@GLIBC_2.3.2 () at 
> /lib64/libpthread.so.0
> #1  0x0000560ecd8a2709 in qemu_cond_wait (address@hidden, address@hidden) at 
> util/qemu-thread-posix.c:123
> #2  0x0000560ecd5b422d in do_data_decompress (opaque=0x560ed0391870) at 
> /home/berrange/src/virt/qemu/migration/ram.c:2195
> #3  0x00007f86eb03060a in start_thread () at /lib64/libpthread.so.0
> #4  0x00007f86ead6aa4d in clone () at /lib64/libc.so.6
> 
> Thread 1 (Thread 0x7f86f767dc80 (LWP 4651)):
> #0  0x0000560ecd5b744f in ram_load (len=711, host=0x7f8677e06000, 
> f=0x560ed03a7950) at /home/berrange/src/virt/qemu/migration/ram.c:2263
> #1  0x0000560ecd5b744f in ram_load (f=0x560ed03a7950, opaque=<optimized out>, 
> version_id=<optimized out>)
>     at /home/berrange/src/virt/qemu/migration/ram.c:2513
> #2  0x0000560ecd5b8b87 in vmstate_load (f=0x560ed03a7950, se=0x560ece731f90, 
> version_id=4)
>     at /home/berrange/src/virt/qemu/migration/savevm.c:643
> #3  0x0000560ecd5b8fc3 in qemu_loadvm_state_main (mis=0x560ece75c330, 
> f=0x560ed03a7950) at /home/berrange/src/virt/qemu/migration/savevm.c:1819
> #4  0x0000560ecd5b8fc3 in qemu_loadvm_state_main (address@hidden, 
> address@hidden)
>     at /home/berrange/src/virt/qemu/migration/savevm.c:1850
> #5  0x0000560ecd5bbd36 in qemu_loadvm_state (address@hidden) at 
> /home/berrange/src/virt/qemu/migration/savevm.c:1911
> #6  0x0000560ecd7b1b2f in process_incoming_migration_co 
> (opaque=0x560ed03a7950) at migration/migration.c:384
> #7  0x0000560ecd8b1eba in coroutine_trampoline (i0=<optimized out>, 
> i1=<optimized out>) at util/coroutine-ucontext.c:78
> #8  0x00007f86eacb0f30 in __start_context () at /lib64/libc.so.6
> #9  0x00007ffc877e49c0 in  ()
> 
> 
> For some reason it isn't shown in the stack trace for thread
> 1 above, but when initially connecting, GDB says the main thread
> is at:
> 
> decompress_data_with_multi_threads (len=702, host=0x7fd78fe06000, 
> f=0x55901af09950) at /home/berrange/src/virt/qemu/migration/ram.c:2254
> 2254          for (idx = 0; idx < thread_count; idx++) {
> 
> 
> Looking at the target QEMU, we see the do_data_decompress method
> is waiting on a condition variable:
> 
>         while (!param->start && !quit_decomp_thread) {
>             qemu_cond_wait(&param->cond, &param->mutex);
>             ... do the decompression work ...
>             param->start = false;
>         }
> 
> 
> Now, decompress_data_with_multi_threads() is checking param->start without
> holding the param->mutex lock.
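> 
> For context, the consumer side looks roughly like this (paraphrased from
> memory rather than pasted, so details such as the decomp_param array name
> may not match the tree exactly):
> 
>     static void decompress_data_with_multi_threads(QEMUFile *f,
>                                                    void *host, int len)
>     {
>         int idx, thread_count = migrate_decompress_threads();
> 
>         while (true) {
>             for (idx = 0; idx < thread_count; idx++) {
>                 /* unlocked read of param->start, racing with the
>                  * worker thread which updates it under param->mutex */
>                 if (!decomp_param[idx].start) {
>                     qemu_get_buffer(f, decomp_param[idx].compbuf, len);
>                     decomp_param[idx].des = host;
>                     decomp_param[idx].len = len;
>                     start_decompression(&decomp_param[idx]);
>                     break;
>                 }
>             }
>             if (idx < thread_count) {
>                 break;
>             }
>         }
>     }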
> 
> Changing decompress_data_with_multi_threads() to acquire the param->mutex
> lock makes it work, but isn't ideal, since that now blocks
> decompress_data_with_multi_threads() on the completion of each
> thread, which defeats the point of having multiple threads.
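> 
> ie something along these lines (illustrative only, not the actual
> change I tested):
> 
>     for (idx = 0; idx < thread_count; idx++) {
>         qemu_mutex_lock(&decomp_param[idx].mutex);
>         if (!decomp_param[idx].start) {
>             /* hand the compressed page to this thread while holding
>              * its mutex, then wake it up */
>             qemu_get_buffer(f, decomp_param[idx].compbuf, len);
>             decomp_param[idx].des = host;
>             decomp_param[idx].len = len;
>             decomp_param[idx].start = true;
>             qemu_cond_signal(&decomp_param[idx].cond);
>             qemu_mutex_unlock(&decomp_param[idx].mutex);
>             break;
>         }
>         qemu_mutex_unlock(&decomp_param[idx].mutex);
>     }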
> 
> 
> As mentioned above I'm only seeing the hang when using 1 decompress
> thread. If I let QEMU have multiple decompress threads everything
> is fine. Also, it only happens if I have a very heavy guest workload.
> If the guest is completely idle, it again works fine. So clearly there
> is some kind of race condition I'm unlucky enough to hit here.
> 
> In terms of monitor commands I'm just running
> 
> 
>   migrate_set_capabilities compress on        (src + dst)
>   migrate_set_parameters compress-threads 1   (src only)
>   migrate_set_parameters decompress-threads 1 (dst only)
> 
> Then migrate -d tcp:localhost:9000
> 
> Regards,
> Daniel
> -- 
> |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org              -o-             http://virt-manager.org :|
> |: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK


