From: Fam Zheng
Subject: Re: [Qemu-block] coroutines: block: Co-routine re-entered recursively when migrating disk with iothreads
Date: Tue, 24 May 2016 10:12:44 +0800
User-agent: Mutt/1.6.1 (2016-04-27)

On Mon, 05/23 14:54, Jason J. Herne wrote:
> Using libvirt to migrate a guest, together with one guest disk that uses
> iothreads, causes Qemu to crash with the message:
> Co-routine re-entered recursively
> 
> I've looked into this one a bit but I have not seen anything that
> immediately stands out.
> Here is what I have found:
> 
> In qemu_coroutine_enter:
>     if (co->caller) {
>         fprintf(stderr, "Co-routine re-entered recursively\n");
>         abort();
>     }
> 
> The value of co->caller is actually changing between the time "if
> (co->caller)" is evaluated and the time I print some debug statements
> directly under the existing fprintf. I confirmed this by saving the value in
> a local variable and printing both the new local variable and co->caller
> immediately after the existing fprintf. This would certainly indicate some
> kind of concurrency issue. However, it does not necessarily point to the
> reason we ended up inside this if statement because co->caller was not NULL
> before it was trashed. Perhaps it was trashed more than once then? I figured
> maybe the problem was with coroutine pools so I disabled them
> (--disable-coroutine-pool) and still hit the bug.
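
A rough, untested sketch of the instrumentation described above -- the check
itself is the one quoted from util/qemu-coroutine.c, while the extra local
variable and the second fprintf are hypothetical debugging additions:

    void qemu_coroutine_enter(Coroutine *co, void *opaque)
    {
        /* ... */
        if (co->caller) {
            /* Latch the pointer that made the check fire so it can be
             * compared against a fresh read of co->caller below. */
            Coroutine *caller_at_check = co->caller;

            fprintf(stderr, "Co-routine re-entered recursively\n");
            fprintf(stderr, "caller at check %p, caller now %p\n",
                    (void *)caller_at_check, (void *)co->caller);
            /* If the two pointers differ, something modified co->caller
             * between the two reads, i.e. the coroutine is being
             * manipulated concurrently. */
            abort();
        }
        /* ... rest of the function unchanged ... */
    }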

Which coroutine backend are you using?
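
For reference, the backend is selected when QEMU is configured; on Linux the
default is normally ucontext. An explicit choice would look something like
this (shown only as an example configure invocation, together with the pool
switch already mentioned above):

    ./configure --with-coroutine=ucontext --disable-coroutine-pool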

> 
> The backtrace is not always identical. Here is one instance:
> (gdb) bt
> #0  0x000003ffa78be2c0 in raise () from /lib64/libc.so.6
> #1  0x000003ffa78bfc26 in abort () from /lib64/libc.so.6
> #2  0x0000000080427d80 in qemu_coroutine_enter (co=0xa2cf2b40, opaque=0x0) at /root/kvmdev/qemu/util/qemu-coroutine.c:112
> #3  0x000000008032246e in nbd_restart_write (opaque=0xa2d0cd40) at /root/kvmdev/qemu/block/nbd-client.c:114
> #4  0x00000000802b3a1c in aio_dispatch (ctx=0xa2c907a0) at /root/kvmdev/qemu/aio-posix.c:341
> #5  0x00000000802b4332 in aio_poll (ctx=0xa2c907a0, blocking=true) at /root/kvmdev/qemu/aio-posix.c:479
> #6  0x0000000080155aba in iothread_run (opaque=0xa2c90260) at /root/kvmdev/qemu/iothread.c:46
> #7  0x000003ffa7a87c2c in start_thread () from /lib64/libpthread.so.0
> #8  0x000003ffa798ec9a in thread_start () from /lib64/libc.so.6

It may be worth looking at the backtraces of all threads, especially the
monitor thread (main thread).
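For example, against the core dump:

    (gdb) thread apply all bt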

> 
> I've also noticed that co->entry sometimes (maybe always?) points to
> mirror_run. Though, given that co->caller changes unexpectedly, I don't know
> whether we can trust co->entry.
> 
> I do not see the bug when I perform the same migration without migrating the
> disk.
> I also do not see the bug when I remove the iothread from the guest.
> 
> I tested this scenario as far back as tag v2.4.0 and hit the bug every time.
> I was unable to test v2.3.0 due to unresolved guest hangs. I did, however,
> manage to get as far as this commit:
> 
> commit ca96ac44dcd290566090b2435bc828fded356ad9
> Author: Stefan Hajnoczi <address@hidden>
> Date:   Tue Jul 28 18:34:09 2015 +0200
> AioContext: force event loop iteration using BH
> 
> This commit fixes a hang that my test scenario experiences. I was able to
> test even further back by cherry-picking ca96ac44 on top of the earlier
> commits, but at this point I cannot be sure whether the bug was introduced
> by ca96ac44, so I stopped.
> 
> I am willing to run tests or collect any info needed. I'll keep
> investigating but I won't turn down any help :).
> 
> Qemu command line as taken from Libvirt log:
> qemu-system-s390x
>     -name kvm1 -S -machine s390-ccw-virtio-2.6,accel=kvm,usb=off
>     -m 6144 -realtime mlock=off
>     -smp 1,sockets=1,cores=1,threads=1
>     -object iothread,id=iothread1
>     -uuid 3796d9f0-8555-4a1e-9d5c-fac56b8cbf56
>     -nographic -no-user-config -nodefaults
>     -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-kvm1/monitor.sock,server,nowait
>     -mon chardev=charmonitor,id=monitor,mode=control
>     -rtc base=utc -no-shutdown
>     -boot strict=on -kernel /data/vms/kvm1/kvm1-image
>     -initrd /data/vms/kvm1/kvm1-initrd -append 'hvc_iucv=8 TERM=dumb'
>     -drive file=/dev/disk/by-path/ccw-0.0.c22b,format=raw,if=none,id=drive-virtio-disk0,cache=none
>     -device virtio-blk-ccw,scsi=off,devno=fe.0.0000,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
>     -drive file=/data/vms/kvm1/kvm1.qcow,format=qcow2,if=none,id=drive-virtio-disk1,cache=none
>     -device virtio-blk-ccw,iothread=iothread1,scsi=off,devno=fe.0.0008,drive=drive-virtio-disk1,id=virtio-disk1
>     -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=27
>     -device virtio-net-ccw,netdev=hostnet0,id=net0,mac=52:54:00:c9:86:2b,devno=fe.0.0001
>     -chardev pty,id=charconsole0 -device sclpconsole,chardev=charconsole0,id=console0
>     -device virtio-balloon-ccw,id=balloon0,devno=fe.0.0002 -msg timestamp=on
> 
> Libvirt migration command:
> virsh migrate --live --persistent --copy-storage-all --migrate-disks vdb kvm1 qemu+ssh://dev1/system
> 
> -- 
> -- Jason J. Herne (address@hidden)
> 


