From: Fam Zheng
Subject: Re: [Qemu-devel] coroutines: block: Co-routine re-entered recursively when migrating disk with iothreads
Date: Tue, 24 May 2016 10:12:44 +0800
User-agent: Mutt/1.6.1 (2016-04-27)
On Mon, 05/23 14:54, Jason J. Herne wrote:
> Using libvirt to migrate a guest and one guest disk that is using iothreads
> causes Qemu to crash with the message:
> Co-routine re-entered recursively
>
> I've looked into this one a bit but I have not seen anything that
> immediately stands out.
> Here is what I have found:
>
> In qemu_coroutine_enter:
>
>     if (co->caller) {
>         fprintf(stderr, "Co-routine re-entered recursively\n");
>         abort();
>     }
>
> The value of co->caller is actually changing between the time "if
> (co->caller)" is evaluated and the time I print some debug statements
> directly under the existing fprintf. I confirmed this by saving the value in
> a local variable and printing both the new local variable and co->caller
> immediately after the existing fprintf. This would certainly indicate some
> kind of concurrency issue. However, it does not necessarily point to the
> reason we ended up inside this if statement because co->caller was not NULL
> before it was trashed. Perhaps it was trashed more than once then? I figured
> maybe the problem was with coroutine pools so I disabled them
> (--disable-coroutine-pool) and still hit the bug.
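The instrumentation described above can be sketched roughly as follows. This is a hypothetical, standalone reduction, not QEMU's actual code; the struct and function names are placeholders. The point is that reading co->caller once into a local and printing both values reveals whether another thread changed the field between the check and the print:

```c
/* Hypothetical reduction of the debug instrumentation described above;
 * not QEMU's actual code. */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct Coroutine {
    struct Coroutine *caller;   /* non-NULL while the coroutine is entered */
} Coroutine;

static void coroutine_enter_checked(Coroutine *co)
{
    Coroutine *seen = co->caller;   /* single snapshot of the field */
    if (seen) {
        fprintf(stderr, "Co-routine re-entered recursively\n");
        /* Printing the snapshot next to a fresh read shows whether the
         * field changed in between (it did, in the report above). */
        fprintf(stderr, "snapshot=%p live=%p\n",
                (void *)seen, (void *)co->caller);
        abort();
    }
}
```

If the two printed values differ, a concurrent writer raced with the check, which is consistent with the behavior reported above.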
Which coroutine backend are you using?
>
> The backtrace is not always identical. Here is one instance:
> (gdb) bt
> #0 0x000003ffa78be2c0 in raise () from /lib64/libc.so.6
> #1 0x000003ffa78bfc26 in abort () from /lib64/libc.so.6
> #2 0x0000000080427d80 in qemu_coroutine_enter (co=0xa2cf2b40, opaque=0x0)
> at /root/kvmdev/qemu/util/qemu-coroutine.c:112
> #3 0x000000008032246e in nbd_restart_write (opaque=0xa2d0cd40) at
> /root/kvmdev/qemu/block/nbd-client.c:114
> #4 0x00000000802b3a1c in aio_dispatch (ctx=0xa2c907a0) at
> /root/kvmdev/qemu/aio-posix.c:341
> #5 0x00000000802b4332 in aio_poll (ctx=0xa2c907a0, blocking=true) at
> /root/kvmdev/qemu/aio-posix.c:479
> #6 0x0000000080155aba in iothread_run (opaque=0xa2c90260) at
> /root/kvmdev/qemu/iothread.c:46
> #7 0x000003ffa7a87c2c in start_thread () from /lib64/libpthread.so.0
> #8 0x000003ffa798ec9a in thread_start () from /lib64/libc.so.6
It may be worth looking at the backtraces of all threads, especially the
monitor (main) thread.
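For example (a hypothetical invocation; the pidof lookup is an assumption about how the process is found on this host):

```shell
# Attach to the running qemu process and dump backtraces of every
# thread, including the main (monitor) thread.
# 'thread apply all bt' is a standard gdb command.
gdb -p "$(pidof qemu-system-s390x)" -batch -ex 'thread apply all bt'
```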
>
> I've also noticed that co->entry sometimes (maybe always?) points to
> mirror_run. Though, given that co->caller changes unexpectedly, I don't
> know if we can trust co->entry.
>
> I do not see the bug when I perform the same migration without migrating the
> disk.
> I also do not see the bug when I remove the iothread from the guest.
>
> I tested this scenario as far back as tag v2.4.0 and hit the bug every time.
> I was unable to test v2.3.0 due to unresolved guest hangs. I did, however,
> manage to get as far as this commit:
>
> commit ca96ac44dcd290566090b2435bc828fded356ad9
> Author: Stefan Hajnoczi <address@hidden>
> Date: Tue Jul 28 18:34:09 2015 +0200
> AioContext: force event loop iteration using BH
>
> This commit fixes a hang that my test scenario experiences. I was able to
> test even further back by cherry-picking ca96ac44 on top of the earlier
> commits, but at that point I could not be sure whether the bug was
> introduced by ca96ac44, so I stopped.
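A rough sketch of that bisect step (hypothetical commands; $OLD_COMMIT stands in for whichever pre-v2.4.0 commit is under test):

```shell
# Check out an older commit and carry the hang fix along so the test
# scenario can run at all; ca96ac44 is the commit quoted above.
git checkout "$OLD_COMMIT"
git cherry-pick ca96ac44dcd290566090b2435bc828fded356ad9
```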
>
> I am willing to run tests or collect any info needed. I'll keep
> investigating but I won't turn down any help :).
>
> Qemu command line as taken from Libvirt log:
> qemu-system-s390x
> -name kvm1 -S -machine s390-ccw-virtio-2.6,accel=kvm,usb=off
> -m 6144 -realtime mlock=off
> -smp 1,sockets=1,cores=1,threads=1
> -object iothread,id=iothread1
> -uuid 3796d9f0-8555-4a1e-9d5c-fac56b8cbf56
> -nographic -no-user-config -nodefaults
> -chardev
> socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-kvm1/monitor.sock,server,nowait
> -mon chardev=charmonitor,id=monitor,mode=control
> -rtc base=utc -no-shutdown
> -boot strict=on -kernel /data/vms/kvm1/kvm1-image
> -initrd /data/vms/kvm1/kvm1-initrd -append 'hvc_iucv=8 TERM=dumb'
> -drive
> file=/dev/disk/by-path/ccw-0.0.c22b,format=raw,if=none,id=drive-virtio-disk0,cache=none
> -device
> virtio-blk-ccw,scsi=off,devno=fe.0.0000,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
> -drive
> file=/data/vms/kvm1/kvm1.qcow,format=qcow2,if=none,id=drive-virtio-disk1,cache=none
> -device
> virtio-blk-ccw,iothread=iothread1,scsi=off,devno=fe.0.0008,drive=drive-virtio-disk1,id=virtio-disk1
> -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=27
> -device
> virtio-net-ccw,netdev=hostnet0,id=net0,mac=52:54:00:c9:86:2b,devno=fe.0.0001
> -chardev pty,id=charconsole0 -device
> sclpconsole,chardev=charconsole0,id=console0
> -device virtio-balloon-ccw,id=balloon0,devno=fe.0.0002 -msg timestamp=on
>
> Libvirt migration command:
> virsh migrate --live --persistent --copy-storage-all --migrate-disks vdb
> kvm1 qemu+ssh://dev1/system
>
> --
> -- Jason J. Herne (address@hidden)
>