From: Oliver Francke
Subject: Re: [Qemu-devel] [Bug 1207686] [NEW] qemu-1.4.0 and onwards, linux kernel 3.2.x, heavy I/O leads to kernel_hung_tasks_timout_secs message and unresponsive qemu-process
Date: Fri, 02 Aug 2013 19:58:44 -0000
Hi Stefan,
On 02.08.2013 at 17:24, Stefan Hajnoczi <address@hidden> wrote:
> On Fri, Aug 02, 2013 at 09:58:29AM -0000, Oliver Francke wrote:
>> after some testing I tried to narrow down a problem, which was initially
>> reported by some users.
>> Seen on different distros - debian 7.1, ubuntu 12.04 LTS, IPFire-2.3 as
>> reported by now.
>>
>> All using some flavour of linux-3.2.x kernel.
>>
>> Tried e.g. under Ubuntu an upgrade to "Linux 3.8.0-27-generic x86_64" which
>> solves the problem.
>
> Is that a guest kernel upgrade?
yeah, sorry if that was not clear enough.
>
>> The problem can be triggered with a workload like:
>>
>> spew -v --raw -P -t -i 3 -b 4k -p random -B 4k 1G /tmp/doof.dat
>> and in parallel do some apt-get install/remove/whatever.
>>
>> That results in a somewhat stuck qemu-session with the bad
>> "kernel_hung_task..." messages.
>>
>> A typical command-line is as follows:
>>
>> /usr/local/qemu-1.6.0/bin/qemu-system-x86_64 -usbdevice tablet
>> -enable-kvm -daemonize -pidfile /var/run/qemu-server/760.pid
>> -monitor unix:/var/run/qemu-server/760.mon,server,nowait
>> -vnc unix:/var/run/qemu-server/760.vnc,password
>> -qmp unix:/var/run/qemu-server/760.qmp,server,nowait
>> -nodefaults -serial none -parallel none
>> -device virtio-net-pci,mac=00:F1:70:00:2F:80,netdev=vlan0d0
>> -netdev type=tap,id=vlan0d0,ifname=tap760i0d0,script=/etc/fcms/add_if.sh,downscript=/etc/fcms/downscript.sh
>> -name 1155823384-4 -m 512 -vga cirrus -k de -smp sockets=1,cores=1
>> -device virtio-blk-pci,drive=virtio0
>> -drive format=raw,file=rbd:1155823384/vm-760-disk-1.rbd:rbd_cache=false,cache=writeback,if=none,id=virtio0,media=disk,index=0,aio=native
>> -drive format=raw,file=rbd:1155823384/vm-760-swap-1.rbd:rbd_cache=false,cache=writeback,if=virtio,media=disk,index=1,aio=native
>> -drive if=ide,media=cdrom,id=ide1-cd0,readonly=on
>> -drive if=ide,media=cdrom,id=ide1-cd1,readonly=on
>> -boot order=dc
>>
>> No "system_reset", "sendkey ctrl-alt-delete" or "q" in the monitor
>> session is accepted; the process has to be hard-killed.
>
> Yesterday I saw a possibly related report on IRC. It was a Windows
> guest running under OpenStack with images on Ceph.
>
> They reported that the QEMU process would lock up - ping would not work
> and their management tools showed 0 CPU activity for the guest.
> However, they were able to "kick" the guest by taking a VNC screenshot
> (I think). Then it would come back to life.
>
> If you have a Linux guest that is reporting kernel_hung_task, then it
> could be a similar scenario.
>
> Please confirm that the hung task message is from inside the guest.
>
confirmed.
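For anyone who wants to check the same thing in their own guest, here is a minimal sketch. It assumes the usual kernel wording "blocked for more than N seconds" for hung-task warnings; the helper name count_hung_tasks is only illustrative and not part of any tool.

```shell
#!/bin/sh
# Count hung-task warnings in kernel log text read from stdin.
# Assumption: the warning uses the usual wording
#   "INFO: task <name>:<pid> blocked for more than <N> seconds."
count_hung_tasks() {
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so "|| true" keeps the helper's exit status clean.
    grep -c 'blocked for more than [0-9]* seconds' || true
}

# Typical use inside the guest (may need root):
#   dmesg | count_hung_tasks
#   cat /proc/sys/kernel/hung_task_timeout_secs
```

A non-zero count from the guest's own dmesg means the message really originates inside the guest and not on the host.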
> If you are able to reproduce this and have an alternative non-Ceph
> storage pool, please try that since Ceph is common to both these bug
> reports.
>
I can reproduce it with: kernel 3.2.something + qemu-1.[456] (never spent much
time on 1.3) and high I/O.
Later the same day I took this VM and converted it to local qcow2 storage: no
problem with any kernel. I have already asked on the ceph-users list for
assistance, especially from Josh (if he's not on summer holiday ;) ).
What is strange: I have a session open via the VNC console and run a loop
like:
while true; do apt-get install -y ntp libopts25; apt-get remove -y ntp libopts25; done
in parallel with spew as described. The apt "session" dies and one can see the
hung_task message, but I can still restart the spew test.
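The two parallel loads can be combined into one script, roughly as sketched here. This is only an illustration: the function name run_parallel_load and the iteration count are made up, and the real commands are passed in as strings.

```shell
#!/bin/sh
# Run an I/O generator in the background while repeating a second
# command in the foreground. Prints the number of completed iterations.
# Arguments (all hypothetical names): io_cmd, loop_cmd, iterations.
run_parallel_load() {
    io_cmd=$1
    loop_cmd=$2
    iterations=$3

    # Background I/O load; its output is discarded.
    sh -c "$io_cmd" >/dev/null 2>&1 &
    io_pid=$!

    done_count=0
    while [ "$done_count" -lt "$iterations" ]; do
        # Foreground command; its output goes to stderr so that
        # stdout carries only the final count.
        sh -c "$loop_cmd" >&2 || break
        done_count=$((done_count + 1))
    done

    wait "$io_pid"
    echo "$done_count"
}

# Example, run as root inside the guest:
#   run_parallel_load \
#     'spew -v --raw -P -t -i 3 -b 4k -p random -B 4k 1G /tmp/doof.dat' \
#     'apt-get install -y ntp libopts25 && apt-get remove -y ntp libopts25' 20
```

If the guest hits the problem, the foreground command simply blocks and the count is never printed, which matches what I see on the console.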
Just for completeness.
Thanks for your attention,
Oliver.
> Stefan
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1207686
>
> Title:
> qemu-1.4.0 and onwards, linux kernel 3.2.x, heavy I/O leads to
> kernel_hung_tasks_timout_secs message and unresponsive qemu-process
>
> Status in QEMU:
> New
>
> Bug description:
> Hi,
>
> after some testing I tried to narrow down a problem, which was initially
> reported by some users.
> Seen on different distros - debian 7.1, ubuntu 12.04 LTS, IPFire-2.3 as
> reported by now.
>
> All using some flavour of linux-3.2.x kernel.
>
> Tried e.g. under Ubuntu an upgrade to "Linux 3.8.0-27-generic x86_64" which
> solves the problem.
> The problem can be triggered with a workload like:
>
> spew -v --raw -P -t -i 3 -b 4k -p random -B 4k 1G /tmp/doof.dat
> and in parallel do some apt-get install/remove/whatever.
>
> That results in a somewhat stuck qemu-session with the bad
> "kernel_hung_task..." messages.
>
> A typical command-line is as follows:
>
> /usr/local/qemu-1.6.0/bin/qemu-system-x86_64 -usbdevice tablet
> -enable-kvm -daemonize -pidfile /var/run/qemu-server/760.pid
> -monitor unix:/var/run/qemu-server/760.mon,server,nowait
> -vnc unix:/var/run/qemu-server/760.vnc,password
> -qmp unix:/var/run/qemu-server/760.qmp,server,nowait
> -nodefaults -serial none -parallel none
> -device virtio-net-pci,mac=00:F1:70:00:2F:80,netdev=vlan0d0
> -netdev type=tap,id=vlan0d0,ifname=tap760i0d0,script=/etc/fcms/add_if.sh,downscript=/etc/fcms/downscript.sh
> -name 1155823384-4 -m 512 -vga cirrus -k de -smp sockets=1,cores=1
> -device virtio-blk-pci,drive=virtio0
> -drive format=raw,file=rbd:1155823384/vm-760-disk-1.rbd:rbd_cache=false,cache=writeback,if=none,id=virtio0,media=disk,index=0,aio=native
> -drive format=raw,file=rbd:1155823384/vm-760-swap-1.rbd:rbd_cache=false,cache=writeback,if=virtio,media=disk,index=1,aio=native
> -drive if=ide,media=cdrom,id=ide1-cd0,readonly=on
> -drive if=ide,media=cdrom,id=ide1-cd1,readonly=on
> -boot order=dc
>
> No "system_reset", "sendkey ctrl-alt-delete" or "q" in the monitor
> session is accepted; the process has to be hard-killed.
>
> Please give any advice on what to do for tracing/debugging, because
> the number of tickets here is rising, and no one knows what users
> are doing inside their VM.
>
> Kind regards,
>
> Oliver Francke.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/qemu/+bug/1207686/+subscriptions