Re: [Qemu-devel] [Bug 1207686] [NEW] qemu-1.4.0 and onwards, linux kerne

qemu-devel

From:	Oliver Francke
Subject:	Re: [Qemu-devel] [Bug 1207686] [NEW] qemu-1.4.0 and onwards, linux kernel 3.2.x, heavy I/O leads to kernel_hung_tasks_timout_secs message and unresponsive qemu-process
Date:	Fri, 02 Aug 2013 19:58:44 -0000

Hi Stefan,

On 02.08.2013 at 17:24, Stefan Hajnoczi <address@hidden> wrote:

> On Fri, Aug 02, 2013 at 09:58:29AM -0000, Oliver Francke wrote:
>> after some testing I tried to narrow down a problem that was initially
>> reported by some users.
>> Seen so far on different distros: Debian 7.1, Ubuntu 12.04 LTS and
>> IPFire-2.3.
>> 
>> All of them use some flavour of the linux-3.2.x kernel.
>> 
>> Upgrading e.g. Ubuntu to "Linux 3.8.0-27-generic x86_64" solves the
>> problem.
> 
> Is that a guest kernel upgrade?

yeah, sorry if that was not clear enough.

> 
>> The problem can be triggered with a workload like:
>> 
>> spew -v --raw -P -t -i 3 -b 4k -p random -B 4k 1G /tmp/doof.dat
>> and in parallel do some apt-get install/remove/whatever.
>> 
>> That results in a somewhat stuck qemu-session with the bad
>> "kernel_hung_task..." messages.
>> 
>> A typical command-line is as follows:
>> 
>> /usr/local/qemu-1.6.0/bin/qemu-system-x86_64 -usbdevice tablet -enable-kvm
>> -daemonize -pidfile /var/run/qemu-server/760.pid
>> -monitor unix:/var/run/qemu-server/760.mon,server,nowait
>> -vnc unix:/var/run/qemu-server/760.vnc,password
>> -qmp unix:/var/run/qemu-server/760.qmp,server,nowait
>> -nodefaults -serial none -parallel none
>> -device virtio-net-pci,mac=00:F1:70:00:2F:80,netdev=vlan0d0
>> -netdev type=tap,id=vlan0d0,ifname=tap760i0d0,script=/etc/fcms/add_if.sh,downscript=/etc/fcms/downscript.sh
>> -name 1155823384-4 -m 512 -vga cirrus -k de -smp sockets=1,cores=1
>> -device virtio-blk-pci,drive=virtio0
>> -drive format=raw,file=rbd:1155823384/vm-760-disk-1.rbd:rbd_cache=false,cache=writeback,if=none,id=virtio0,media=disk,index=0,aio=native
>> -drive format=raw,file=rbd:1155823384/vm-760-swap-1.rbd:rbd_cache=false,cache=writeback,if=virtio,media=disk,index=1,aio=native
>> -drive if=ide,media=cdrom,id=ide1-cd0,readonly=on
>> -drive if=ide,media=cdrom,id=ide1-cd1,readonly=on -boot order=dc
>> 
>> No "system_reset", "sendkey ctrl-alt-delete" or "q" is accepted in the
>> monitor session; the process has to be hard-killed.
> 
> Yesterday I saw a possibly related report on IRC.  It was a Windows
> guest running under OpenStack with images on Ceph.
> 
> They reported that the QEMU process would lock up - ping would not work
> and their management tools showed 0 CPU activity for the guest.
> However, they were able to "kick" the guest by taking a VNC screenshot
> (I think).  Then it would come back to life.
> 
> If you have a Linux guest that is reporting kernel_hung_task, then it
> could be a similar scenario.
> 
> Please confirm that the hung task message is from inside the guest.
> 

confirmed.
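
For anyone who wants to run the same check in their own guest, here is a minimal sketch. It assumes the guest kernel prints the usual hung-task watchdog line ("blocked for more than N seconds"); the function name count_hung_tasks is made up for illustration.

```shell
#!/bin/sh
# Count hung-task watchdog reports in a kernel log read from stdin.
# count_hung_tasks is a hypothetical helper name, not an existing tool.
count_hung_tasks() {
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so "|| true" keeps the helper usable under "set -e".
    grep -c 'blocked for more than [0-9]* seconds' || true
}

# Usage inside the guest (needs permission to read the kernel log):
#   dmesg | count_hung_tasks
#   cat /proc/sys/kernel/hung_task_timeout_secs   # current watchdog timeout
```

A non-zero count from the guest's own dmesg means the message comes from the guest kernel rather than from the host.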

> If you are able to reproduce this and have an alternative non-Ceph
> storage pool, please try that since Ceph is common to both these bug
> reports.
> 

I can reproduce it with kernel 3.2.something + qemu-1.[456] (I never spent
much time on 1.3) and high I/O.
Later that day I converted this VM to local qcow2 storage, and there was no
problem with any kernel. I have already asked on the ceph-users list for
assistance, especially from Josh (if he's not on summer holiday ;) ).
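
A rough sketch of one way such a conversion can be done, for people who want to repeat the non-Ceph test. This is an assumption, not necessarily the exact procedure used here: it relies on a qemu-img built with rbd support, and the helper name and destination path are hypothetical.

```shell
#!/bin/sh
# Copy a raw RBD image into a local qcow2 file.
# to_local_qcow2 is a hypothetical helper; QEMU_IMG can point at another binary.
to_local_qcow2() {
    src=$1   # e.g. rbd:1155823384/vm-760-disk-1.rbd
    dst=$2   # e.g. /var/tmp/vm-760-disk-1.qcow2 (hypothetical path)
    "${QEMU_IMG:-qemu-img}" convert -f raw -O qcow2 "$src" "$dst"
}

# Example, with the VM shut down first:
#   to_local_qcow2 rbd:1155823384/vm-760-disk-1.rbd /var/tmp/vm-760-disk-1.qcow2
```

The guest is then started with a -drive pointing at the local file with format=qcow2 instead of the rbd: URL.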

What is strange: I have a session open via the VNC console, running a loop
like:

while true; do apt-get install -y ntp libopts25; apt-get remove -y ntp libopts25; done

with spew running in parallel as described. The apt "session" dies and one
can see the hung_task message, but I can still restart the spew test.
Just for completeness.
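
The two workloads can be put into one script, roughly as sketched below. The helper names and the bounded round count are made up for illustration, and the sketch assumes the remove step targets the same two packages as the install step. Run it only in a disposable guest, as root.

```shell
#!/bin/sh
# Hypothetical helper names; the commands are the ones quoted in this thread.
APT=${APT:-apt-get}
SPEW=${SPEW:-spew}

# Start the random-write load in the background and remember its PID.
start_io_load() {
    "$SPEW" -v --raw -P -t -i 3 -b 4k -p random -B 4k 1G /tmp/doof.dat &
    IO_PID=$!
}

# Install and remove the packages a bounded number of times
# (the loop in the mail runs forever).
package_loop() {
    rounds=$1
    i=0
    while [ "$i" -lt "$rounds" ]; do
        "$APT" install -y ntp libopts25
        "$APT" remove -y ntp libopts25
        i=$((i + 1))
    done
}

# Example (hypothetical round count):
#   start_io_load; package_loop 20; kill "$IO_PID"
```

A bounded loop makes it easier to say after how many rounds the apt side got stuck.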

Thanks for your attention,

Oliver.

> Stefan
> 
> -- 
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1207686
> 
> Title:
>  qemu-1.4.0 and onwards, linux kernel 3.2.x, heavy I/O leads to
>  kernel_hung_tasks_timout_secs message and unresponsive qemu-process
> 
> Status in QEMU:
>  New
> 
> Bug description:
>  Hi,
> 
>  after some testing I tried to narrow down a problem that was initially
>  reported by some users.
>  Seen so far on different distros: Debian 7.1, Ubuntu 12.04 LTS and
>  IPFire-2.3.
> 
>  All of them use some flavour of the linux-3.2.x kernel.
> 
>  Upgrading e.g. Ubuntu to "Linux 3.8.0-27-generic x86_64" solves the
>  problem.
>  The problem can be triggered with a workload like:
> 
>  spew -v --raw -P -t -i 3 -b 4k -p random -B 4k 1G /tmp/doof.dat
>  and in parallel do some apt-get install/remove/whatever.
> 
>  That results in a somewhat stuck qemu-session with the bad
>  "kernel_hung_task..." messages.
> 
>  A typical command-line is as follows:
> 
>  /usr/local/qemu-1.6.0/bin/qemu-system-x86_64 -usbdevice tablet -enable-kvm
>  -daemonize -pidfile /var/run/qemu-server/760.pid
>  -monitor unix:/var/run/qemu-server/760.mon,server,nowait
>  -vnc unix:/var/run/qemu-server/760.vnc,password
>  -qmp unix:/var/run/qemu-server/760.qmp,server,nowait
>  -nodefaults -serial none -parallel none
>  -device virtio-net-pci,mac=00:F1:70:00:2F:80,netdev=vlan0d0
>  -netdev type=tap,id=vlan0d0,ifname=tap760i0d0,script=/etc/fcms/add_if.sh,downscript=/etc/fcms/downscript.sh
>  -name 1155823384-4 -m 512 -vga cirrus -k de -smp sockets=1,cores=1
>  -device virtio-blk-pci,drive=virtio0
>  -drive format=raw,file=rbd:1155823384/vm-760-disk-1.rbd:rbd_cache=false,cache=writeback,if=none,id=virtio0,media=disk,index=0,aio=native
>  -drive format=raw,file=rbd:1155823384/vm-760-swap-1.rbd:rbd_cache=false,cache=writeback,if=virtio,media=disk,index=1,aio=native
>  -drive if=ide,media=cdrom,id=ide1-cd0,readonly=on
>  -drive if=ide,media=cdrom,id=ide1-cd1,readonly=on -boot order=dc
> 
>  No "system_reset", "sendkey ctrl-alt-delete" or "q" is accepted in the
>  monitor session; the process has to be hard-killed.
> 
>  Please give any advice on what to do for tracing/debugging, because
>  the number of tickets here is rising, and no one knows what users
>  are doing inside their VMs.
> 
>  Kind regards,
> 
>  Oliver Francke.
> 
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/qemu/+bug/1207686/+subscriptions

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1207686

