qemu-devel

[Qemu-devel] [Bug 587993] Re: qemu-kvm 0.12.4+dfsg-1 from debian squeeze


From: Maciek
Subject: [Qemu-devel] [Bug 587993] Re: qemu-kvm 0.12.4+dfsg-1 from debian squeeze crashes "BUG: unable to handle kernel NULL pointer" (sym53c8xx)
Date: Fri, 04 Jun 2010 12:59:05 -0000

** Description changed:

  I use Eucalyptus software (1.6.2) on Debian squeeze with kvm
- 0.12.4+dfsg-1. Kernel 2.6.32-3-amd64. After a few days machines crash.
- There are no logs in host system. Guest is the same kernel and OS as
- host. The kvm process use 100% of cpu time. I can not even ping the
- guest. Here is the log from virtual machine:
+ 0.12.4+dfsg-1 (the same happened with 0.11.1+dfsg-1). Kernel
+ 2.6.32-3-amd64. After a few days the machines crash. There are no logs
+ on the host system. The guest runs the same kernel and OS as the host.
+ The kvm process uses 100% of CPU time. I cannot even ping the guest.
+ Here is the log from the virtual machine:
  
  [ 3577.816666] sd 0:0:0:0: [sda] ABORT operation started
  [ 3582.816047] sd 0:0:0:0: ABORT operation timed-out.
  [ 3582.816781] sd 0:0:0:0: [sda] ABORT operation started
  [ 3587.816649] sd 0:0:0:0: ABORT operation timed-out.
  [ 3587.817379] sd 0:0:0:0: [sda] DEVICE RESET operation started
  [ 3592.816062] sd 0:0:0:0: DEVICE RESET operation timed-out.
  [ 3592.816882] sd 0:0:0:0: [sda] BUS RESET operation started
  [ 3592.820056] sym0: SCSI BUS reset detected.
  [ 3592.831538] sym0: SCSI BUS has been reset.
  [ 3592.831968] BUG: unable to handle kernel NULL pointer dereference at 0000000000000358
  [ 3592.832003] IP: [<ffffffffa01147c4>] sym_int_sir+0x62f/0x14e0 [sym53c8xx]
- [ 3592.832003] PGD 5f73e067 PUD 5fa53067 PMD 0 
- [ 3592.832003] Oops: 0000 [#1] SMP 
+ [ 3592.832003] PGD 5f73e067 PUD 5fa53067 PMD 0
+ [ 3592.832003] Oops: 0000 [#1] SMP
  [ 3592.832003] last sysfs file: /sys/devices/pci0000:00/0000:00:05.0/host0/target0:0:0/0:0:0:0/vendor
- [ 3592.832003] CPU 0 
+ [ 3592.832003] CPU 0
  [ 3592.832003] Modules linked in: dm_mod openafs(P) ext2 snd_pcsp snd_pcm snd_timer serio_raw i2c_piix4 snd virtio_balloon evdev i2c_core soundcore psmouse button processor snd_page_alloc ext3 jbd mbcache sd_mod crc_t10dif ata_generic libata ide_pci_generic sym53c8xx scsi_transport_spi thermal piix uhci_hcd ehci_hcd floppy thermal_sys scsi_mod virtio_pci virtio_ring virtio e1000 ide_core usbcore nls_base [last unloaded: scsi_wait_scan]
  [ 3592.832003] Pid: 193, comm: scsi_eh_0 Tainted: P           2.6.32-3-amd64 #1 Bochs
  [ 3592.832003] RIP: 0010:[<ffffffffa01147c4>]  [<ffffffffa01147c4>] sym_int_sir+0x62f/0x14e0 [sym53c8xx]
  [ 3592.832003] RSP: 0018:ffff880001803cb0  EFLAGS: 00010287
  [ 3592.832003] RAX: 000000000000000a RBX: 000000000000000b RCX: 000000005f410090
  [ 3592.832003] RDX: 0000000000000000 RSI: ffff88005c450800 RDI: ffffc90000a5e006
  [ 3592.832003] RBP: ffff88005f410000 R08: 0000000000000000 R09: 0000000000000000
  [ 3592.832003] R10: 000000000000003a R11: ffffffff813b871e R12: ffff88005f410090
  [ 3592.832003] R13: 0000000000000084 R14: 0000000000000000 R15: 0000000000000001
  [ 3592.832003] FS:  0000000000000000(0000) GS:ffff880001800000(0000) knlGS:0000000000000000
  [ 3592.832003] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
  [ 3592.832003] CR2: 0000000000000358 CR3: 000000005e269000 CR4: 00000000000006f0
  [ 3592.832003] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [ 3592.832003] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  [ 3592.832003] Process scsi_eh_0 (pid: 193, threadinfo ffff88005f6fa000, task ffff88005f697880)
  [ 3592.832003] Stack:
  [ 3592.832003]  ffff88005f3fd000 0000000000000000 0000000000000130 0000000000000000
  [ 3592.832003] <0> ffff88005f407710 ffffc90000a64710 ffffffffffffff10 ffffffff81195301
  [ 3592.832003] <0> 0000000000000010 0000000000010212 ffff880001803d18 0000000000000018
  [ 3592.832003] Call Trace:
- [ 3592.832003]  <IRQ> 
+ [ 3592.832003]  <IRQ>
  [ 3592.832003]  [<ffffffff81195301>] ? __memcpy_toio+0x9/0x19
  [ 3592.832003]  [<ffffffffa01164ed>] ? sym_interrupt+0x46c/0x6a3 [sym53c8xx]
  [ 3592.832003]  [<ffffffff8103fea0>] ? update_curr+0xa6/0x147
  [ 3592.832003]  [<ffffffffa010fbde>] ? sym53c8xx_intr+0x43/0x6a [sym53c8xx]
  [ 3592.832003]  [<ffffffff81093bfc>] ? handle_IRQ_event+0x58/0x126
  [ 3592.832003]  [<ffffffff810954e2>] ? handle_fasteoi_irq+0x7d/0xb5
  [ 3592.832003]  [<ffffffff81013957>] ? handle_irq+0x17/0x1d
  [ 3592.832003]  [<ffffffff81012fb1>] ? do_IRQ+0x57/0xb6
  [ 3592.832003]  [<ffffffff810114d3>] ? ret_from_intr+0x0/0x11
  [ 3592.832003]  [<ffffffff81053903>] ? __do_softirq+0x6e/0x19f
  [ 3592.832003]  [<ffffffff8106fa87>] ? tick_dev_program_event+0x2d/0x95
  [ 3592.832003]  [<ffffffff81011cac>] ? call_softirq+0x1c/0x30
  [ 3592.832003]  [<ffffffff81013903>] ? do_softirq+0x3f/0x7c
  [ 3592.832003]  [<ffffffff810537e1>] ? irq_exit+0x36/0x76
  [ 3592.832003]  [<ffffffff81025837>] ? smp_apic_timer_interrupt+0x87/0x95
  [ 3592.832003]  [<ffffffff81011673>] ? apic_timer_interrupt+0x13/0x20
- [ 3592.832003]  <EOI> 
+ [ 3592.832003]  <EOI>
  [ 3592.832003]  [<ffffffff8118e009>] ? delay_tsc+0x0/0x73
  [ 3592.832003]  [<ffffffffa010f900>] ? sym_eh_handler+0x22e/0x2e2 [sym53c8xx]
  [ 3592.832003]  [<ffffffffa008e5de>] ? scsi_try_bus_reset+0x50/0xd9 [scsi_mod]
  [ 3592.832003]  [<ffffffffa008f565>] ? scsi_eh_ready_devs+0x50c/0x781 [scsi_mod]
  [ 3592.832003]  [<ffffffffa008fd6b>] ? scsi_error_handler+0x3c1/0x5b5 [scsi_mod]
  [ 3592.832003]  [<ffffffffa008f9aa>] ? scsi_error_handler+0x0/0x5b5 [scsi_mod]
  [ 3592.832003]  [<ffffffff81064789>] ? kthread+0x79/0x81
  [ 3592.832003]  [<ffffffff81011baa>] ? child_rip+0xa/0x20
  [ 3592.832003]  [<ffffffff81064710>] ? kthread+0x0/0x81
  [ 3592.832003]  [<ffffffff81011ba0>] ? child_rip+0x0/0x20
- [ 3592.832003] Code: 48 c7 c7 82 92 11 a0 eb 63 48 8b 98 38 01 00 00 48 8d b8 28 01 00 00 e8 df d5 0f e1 48 89 da 48 89 c6 48 c7 c7 bc 92 11 a0 eb 6d <49> 8b 96 58 03 00 00 48 8b 82 80 00 00 00 48 8b a8 b0 00 00 00 
+ [ 3592.832003] Code: 48 c7 c7 82 92 11 a0 eb 63 48 8b 98 38 01 00 00 48 8d b8 28 01 00 00 e8 df d5 0f e1 48 89 da 48 89 c6 48 c7 c7 bc 92 11 a0 eb 6d <49> 8b 96 58 03 00 00 48 8b 82 80 00 00 00 48 8b a8 b0 00 00 00
  [ 3592.832003] RIP  [<ffffffffa01147c4>] sym_int_sir+0x62f/0x14e0 [sym53c8xx]
  [ 3592.832003]  RSP <ffff880001803cb0>
  [ 3592.832003] CR2: 0000000000000358
  [ 3592.867935] ---[ end trace 06f90ebbdbd172ee ]---
  [ 3592.868360] Kernel panic - not syncing: Fatal exception in interrupt
  [ 3592.868906] Pid: 193, comm: scsi_eh_0 Tainted: P      D    2.6.32-3-amd64 #1
  [ 3592.869511] Call Trace:
  [ 3592.869727]  <IRQ>  [<ffffffff812ed349>] ? panic+0x86/0x141
  [ 3592.870225]  [<ffffffff81011673>] ? apic_timer_interrupt+0x13/0x20
  [ 3592.870778]  [<ffffffff811afbdc>] ? dummycon_dummy+0x0/0x3
  [ 3592.871250]  [<ffffffff81014a37>] ? oops_end+0x64/0xb4
  [ 3592.871694]  [<ffffffff81014a7a>] ? oops_end+0xa7/0xb4
  [ 3592.872150]  [<ffffffff810322b8>] ? no_context+0x1e9/0x1f8
  [ 3592.872626]  [<ffffffff8103246d>] ? __bad_area_nosemaphore+0x1a6/0x1ca
  [ 3592.873185]  [<ffffffff8106807c>] ? up+0xe/0x36
  [ 3592.873576]  [<ffffffff8104e219>] ? release_console_sem+0x17e/0x1af
  [ 3592.874125]  [<ffffffff81024d72>] ? lapic_next_event+0x18/0x1d
  [ 3592.874642]  [<ffffffff812ef595>] ? page_fault+0x25/0x30
  [ 3592.875103]  [<ffffffffa01147c4>] ? sym_int_sir+0x62f/0x14e0 [sym53c8xx]
  [ 3592.875678]  [<ffffffff81195301>] ? __memcpy_toio+0x9/0x19
  [ 3592.876162]  [<ffffffffa01164ed>] ? sym_interrupt+0x46c/0x6a3 [sym53c8xx]
  [ 3592.876748]  [<ffffffff8103fea0>] ? update_curr+0xa6/0x147
  [ 3592.877224]  [<ffffffffa010fbde>] ? sym53c8xx_intr+0x43/0x6a [sym53c8xx]
  [ 3592.877800]  [<ffffffff81093bfc>] ? handle_IRQ_event+0x58/0x126
  [ 3592.878319]  [<ffffffff810954e2>] ? handle_fasteoi_irq+0x7d/0xb5
  [ 3592.878848]  [<ffffffff81013957>] ? handle_irq+0x17/0x1d
  [ 3592.879305]  [<ffffffff81012fb1>] ? do_IRQ+0x57/0xb6
  [ 3592.879744]  [<ffffffff810114d3>] ? ret_from_intr+0x0/0x11
  [ 3592.880237]  [<ffffffff81053903>] ? __do_softirq+0x6e/0x19f
  [ 3592.880723]  [<ffffffff8106fa87>] ? tick_dev_program_event+0x2d/0x95
  [ 3592.881284]  [<ffffffff81011cac>] ? call_softirq+0x1c/0x30
  [ 3592.881762]  [<ffffffff81013903>] ? do_softirq+0x3f/0x7c
  [ 3592.882230]  [<ffffffff810537e1>] ? irq_exit+0x36/0x76
  [ 3592.882691]  [<ffffffff81025837>] ? smp_apic_timer_interrupt+0x87/0x95
  [ 3592.883258]  [<ffffffff81011673>] ? apic_timer_interrupt+0x13/0x20
  [ 3592.883795]  <EOI>  [<ffffffff8118e009>] ? delay_tsc+0x0/0x73
  [ 3592.884319]  [<ffffffffa010f900>] ? sym_eh_handler+0x22e/0x2e2 [sym53c8xx]
  [ 3592.884917]  [<ffffffffa008e5de>] ? scsi_try_bus_reset+0x50/0xd9 [scsi_mod]
  [ 3592.885522]  [<ffffffffa008f565>] ? scsi_eh_ready_devs+0x50c/0x781 [scsi_mod]
  [ 3592.886152]  [<ffffffffa008fd6b>] ? scsi_error_handler+0x3c1/0x5b5 [scsi_mod]
  [ 3592.886789]  [<ffffffffa008f9aa>] ? scsi_error_handler+0x0/0x5b5 [scsi_mod]
  [ 3592.887398]  [<ffffffff81064789>] ? kthread+0x79/0x81
  [ 3592.887836]  [<ffffffff81011baa>] ? child_rip+0xa/0x20
  [ 3592.888290]  [<ffffffff81064710>] ? kthread+0x0/0x81
  [ 3592.888721]  [<ffffffff81011ba0>] ? child_rip+0x0/0x20
  
  Unfortunately, I have no idea how to reproduce the problem.
+ 
+ Log from /var/log/libvirt/qemu/
+ lsi_scsi: error: Unimplemented message 0x0c
+ 
+ What is more, I had 7 VMs running. Today four of them crashed at the
+ same time. The rest survived, with messages like these in syslog:
+ 
+ [651330.816043] sd 0:0:0:0: [sda] ABORT operation started
+ [651335.860027] sd 0:0:0:0: ABORT operation timed-out.
+ [651335.860600] sd 0:0:0:0: [sda] ABORT operation started
+ [651337.019355] sd 0:0:0:0: ABORT operation complete.
+ [651337.038506] sd 0:0:0:0: [sda] ABORT operation started
+ [651337.039100] sd 0:0:0:0: ABORT operation failed.
+ [651337.039624] sd 0:0:0:0: [sda] ABORT operation started
+ [651337.040303] sd 0:0:0:0: ABORT operation failed.
+ [651337.040834] sd 0:0:0:0: [sda] ABORT operation started
+ [651337.041417] sd 0:0:0:0: ABORT operation failed.
+ [651337.041949] sd 0:0:0:0: [sda] ABORT operation started
+ [651337.042534] sd 0:0:0:0: ABORT operation failed.
+ [651337.043072] sd 0:0:0:0: [sda] DEVICE RESET operation started
+ [651337.043834] scsi target0:0:0: control msgout: c.
+ [651337.520075] scsi target0:0:0: has been reset
+ [651337.521726] sd 0:0:0:0: DEVICE RESET operation complete.
+ [651337.522495] sd 0:0:0:0: M_REJECT received (0:0).
+ 
+ It looks like the problem is in the host system and affects all
+ machines at the same time. I found the same pattern in syslog on the
+ machines that crashed; it appeared 3 days before the crash. There is
+ no information in the host log files at all. Is it possible that
+ Eucalyptus (1.6.2) caused this? With 1.6.1 I didn't have these
+ problems. Eucalyptus runs kvm (0.12 and 0.11) with these commands:
+ 
+ /usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin HOME=/root
+ USER=root LOGNAME=root /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 512
+ -smp 1,sockets=1,cores=1,threads=1 -name i-35B80630 -uuid
+ 7e9b2fc1-9a9d-7114-3cb4-f4fdb3d51a3a -nographic -nodefaults -chardev
+ socket,id=monitor,path=/var/lib/libvirt/qemu/i-35B80630.monitor,server,nowait
+ -mon chardev=monitor,mode=readline -rtc base=utc -boot c -kernel
+ /var/lib/eucalyptus/instances/winnie/i-35B80630/kernel -initrd
+ /var/lib/eucalyptus/instances/winnie/i-35B80630/ramdisk -append
+ root=/dev/sda1 console=ttyS0 -device lsi,id=scsi0,bus=pci.0,addr=0x5
+ -drive
+ file=/var/lib/eucalyptus/instances/winnie/i-35B80630/disk,if=none,id=drive-scsi0-0-0,boot=on
+ -device scsi-disk,bus=scsi0.0,scsi-id=0,drive=drive-scsi0-0-0,id=scsi0-0-0
+ -device e1000,vlan=0,id=net0,mac=d0:0d:35:b8:06:30,bus=pci.0,addr=0x4
+ -net tap,fd=43,vlan=0,name=hostnet0 -chardev
+ file,id=serial0,path=/var/lib/eucalyptus/instances/winnie/i-35B80630/console.log
+ -device isa-serial,chardev=serial0 -usb -device
+ virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3
+ 
+ /usr/bin/kvm -S -M pc-0.11 -enable-kvm -m 512 -smp 1 -name i-492407F3
+ -uuid b2dc266e-a62a-4e13-3847-f9104eba4135 -nographic -monitor
+ unix:/var/lib/libvirt/qemu/i-492407F3.monitor,server,nowait -boot c
+ -kernel /var/lib/eucalyptus/instances/admin/i-492407F3/kernel -initrd
+ /var/lib/eucalyptus/instances/admin/i-492407F3/ramdisk -append
+ root=/dev/sda1 console=ttyS0 -drive
+ file=/var/lib/eucalyptus/instances/admin/i-492407F3/disk,if=scsi,bus=0,unit=0,boot=on
+ -net nic,macaddr=d0:0d:49:24:07:f3,vlan=0,model=e1000,name=net0 -net
+ tap,fd=118,vlan=0,name=hostnet0 -serial
+ file:/var/lib/eucalyptus/instances/admin/i-492407F3/console.log
+ -parallel none -usb -vga none -balloon virtio
+ 
+ I can provide access to the VM.

-- 
qemu-kvm 0.12.4+dfsg-1 from debian squeeze crashes "BUG: unable to handle 
kernel NULL pointer" (sym53c8xx)
https://bugs.launchpad.net/bugs/587993
You received this bug notification because you are a member of
qemu-devel-ml, which is subscribed to QEMU.

Status in QEMU: Incomplete

Bug description:
I use Eucalyptus software (1.6.2) on Debian squeeze with kvm 0.12.4+dfsg-1 (the 
same happened with 0.11.1+dfsg-1). Kernel 2.6.32-3-amd64. After a few days the 
machines crash. There are no logs on the host system. The guest runs the same 
kernel and OS as the host. The kvm process uses 100% of CPU time. I cannot even 
ping the guest. Here is the log from the virtual machine:

[ 3577.816666] sd 0:0:0:0: [sda] ABORT operation started
[ 3582.816047] sd 0:0:0:0: ABORT operation timed-out.
[ 3582.816781] sd 0:0:0:0: [sda] ABORT operation started
[ 3587.816649] sd 0:0:0:0: ABORT operation timed-out.
[ 3587.817379] sd 0:0:0:0: [sda] DEVICE RESET operation started
[ 3592.816062] sd 0:0:0:0: DEVICE RESET operation timed-out.
[ 3592.816882] sd 0:0:0:0: [sda] BUS RESET operation started
[ 3592.820056] sym0: SCSI BUS reset detected.
[ 3592.831538] sym0: SCSI BUS has been reset.
[ 3592.831968] BUG: unable to handle kernel NULL pointer dereference at 0000000000000358
[ 3592.832003] IP: [<ffffffffa01147c4>] sym_int_sir+0x62f/0x14e0 [sym53c8xx]
[ 3592.832003] PGD 5f73e067 PUD 5fa53067 PMD 0
[ 3592.832003] Oops: 0000 [#1] SMP
[ 3592.832003] last sysfs file: /sys/devices/pci0000:00/0000:00:05.0/host0/target0:0:0/0:0:0:0/vendor
[ 3592.832003] CPU 0
[ 3592.832003] Modules linked in: dm_mod openafs(P) ext2 snd_pcsp snd_pcm snd_timer serio_raw i2c_piix4 snd virtio_balloon evdev i2c_core soundcore psmouse button processor snd_page_alloc ext3 jbd mbcache sd_mod crc_t10dif ata_generic libata ide_pci_generic sym53c8xx scsi_transport_spi thermal piix uhci_hcd ehci_hcd floppy thermal_sys scsi_mod virtio_pci virtio_ring virtio e1000 ide_core usbcore nls_base [last unloaded: scsi_wait_scan]
[ 3592.832003] Pid: 193, comm: scsi_eh_0 Tainted: P           2.6.32-3-amd64 #1 Bochs
[ 3592.832003] RIP: 0010:[<ffffffffa01147c4>]  [<ffffffffa01147c4>] sym_int_sir+0x62f/0x14e0 [sym53c8xx]
[ 3592.832003] RSP: 0018:ffff880001803cb0  EFLAGS: 00010287
[ 3592.832003] RAX: 000000000000000a RBX: 000000000000000b RCX: 000000005f410090
[ 3592.832003] RDX: 0000000000000000 RSI: ffff88005c450800 RDI: ffffc90000a5e006
[ 3592.832003] RBP: ffff88005f410000 R08: 0000000000000000 R09: 0000000000000000
[ 3592.832003] R10: 000000000000003a R11: ffffffff813b871e R12: ffff88005f410090
[ 3592.832003] R13: 0000000000000084 R14: 0000000000000000 R15: 0000000000000001
[ 3592.832003] FS:  0000000000000000(0000) GS:ffff880001800000(0000) knlGS:0000000000000000
[ 3592.832003] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 3592.832003] CR2: 0000000000000358 CR3: 000000005e269000 CR4: 00000000000006f0
[ 3592.832003] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3592.832003] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 3592.832003] Process scsi_eh_0 (pid: 193, threadinfo ffff88005f6fa000, task ffff88005f697880)
[ 3592.832003] Stack:
[ 3592.832003]  ffff88005f3fd000 0000000000000000 0000000000000130 0000000000000000
[ 3592.832003] <0> ffff88005f407710 ffffc90000a64710 ffffffffffffff10 ffffffff81195301
[ 3592.832003] <0> 0000000000000010 0000000000010212 ffff880001803d18 0000000000000018
[ 3592.832003] Call Trace:
[ 3592.832003]  <IRQ>
[ 3592.832003]  [<ffffffff81195301>] ? __memcpy_toio+0x9/0x19
[ 3592.832003]  [<ffffffffa01164ed>] ? sym_interrupt+0x46c/0x6a3 [sym53c8xx]
[ 3592.832003]  [<ffffffff8103fea0>] ? update_curr+0xa6/0x147
[ 3592.832003]  [<ffffffffa010fbde>] ? sym53c8xx_intr+0x43/0x6a [sym53c8xx]
[ 3592.832003]  [<ffffffff81093bfc>] ? handle_IRQ_event+0x58/0x126
[ 3592.832003]  [<ffffffff810954e2>] ? handle_fasteoi_irq+0x7d/0xb5
[ 3592.832003]  [<ffffffff81013957>] ? handle_irq+0x17/0x1d
[ 3592.832003]  [<ffffffff81012fb1>] ? do_IRQ+0x57/0xb6
[ 3592.832003]  [<ffffffff810114d3>] ? ret_from_intr+0x0/0x11
[ 3592.832003]  [<ffffffff81053903>] ? __do_softirq+0x6e/0x19f
[ 3592.832003]  [<ffffffff8106fa87>] ? tick_dev_program_event+0x2d/0x95
[ 3592.832003]  [<ffffffff81011cac>] ? call_softirq+0x1c/0x30
[ 3592.832003]  [<ffffffff81013903>] ? do_softirq+0x3f/0x7c
[ 3592.832003]  [<ffffffff810537e1>] ? irq_exit+0x36/0x76
[ 3592.832003]  [<ffffffff81025837>] ? smp_apic_timer_interrupt+0x87/0x95
[ 3592.832003]  [<ffffffff81011673>] ? apic_timer_interrupt+0x13/0x20
[ 3592.832003]  <EOI>
[ 3592.832003]  [<ffffffff8118e009>] ? delay_tsc+0x0/0x73
[ 3592.832003]  [<ffffffffa010f900>] ? sym_eh_handler+0x22e/0x2e2 [sym53c8xx]
[ 3592.832003]  [<ffffffffa008e5de>] ? scsi_try_bus_reset+0x50/0xd9 [scsi_mod]
[ 3592.832003]  [<ffffffffa008f565>] ? scsi_eh_ready_devs+0x50c/0x781 [scsi_mod]
[ 3592.832003]  [<ffffffffa008fd6b>] ? scsi_error_handler+0x3c1/0x5b5 [scsi_mod]
[ 3592.832003]  [<ffffffffa008f9aa>] ? scsi_error_handler+0x0/0x5b5 [scsi_mod]
[ 3592.832003]  [<ffffffff81064789>] ? kthread+0x79/0x81
[ 3592.832003]  [<ffffffff81011baa>] ? child_rip+0xa/0x20
[ 3592.832003]  [<ffffffff81064710>] ? kthread+0x0/0x81
[ 3592.832003]  [<ffffffff81011ba0>] ? child_rip+0x0/0x20
[ 3592.832003] Code: 48 c7 c7 82 92 11 a0 eb 63 48 8b 98 38 01 00 00 48 8d b8 28 01 00 00 e8 df d5 0f e1 48 89 da 48 89 c6 48 c7 c7 bc 92 11 a0 eb 6d <49> 8b 96 58 03 00 00 48 8b 82 80 00 00 00 48 8b a8 b0 00 00 00
[ 3592.832003] RIP  [<ffffffffa01147c4>] sym_int_sir+0x62f/0x14e0 [sym53c8xx]
[ 3592.832003]  RSP <ffff880001803cb0>
[ 3592.832003] CR2: 0000000000000358
[ 3592.867935] ---[ end trace 06f90ebbdbd172ee ]---
[ 3592.868360] Kernel panic - not syncing: Fatal exception in interrupt
[ 3592.868906] Pid: 193, comm: scsi_eh_0 Tainted: P      D    2.6.32-3-amd64 #1
[ 3592.869511] Call Trace:
[ 3592.869727]  <IRQ>  [<ffffffff812ed349>] ? panic+0x86/0x141
[ 3592.870225]  [<ffffffff81011673>] ? apic_timer_interrupt+0x13/0x20
[ 3592.870778]  [<ffffffff811afbdc>] ? dummycon_dummy+0x0/0x3
[ 3592.871250]  [<ffffffff81014a37>] ? oops_end+0x64/0xb4
[ 3592.871694]  [<ffffffff81014a7a>] ? oops_end+0xa7/0xb4
[ 3592.872150]  [<ffffffff810322b8>] ? no_context+0x1e9/0x1f8
[ 3592.872626]  [<ffffffff8103246d>] ? __bad_area_nosemaphore+0x1a6/0x1ca
[ 3592.873185]  [<ffffffff8106807c>] ? up+0xe/0x36
[ 3592.873576]  [<ffffffff8104e219>] ? release_console_sem+0x17e/0x1af
[ 3592.874125]  [<ffffffff81024d72>] ? lapic_next_event+0x18/0x1d
[ 3592.874642]  [<ffffffff812ef595>] ? page_fault+0x25/0x30
[ 3592.875103]  [<ffffffffa01147c4>] ? sym_int_sir+0x62f/0x14e0 [sym53c8xx]
[ 3592.875678]  [<ffffffff81195301>] ? __memcpy_toio+0x9/0x19
[ 3592.876162]  [<ffffffffa01164ed>] ? sym_interrupt+0x46c/0x6a3 [sym53c8xx]
[ 3592.876748]  [<ffffffff8103fea0>] ? update_curr+0xa6/0x147
[ 3592.877224]  [<ffffffffa010fbde>] ? sym53c8xx_intr+0x43/0x6a [sym53c8xx]
[ 3592.877800]  [<ffffffff81093bfc>] ? handle_IRQ_event+0x58/0x126
[ 3592.878319]  [<ffffffff810954e2>] ? handle_fasteoi_irq+0x7d/0xb5
[ 3592.878848]  [<ffffffff81013957>] ? handle_irq+0x17/0x1d
[ 3592.879305]  [<ffffffff81012fb1>] ? do_IRQ+0x57/0xb6
[ 3592.879744]  [<ffffffff810114d3>] ? ret_from_intr+0x0/0x11
[ 3592.880237]  [<ffffffff81053903>] ? __do_softirq+0x6e/0x19f
[ 3592.880723]  [<ffffffff8106fa87>] ? tick_dev_program_event+0x2d/0x95
[ 3592.881284]  [<ffffffff81011cac>] ? call_softirq+0x1c/0x30
[ 3592.881762]  [<ffffffff81013903>] ? do_softirq+0x3f/0x7c
[ 3592.882230]  [<ffffffff810537e1>] ? irq_exit+0x36/0x76
[ 3592.882691]  [<ffffffff81025837>] ? smp_apic_timer_interrupt+0x87/0x95
[ 3592.883258]  [<ffffffff81011673>] ? apic_timer_interrupt+0x13/0x20
[ 3592.883795]  <EOI>  [<ffffffff8118e009>] ? delay_tsc+0x0/0x73
[ 3592.884319]  [<ffffffffa010f900>] ? sym_eh_handler+0x22e/0x2e2 [sym53c8xx]
[ 3592.884917]  [<ffffffffa008e5de>] ? scsi_try_bus_reset+0x50/0xd9 [scsi_mod]
[ 3592.885522]  [<ffffffffa008f565>] ? scsi_eh_ready_devs+0x50c/0x781 [scsi_mod]
[ 3592.886152]  [<ffffffffa008fd6b>] ? scsi_error_handler+0x3c1/0x5b5 [scsi_mod]
[ 3592.886789]  [<ffffffffa008f9aa>] ? scsi_error_handler+0x0/0x5b5 [scsi_mod]
[ 3592.887398]  [<ffffffff81064789>] ? kthread+0x79/0x81
[ 3592.887836]  [<ffffffff81011baa>] ? child_rip+0xa/0x20
[ 3592.888290]  [<ffffffff81064710>] ? kthread+0x0/0x81
[ 3592.888721]  [<ffffffff81011ba0>] ? child_rip+0x0/0x20
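
The register dump and the Code: line in the oops above are enough to see which memory access faulted. As a rough sketch (not part of the original report), the Python below decodes only the instruction bytes at the faulting RIP, that is the byte marked <49> and the six bytes after it, and adds the displacement to the R14 value from the dump. The helper name decode_mov_load is made up for illustration, and it handles just this one encoding (REX.W 8B /r with a 32-bit displacement and no SIB byte).

```python
import struct

# Bytes at the faulting RIP, copied from the "Code:" line of the oops
# (the byte shown as <49> and the six bytes that follow it).
FAULT_BYTES = bytes.fromhex("49 8b 96 58 03 00 00")

# x86-64 general purpose registers in encoding order.
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_mov_load(code: bytes):
    """Decode 'REX.W 8B /r' with a disp32 memory operand and no SIB byte.

    Hypothetical helper for this one case only; returns (dest, base, disp).
    """
    rex, opcode, modrm = code[0], code[1], code[2]
    if (rex & 0xF0) != 0x40 or not (rex & 0x08) or opcode != 0x8B:
        raise ValueError("not a 64-bit MOV r64, r/m64")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b10 or rm == 0b100:
        raise ValueError("only the disp32, no-SIB form is handled")
    reg |= (rex & 0x04) << 1  # REX.R extends the destination register
    rm |= (rex & 0x01) << 3   # REX.B extends the base register
    (disp,) = struct.unpack_from("<i", code, 3)  # little-endian disp32
    return REG64[reg], REG64[rm], disp


dest, base, disp = decode_mov_load(FAULT_BYTES)
registers = {"r14": 0x0000000000000000}  # R14 from the oops register dump
fault_address = registers[base] + disp
print(f"mov {dest}, [{base}+{disp:#x}] faults at {fault_address:#018x}")
```

Under these assumptions the faulting load reads offset 0x358 from a NULL base pointer in R14, which agrees with the CR2 value in the oops. Which driver structure that pointer belongs to cannot be determined from the oops alone.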

Unfortunately, I have no idea how to reproduce the problem.

Log from /var/log/libvirt/qemu/
lsi_scsi: error: Unimplemented message 0x0c

What is more, I had 7 VMs running. Today four of them crashed at the same time. 
The rest survived, with messages like these in syslog:

[651330.816043] sd 0:0:0:0: [sda] ABORT operation started
[651335.860027] sd 0:0:0:0: ABORT operation timed-out.
[651335.860600] sd 0:0:0:0: [sda] ABORT operation started
[651337.019355] sd 0:0:0:0: ABORT operation complete.
[651337.038506] sd 0:0:0:0: [sda] ABORT operation started
[651337.039100] sd 0:0:0:0: ABORT operation failed.
[651337.039624] sd 0:0:0:0: [sda] ABORT operation started
[651337.040303] sd 0:0:0:0: ABORT operation failed.
[651337.040834] sd 0:0:0:0: [sda] ABORT operation started
[651337.041417] sd 0:0:0:0: ABORT operation failed.
[651337.041949] sd 0:0:0:0: [sda] ABORT operation started
[651337.042534] sd 0:0:0:0: ABORT operation failed.
[651337.043072] sd 0:0:0:0: [sda] DEVICE RESET operation started
[651337.043834] scsi target0:0:0: control msgout: c.
[651337.520075] scsi target0:0:0: has been reset
[651337.521726] sd 0:0:0:0: DEVICE RESET operation complete.
[651337.522495] sd 0:0:0:0: M_REJECT received (0:0).
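
Since the same ABORT / DEVICE RESET sequence shows up in the guests' syslog, one way to watch for it is to count the outcomes of the SCSI error-handler operations. The following is a minimal sketch, assuming the message format stays exactly as in the excerpts above; the function names summarize_eh and has_trouble are hypothetical, and the sample lines are copied from the log in this report.

```python
import re

# Matches the SCSI error-handler lines as they appear in the logs above,
# e.g. "[651335.860027] sd 0:0:0:0: ABORT operation timed-out."
EH_LINE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+sd (?P<dev>\d+:\d+:\d+:\d+): "
    r"(?:\[\w+\] )?(?P<op>ABORT|DEVICE RESET|BUS RESET) operation "
    r"(?P<state>started|complete|failed|timed-out)"
)


def summarize_eh(lines):
    """Count finished error-handler operations per (device, operation, state)."""
    counts = {}
    for line in lines:
        m = EH_LINE.search(line)
        if m and m["state"] != "started":
            key = (m["dev"], m["op"], m["state"])
            counts[key] = counts.get(key, 0) + 1
    return counts


def has_trouble(counts):
    """True if any operation timed out or failed (an assumed warning criterion)."""
    return any(state in ("timed-out", "failed") for _, _, state in counts)


SAMPLE = """\
[651330.816043] sd 0:0:0:0: [sda] ABORT operation started
[651335.860027] sd 0:0:0:0: ABORT operation timed-out.
[651335.860600] sd 0:0:0:0: [sda] ABORT operation started
[651337.019355] sd 0:0:0:0: ABORT operation complete.
[651337.038506] sd 0:0:0:0: [sda] ABORT operation started
[651337.039100] sd 0:0:0:0: ABORT operation failed.
[651337.043072] sd 0:0:0:0: [sda] DEVICE RESET operation started
[651337.521726] sd 0:0:0:0: DEVICE RESET operation complete.
""".splitlines()

summary = summarize_eh(SAMPLE)
for (dev, op, state), n in sorted(summary.items()):
    print(f"{dev} {op} {state}: {n}")
print("needs attention:", has_trouble(summary))
```

The same function can be fed the output of dmesg or a syslog file inside a guest. Whether a timed-out or failed ABORT reliably precedes a crash is not established by this report; the criterion in has_trouble is only a guess based on the two excerpts.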

It looks like the problem is in the host system and affects all machines at the 
same time. I found the same pattern in syslog on the machines that crashed; it 
appeared 3 days before the crash. There is no information in the host log files 
at all. Is it possible that Eucalyptus (1.6.2) caused this? With 1.6.1 I didn't 
have these problems. Eucalyptus runs kvm (0.12 and 0.11) with these commands:

/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin HOME=/root 
USER=root LOGNAME=root /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 512 -smp 
1,sockets=1,cores=1,threads=1 -name i-35B80630 -uuid 
7e9b2fc1-9a9d-7114-3cb4-f4fdb3d51a3a -nographic -nodefaults -chardev 
socket,id=monitor,path=/var/lib/libvirt/qemu/i-35B80630.monitor,server,nowait 
-mon chardev=monitor,mode=readline -rtc base=utc -boot c -kernel 
/var/lib/eucalyptus/instances/winnie/i-35B80630/kernel -initrd 
/var/lib/eucalyptus/instances/winnie/i-35B80630/ramdisk -append root=/dev/sda1 
console=ttyS0 -device lsi,id=scsi0,bus=pci.0,addr=0x5 -drive 
file=/var/lib/eucalyptus/instances/winnie/i-35B80630/disk,if=none,id=drive-scsi0-0-0,boot=on
 -device scsi-disk,bus=scsi0.0,scsi-id=0,drive=drive-scsi0-0-0,id=scsi0-0-0 
-device e1000,vlan=0,id=net0,mac=d0:0d:35:b8:06:30,bus=pci.0,addr=0x4 -net 
tap,fd=43,vlan=0,name=hostnet0 -chardev 
file,id=serial0,path=/var/lib/eucalyptus/instances/winnie/i-35B80630/console.log
 -device isa-serial,chardev=serial0 -usb -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 

/usr/bin/kvm -S -M pc-0.11 -enable-kvm -m 512 -smp 1 -name i-492407F3 -uuid 
b2dc266e-a62a-4e13-3847-f9104eba4135 -nographic -monitor 
unix:/var/lib/libvirt/qemu/i-492407F3.monitor,server,nowait -boot c -kernel 
/var/lib/eucalyptus/instances/admin/i-492407F3/kernel -initrd 
/var/lib/eucalyptus/instances/admin/i-492407F3/ramdisk -append root=/dev/sda1 
console=ttyS0 -drive 
file=/var/lib/eucalyptus/instances/admin/i-492407F3/disk,if=scsi,bus=0,unit=0,boot=on
 -net nic,macaddr=d0:0d:49:24:07:f3,vlan=0,model=e1000,name=net0 -net 
tap,fd=118,vlan=0,name=hostnet0 -serial 
file:/var/lib/eucalyptus/instances/admin/i-492407F3/console.log -parallel none 
-usb -vga none -balloon virtio 
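
Both command lines attach the guest disk through an emulated SCSI controller ("-device lsi" in the 0.12 form, "if=scsi" in the 0.11 form), which is the device the guest's sym53c8xx driver talks to. The sketch below is only an illustration of how to pull that information out of a command line; the function names are hypothetical, the command strings are shortened copies of the two commands above, and treating "if=scsi" as implying the LSI controller is an assumption about the PC machine types of that era.

```python
import shlex

# Shortened copies of the two kvm command lines from this report.
CMD_012 = (
    "/usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 512 "
    "-device lsi,id=scsi0,bus=pci.0,addr=0x5 "
    "-drive file=/var/lib/eucalyptus/instances/winnie/i-35B80630/disk,"
    "if=none,id=drive-scsi0-0-0,boot=on "
    "-device scsi-disk,bus=scsi0.0,scsi-id=0,drive=drive-scsi0-0-0,id=scsi0-0-0"
)
CMD_011 = (
    "/usr/bin/kvm -S -M pc-0.11 -enable-kvm -m 512 "
    "-drive file=/var/lib/eucalyptus/instances/admin/i-492407F3/disk,"
    "if=scsi,bus=0,unit=0,boot=on -balloon virtio"
)


def storage_hints(cmdline):
    """List 'if=...' values of -drive options and the models of -device options."""
    args = shlex.split(cmdline)
    hints = []
    for opt, value in zip(args, args[1:]):
        fields = value.split(",")
        if opt == "-drive":
            hints += [f for f in fields if f.startswith("if=")]
        elif opt == "-device":
            hints.append("device=" + fields[0])
    return hints


def uses_emulated_lsi(cmdline):
    """True if the disk appears to sit behind the emulated LSI SCSI controller."""
    hints = storage_hints(cmdline)
    return "device=lsi" in hints or "if=scsi" in hints


for name, cmd in (("pc-0.12", CMD_012), ("pc-0.11", CMD_011)):
    print(name, storage_hints(cmd), "lsi:", uses_emulated_lsi(cmd))
```

This matches the "lsi_scsi: error: Unimplemented message 0x0c" line in the libvirt log, which also comes from the emulated LSI controller.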

I can provide access to the VM.




