[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Qemu-devel] [Bug 587993] Re: qemu-kvm 0.12.4+dfsg-1 from debian squeeze
From: |
Maciek |
Subject: |
[Qemu-devel] [Bug 587993] Re: qemu-kvm 0.12.4+dfsg-1 from debian squeeze crashes "BUG: unable to handle kernel NULL pointer" (sym53c8xx) |
Date: |
Fri, 04 Jun 2010 12:59:05 -0000 |
** Description changed:
I use eucalyptus software (1.6.2) on debian squeeze with kvm
- 0.12.4+dfsg-1. Kernel 2.6.32-3-amd64. After a few days machines crash.
- There are no logs in host system. Guest is the same kernel and OS as
- host. The kvm process use 100% of cpu time. I can not even ping the
- guest. Here is the log from virtual machine:
+ 0.12.4+dfsg-1 (the same happend with 0.11.1+dfsg-1 ). Kernel
+ 2.6.32-3-amd64. After a few days machines crash. There are no logs in
+ host system. Guest is the same kernel and OS as host. The kvm process
+ use 100% of cpu time. I can not even ping the guest. Here is the log
+ from virtual machine:
[ 3577.816666] sd 0:0:0:0: [sda] ABORT operation started
[ 3582.816047] sd 0:0:0:0: ABORT operation timed-out.
[ 3582.816781] sd 0:0:0:0: [sda] ABORT operation started
[ 3587.816649] sd 0:0:0:0: ABORT operation timed-out.
[ 3587.817379] sd 0:0:0:0: [sda] DEVICE RESET operation started
[ 3592.816062] sd 0:0:0:0: DEVICE RESET operation timed-out.
[ 3592.816882] sd 0:0:0:0: [sda] BUS RESET operation started
[ 3592.820056] sym0: SCSI BUS reset detected.
[ 3592.831538] sym0: SCSI BUS has been reset.
[ 3592.831968] BUG: unable to handle kernel NULL pointer dereference at
0000000000000358
[ 3592.832003] IP: [<ffffffffa01147c4>] sym_int_sir+0x62f/0x14e0 [sym53c8xx]
- [ 3592.832003] PGD 5f73e067 PUD 5fa53067 PMD 0
- [ 3592.832003] Oops: 0000 [#1] SMP
+ [ 3592.832003] PGD 5f73e067 PUD 5fa53067 PMD 0
+ [ 3592.832003] Oops: 0000 [#1] SMP
[ 3592.832003] last sysfs file:
/sys/devices/pci0000:00/0000:00:05.0/host0/target0:0:0/0:0:0:0/vendor
- [ 3592.832003] CPU 0
+ [ 3592.832003] CPU 0
[ 3592.832003] Modules linked in: dm_mod openafs(P) ext2 snd_pcsp snd_pcm
snd_timer serio_raw i2c_piix4 snd virtio_balloon evdev i2c_core soundcore
psmouse button processor snd_page_alloc ext3 jbd mbcache sd_mod crc_t10dif
ata_generic libata ide_pci_generic sym53c8xx scsi_transport_spi thermal piix
uhci_hcd ehci_hcd floppy thermal_sys scsi_mod virtio_pci virtio_ring virtio
e1000 ide_core usbcore nls_base [last unloaded: scsi_wait_scan]
[ 3592.832003] Pid: 193, comm: scsi_eh_0 Tainted: P 2.6.32-3-amd64
#1 Bochs
[ 3592.832003] RIP: 0010:[<ffffffffa01147c4>] [<ffffffffa01147c4>]
sym_int_sir+0x62f/0x14e0 [sym53c8xx]
[ 3592.832003] RSP: 0018:ffff880001803cb0 EFLAGS: 00010287
[ 3592.832003] RAX: 000000000000000a RBX: 000000000000000b RCX:
000000005f410090
[ 3592.832003] RDX: 0000000000000000 RSI: ffff88005c450800 RDI:
ffffc90000a5e006
[ 3592.832003] RBP: ffff88005f410000 R08: 0000000000000000 R09:
0000000000000000
[ 3592.832003] R10: 000000000000003a R11: ffffffff813b871e R12:
ffff88005f410090
[ 3592.832003] R13: 0000000000000084 R14: 0000000000000000 R15:
0000000000000001
[ 3592.832003] FS: 0000000000000000(0000) GS:ffff880001800000(0000)
knlGS:0000000000000000
[ 3592.832003] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 3592.832003] CR2: 0000000000000358 CR3: 000000005e269000 CR4:
00000000000006f0
[ 3592.832003] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 3592.832003] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[ 3592.832003] Process scsi_eh_0 (pid: 193, threadinfo ffff88005f6fa000, task
ffff88005f697880)
[ 3592.832003] Stack:
[ 3592.832003] ffff88005f3fd000 0000000000000000 0000000000000130
0000000000000000
[ 3592.832003] <0> ffff88005f407710 ffffc90000a64710 ffffffffffffff10
ffffffff81195301
[ 3592.832003] <0> 0000000000000010 0000000000010212 ffff880001803d18
0000000000000018
[ 3592.832003] Call Trace:
- [ 3592.832003] <IRQ>
+ [ 3592.832003] <IRQ>
[ 3592.832003] [<ffffffff81195301>] ? __memcpy_toio+0x9/0x19
[ 3592.832003] [<ffffffffa01164ed>] ? sym_interrupt+0x46c/0x6a3 [sym53c8xx]
[ 3592.832003] [<ffffffff8103fea0>] ? update_curr+0xa6/0x147
[ 3592.832003] [<ffffffffa010fbde>] ? sym53c8xx_intr+0x43/0x6a [sym53c8xx]
[ 3592.832003] [<ffffffff81093bfc>] ? handle_IRQ_event+0x58/0x126
[ 3592.832003] [<ffffffff810954e2>] ? handle_fasteoi_irq+0x7d/0xb5
[ 3592.832003] [<ffffffff81013957>] ? handle_irq+0x17/0x1d
[ 3592.832003] [<ffffffff81012fb1>] ? do_IRQ+0x57/0xb6
[ 3592.832003] [<ffffffff810114d3>] ? ret_from_intr+0x0/0x11
[ 3592.832003] [<ffffffff81053903>] ? __do_softirq+0x6e/0x19f
[ 3592.832003] [<ffffffff8106fa87>] ? tick_dev_program_event+0x2d/0x95
[ 3592.832003] [<ffffffff81011cac>] ? call_softirq+0x1c/0x30
[ 3592.832003] [<ffffffff81013903>] ? do_softirq+0x3f/0x7c
[ 3592.832003] [<ffffffff810537e1>] ? irq_exit+0x36/0x76
[ 3592.832003] [<ffffffff81025837>] ? smp_apic_timer_interrupt+0x87/0x95
[ 3592.832003] [<ffffffff81011673>] ? apic_timer_interrupt+0x13/0x20
- [ 3592.832003] <EOI>
+ [ 3592.832003] <EOI>
[ 3592.832003] [<ffffffff8118e009>] ? delay_tsc+0x0/0x73
[ 3592.832003] [<ffffffffa010f900>] ? sym_eh_handler+0x22e/0x2e2 [sym53c8xx]
[ 3592.832003] [<ffffffffa008e5de>] ? scsi_try_bus_reset+0x50/0xd9 [scsi_mod]
[ 3592.832003] [<ffffffffa008f565>] ? scsi_eh_ready_devs+0x50c/0x781
[scsi_mod]
[ 3592.832003] [<ffffffffa008fd6b>] ? scsi_error_handler+0x3c1/0x5b5
[scsi_mod]
[ 3592.832003] [<ffffffffa008f9aa>] ? scsi_error_handler+0x0/0x5b5 [scsi_mod]
[ 3592.832003] [<ffffffff81064789>] ? kthread+0x79/0x81
[ 3592.832003] [<ffffffff81011baa>] ? child_rip+0xa/0x20
[ 3592.832003] [<ffffffff81064710>] ? kthread+0x0/0x81
[ 3592.832003] [<ffffffff81011ba0>] ? child_rip+0x0/0x20
- [ 3592.832003] Code: 48 c7 c7 82 92 11 a0 eb 63 48 8b 98 38 01 00 00 48 8d b8
28 01 00 00 e8 df d5 0f e1 48 89 da 48 89 c6 48 c7 c7 bc 92 11 a0 eb 6d <49> 8b
96 58 03 00 00 48 8b 82 80 00 00 00 48 8b a8 b0 00 00 00
+ [ 3592.832003] Code: 48 c7 c7 82 92 11 a0 eb 63 48 8b 98 38 01 00 00 48 8d b8
28 01 00 00 e8 df d5 0f e1 48 89 da 48 89 c6 48 c7 c7 bc 92 11 a0 eb 6d <49> 8b
96 58 03 00 00 48 8b 82 80 00 00 00 48 8b a8 b0 00 00 00
[ 3592.832003] RIP [<ffffffffa01147c4>] sym_int_sir+0x62f/0x14e0 [sym53c8xx]
[ 3592.832003] RSP <ffff880001803cb0>
[ 3592.832003] CR2: 0000000000000358
[ 3592.867935] ---[ end trace 06f90ebbdbd172ee ]---
[ 3592.868360] Kernel panic - not syncing: Fatal exception in interrupt
[ 3592.868906] Pid: 193, comm: scsi_eh_0 Tainted: P D 2.6.32-3-amd64
#1
[ 3592.869511] Call Trace:
[ 3592.869727] <IRQ> [<ffffffff812ed349>] ? panic+0x86/0x141
[ 3592.870225] [<ffffffff81011673>] ? apic_timer_interrupt+0x13/0x20
[ 3592.870778] [<ffffffff811afbdc>] ? dummycon_dummy+0x0/0x3
[ 3592.871250] [<ffffffff81014a37>] ? oops_end+0x64/0xb4
[ 3592.871694] [<ffffffff81014a7a>] ? oops_end+0xa7/0xb4
[ 3592.872150] [<ffffffff810322b8>] ? no_context+0x1e9/0x1f8
[ 3592.872626] [<ffffffff8103246d>] ? __bad_area_nosemaphore+0x1a6/0x1ca
[ 3592.873185] [<ffffffff8106807c>] ? up+0xe/0x36
[ 3592.873576] [<ffffffff8104e219>] ? release_console_sem+0x17e/0x1af
[ 3592.874125] [<ffffffff81024d72>] ? lapic_next_event+0x18/0x1d
[ 3592.874642] [<ffffffff812ef595>] ? page_fault+0x25/0x30
[ 3592.875103] [<ffffffffa01147c4>] ? sym_int_sir+0x62f/0x14e0 [sym53c8xx]
[ 3592.875678] [<ffffffff81195301>] ? __memcpy_toio+0x9/0x19
[ 3592.876162] [<ffffffffa01164ed>] ? sym_interrupt+0x46c/0x6a3 [sym53c8xx]
[ 3592.876748] [<ffffffff8103fea0>] ? update_curr+0xa6/0x147
[ 3592.877224] [<ffffffffa010fbde>] ? sym53c8xx_intr+0x43/0x6a [sym53c8xx]
[ 3592.877800] [<ffffffff81093bfc>] ? handle_IRQ_event+0x58/0x126
[ 3592.878319] [<ffffffff810954e2>] ? handle_fasteoi_irq+0x7d/0xb5
[ 3592.878848] [<ffffffff81013957>] ? handle_irq+0x17/0x1d
[ 3592.879305] [<ffffffff81012fb1>] ? do_IRQ+0x57/0xb6
[ 3592.879744] [<ffffffff810114d3>] ? ret_from_intr+0x0/0x11
[ 3592.880237] [<ffffffff81053903>] ? __do_softirq+0x6e/0x19f
[ 3592.880723] [<ffffffff8106fa87>] ? tick_dev_program_event+0x2d/0x95
[ 3592.881284] [<ffffffff81011cac>] ? call_softirq+0x1c/0x30
[ 3592.881762] [<ffffffff81013903>] ? do_softirq+0x3f/0x7c
[ 3592.882230] [<ffffffff810537e1>] ? irq_exit+0x36/0x76
[ 3592.882691] [<ffffffff81025837>] ? smp_apic_timer_interrupt+0x87/0x95
[ 3592.883258] [<ffffffff81011673>] ? apic_timer_interrupt+0x13/0x20
[ 3592.883795] <EOI> [<ffffffff8118e009>] ? delay_tsc+0x0/0x73
[ 3592.884319] [<ffffffffa010f900>] ? sym_eh_handler+0x22e/0x2e2 [sym53c8xx]
[ 3592.884917] [<ffffffffa008e5de>] ? scsi_try_bus_reset+0x50/0xd9 [scsi_mod]
[ 3592.885522] [<ffffffffa008f565>] ? scsi_eh_ready_devs+0x50c/0x781
[scsi_mod]
[ 3592.886152] [<ffffffffa008fd6b>] ? scsi_error_handler+0x3c1/0x5b5
[scsi_mod]
[ 3592.886789] [<ffffffffa008f9aa>] ? scsi_error_handler+0x0/0x5b5 [scsi_mod]
[ 3592.887398] [<ffffffff81064789>] ? kthread+0x79/0x81
[ 3592.887836] [<ffffffff81011baa>] ? child_rip+0xa/0x20
[ 3592.888290] [<ffffffff81064710>] ? kthread+0x0/0x81
[ 3592.888721] [<ffffffff81011ba0>] ? child_rip+0x0/0x20
Unfortunatelly I have no idea how to reproduce the problem.
+
+ Log from /var/log/libvirt/qemu/
+ lsi_scsi: error: Unimplemented message 0x0c
+
+ What is more I had 7 vm running. Today four of them crashed at the same
+ time. The rest survived with something like this in syslog:
+
+ [651330.816043] sd 0:0:0:0: [sda] ABORT operation started
+ [651335.860027] sd 0:0:0:0: ABORT operation timed-out.
+ [651335.860600] sd 0:0:0:0: [sda] ABORT operation started
+ [651337.019355] sd 0:0:0:0: ABORT operation complete.
+ [651337.038506] sd 0:0:0:0: [sda] ABORT operation started
+ [651337.039100] sd 0:0:0:0: ABORT operation failed.
+ [651337.039624] sd 0:0:0:0: [sda] ABORT operation started
+ [651337.040303] sd 0:0:0:0: ABORT operation failed.
+ [651337.040834] sd 0:0:0:0: [sda] ABORT operation started
+ [651337.041417] sd 0:0:0:0: ABORT operation failed.
+ [651337.041949] sd 0:0:0:0: [sda] ABORT operation started
+ [651337.042534] sd 0:0:0:0: ABORT operation failed.
+ [651337.043072] sd 0:0:0:0: [sda] DEVICE RESET operation started
+ [651337.043834] scsi target0:0:0: control msgout: c.
+ [651337.520075] scsi target0:0:0: has been reset
+ [651337.521726] sd 0:0:0:0: DEVICE RESET operation complete.
+ [651337.522495] sd 0:0:0:0: M_REJECT received (0:0).
+
+ It looks like the problem is in host system and has influence on all
+ machines at the same time. I have found the same pattern in syslog on
+ machines which crashed. It was 3 days before crash. There is no
+ information in host log files at all. Is this possible that eucalyptus
+ (1.6.2) caused this? With 1.6.1 I didin't have these problems.
+ Eucalyptus runs kvm (0.12 and 0.11) with commands:
+
+ /usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin HOME=/root
+ USER=root LOGNAME=root /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 512
+ -smp 1,sockets=1,cores=1,threads=1 -name i-35B80630 -uuid
+ 7e9b2fc1-9a9d-7114-3cb4-f4fdb3d51a3a -nographic -nodefaults -chardev
+ socket,id=monitor,path=/var/lib/libvirt/qemu/i-35B80630.monitor,server,nowait
+ -mon chardev=monitor,mode=readline -rtc base=utc -boot c -kernel
+ /var/lib/eucalyptus/instances/winnie/i-35B80630/kernel -initrd
+ /var/lib/eucalyptus/instances/winnie/i-35B80630/ramdisk -append
+ root=/dev/sda1 console=ttyS0 -device lsi,id=scsi0,bus=pci.0,addr=0x5
+ -drive
+ file=/var/lib/eucalyptus/instances/winnie/i-35B80630/disk,if=none,id
+ =drive-scsi0-0-0,boot=on -device scsi-disk,bus=scsi0.0,scsi-id=0,drive
+ =drive-scsi0-0-0,id=scsi0-0-0 -device
+ e1000,vlan=0,id=net0,mac=d0:0d:35:b8:06:30,bus=pci.0,addr=0x4 -net
+ tap,fd=43,vlan=0,name=hostnet0 -chardev
+
file,id=serial0,path=/var/lib/eucalyptus/instances/winnie/i-35B80630/console.log
+ -device isa-serial,chardev=serial0 -usb -device virtio-balloon-
+ pci,id=balloon0,bus=pci.0,addr=0x3
+
+ /usr/bin/kvm -S -M pc-0.11 -enable-kvm -m 512 -smp 1 -name i-492407F3
+ -uuid b2dc266e-a62a-4e13-3847-f9104eba4135 -nographic -monitor
+ unix:/var/lib/libvirt/qemu/i-492407F3.monitor,server,nowait -boot c
+ -kernel /var/lib/eucalyptus/instances/admin/i-492407F3/kernel -initrd
+ /var/lib/eucalyptus/instances/admin/i-492407F3/ramdisk -append
+ root=/dev/sda1 console=ttyS0 -drive
+
file=/var/lib/eucalyptus/instances/admin/i-492407F3/disk,if=scsi,bus=0,unit=0,boot=on
+ -net nic,macaddr=d0:0d:49:24:07:f3,vlan=0,model=e1000,name=net0 -net
+ tap,fd=118,vlan=0,name=hostnet0 -serial
+ file:/var/lib/eucalyptus/instances/admin/i-492407F3/console.log
+ -parallel none -usb -vga none -balloon virtio
+
+ I can give the access to vm.
--
qemu-kvm 0.12.4+dfsg-1 from debian squeeze crashes "BUG: unable to handle
kernel NULL pointer" (sym53c8xx)
https://bugs.launchpad.net/bugs/587993
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
Status in QEMU: Incomplete
Bug description:
I use eucalyptus software (1.6.2) on debian squeeze with kvm 0.12.4+dfsg-1 (the
same happend with 0.11.1+dfsg-1 ). Kernel 2.6.32-3-amd64. After a few days
machines crash. There are no logs in host system. Guest is the same kernel and
OS as host. The kvm process use 100% of cpu time. I can not even ping the
guest. Here is the log from virtual machine:
[ 3577.816666] sd 0:0:0:0: [sda] ABORT operation started
[ 3582.816047] sd 0:0:0:0: ABORT operation timed-out.
[ 3582.816781] sd 0:0:0:0: [sda] ABORT operation started
[ 3587.816649] sd 0:0:0:0: ABORT operation timed-out.
[ 3587.817379] sd 0:0:0:0: [sda] DEVICE RESET operation started
[ 3592.816062] sd 0:0:0:0: DEVICE RESET operation timed-out.
[ 3592.816882] sd 0:0:0:0: [sda] BUS RESET operation started
[ 3592.820056] sym0: SCSI BUS reset detected.
[ 3592.831538] sym0: SCSI BUS has been reset.
[ 3592.831968] BUG: unable to handle kernel NULL pointer dereference at
0000000000000358
[ 3592.832003] IP: [<ffffffffa01147c4>] sym_int_sir+0x62f/0x14e0 [sym53c8xx]
[ 3592.832003] PGD 5f73e067 PUD 5fa53067 PMD 0
[ 3592.832003] Oops: 0000 [#1] SMP
[ 3592.832003] last sysfs file:
/sys/devices/pci0000:00/0000:00:05.0/host0/target0:0:0/0:0:0:0/vendor
[ 3592.832003] CPU 0
[ 3592.832003] Modules linked in: dm_mod openafs(P) ext2 snd_pcsp snd_pcm
snd_timer serio_raw i2c_piix4 snd virtio_balloon evdev i2c_core soundcore
psmouse button processor snd_page_alloc ext3 jbd mbcache sd_mod crc_t10dif
ata_generic libata ide_pci_generic sym53c8xx scsi_transport_spi thermal piix
uhci_hcd ehci_hcd floppy thermal_sys scsi_mod virtio_pci virtio_ring virtio
e1000 ide_core usbcore nls_base [last unloaded: scsi_wait_scan]
[ 3592.832003] Pid: 193, comm: scsi_eh_0 Tainted: P 2.6.32-3-amd64 #1
Bochs
[ 3592.832003] RIP: 0010:[<ffffffffa01147c4>] [<ffffffffa01147c4>]
sym_int_sir+0x62f/0x14e0 [sym53c8xx]
[ 3592.832003] RSP: 0018:ffff880001803cb0 EFLAGS: 00010287
[ 3592.832003] RAX: 000000000000000a RBX: 000000000000000b RCX: 000000005f410090
[ 3592.832003] RDX: 0000000000000000 RSI: ffff88005c450800 RDI: ffffc90000a5e006
[ 3592.832003] RBP: ffff88005f410000 R08: 0000000000000000 R09: 0000000000000000
[ 3592.832003] R10: 000000000000003a R11: ffffffff813b871e R12: ffff88005f410090
[ 3592.832003] R13: 0000000000000084 R14: 0000000000000000 R15: 0000000000000001
[ 3592.832003] FS: 0000000000000000(0000) GS:ffff880001800000(0000)
knlGS:0000000000000000
[ 3592.832003] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 3592.832003] CR2: 0000000000000358 CR3: 000000005e269000 CR4: 00000000000006f0
[ 3592.832003] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3592.832003] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 3592.832003] Process scsi_eh_0 (pid: 193, threadinfo ffff88005f6fa000, task
ffff88005f697880)
[ 3592.832003] Stack:
[ 3592.832003] ffff88005f3fd000 0000000000000000 0000000000000130
0000000000000000
[ 3592.832003] <0> ffff88005f407710 ffffc90000a64710 ffffffffffffff10
ffffffff81195301
[ 3592.832003] <0> 0000000000000010 0000000000010212 ffff880001803d18
0000000000000018
[ 3592.832003] Call Trace:
[ 3592.832003] <IRQ>
[ 3592.832003] [<ffffffff81195301>] ? __memcpy_toio+0x9/0x19
[ 3592.832003] [<ffffffffa01164ed>] ? sym_interrupt+0x46c/0x6a3 [sym53c8xx]
[ 3592.832003] [<ffffffff8103fea0>] ? update_curr+0xa6/0x147
[ 3592.832003] [<ffffffffa010fbde>] ? sym53c8xx_intr+0x43/0x6a [sym53c8xx]
[ 3592.832003] [<ffffffff81093bfc>] ? handle_IRQ_event+0x58/0x126
[ 3592.832003] [<ffffffff810954e2>] ? handle_fasteoi_irq+0x7d/0xb5
[ 3592.832003] [<ffffffff81013957>] ? handle_irq+0x17/0x1d
[ 3592.832003] [<ffffffff81012fb1>] ? do_IRQ+0x57/0xb6
[ 3592.832003] [<ffffffff810114d3>] ? ret_from_intr+0x0/0x11
[ 3592.832003] [<ffffffff81053903>] ? __do_softirq+0x6e/0x19f
[ 3592.832003] [<ffffffff8106fa87>] ? tick_dev_program_event+0x2d/0x95
[ 3592.832003] [<ffffffff81011cac>] ? call_softirq+0x1c/0x30
[ 3592.832003] [<ffffffff81013903>] ? do_softirq+0x3f/0x7c
[ 3592.832003] [<ffffffff810537e1>] ? irq_exit+0x36/0x76
[ 3592.832003] [<ffffffff81025837>] ? smp_apic_timer_interrupt+0x87/0x95
[ 3592.832003] [<ffffffff81011673>] ? apic_timer_interrupt+0x13/0x20
[ 3592.832003] <EOI>
[ 3592.832003] [<ffffffff8118e009>] ? delay_tsc+0x0/0x73
[ 3592.832003] [<ffffffffa010f900>] ? sym_eh_handler+0x22e/0x2e2 [sym53c8xx]
[ 3592.832003] [<ffffffffa008e5de>] ? scsi_try_bus_reset+0x50/0xd9 [scsi_mod]
[ 3592.832003] [<ffffffffa008f565>] ? scsi_eh_ready_devs+0x50c/0x781 [scsi_mod]
[ 3592.832003] [<ffffffffa008fd6b>] ? scsi_error_handler+0x3c1/0x5b5 [scsi_mod]
[ 3592.832003] [<ffffffffa008f9aa>] ? scsi_error_handler+0x0/0x5b5 [scsi_mod]
[ 3592.832003] [<ffffffff81064789>] ? kthread+0x79/0x81
[ 3592.832003] [<ffffffff81011baa>] ? child_rip+0xa/0x20
[ 3592.832003] [<ffffffff81064710>] ? kthread+0x0/0x81
[ 3592.832003] [<ffffffff81011ba0>] ? child_rip+0x0/0x20
[ 3592.832003] Code: 48 c7 c7 82 92 11 a0 eb 63 48 8b 98 38 01 00 00 48 8d b8
28 01 00 00 e8 df d5 0f e1 48 89 da 48 89 c6 48 c7 c7 bc 92 11 a0 eb 6d <49> 8b
96 58 03 00 00 48 8b 82 80 00 00 00 48 8b a8 b0 00 00 00
[ 3592.832003] RIP [<ffffffffa01147c4>] sym_int_sir+0x62f/0x14e0 [sym53c8xx]
[ 3592.832003] RSP <ffff880001803cb0>
[ 3592.832003] CR2: 0000000000000358
[ 3592.867935] ---[ end trace 06f90ebbdbd172ee ]---
[ 3592.868360] Kernel panic - not syncing: Fatal exception in interrupt
[ 3592.868906] Pid: 193, comm: scsi_eh_0 Tainted: P D 2.6.32-3-amd64 #1
[ 3592.869511] Call Trace:
[ 3592.869727] <IRQ> [<ffffffff812ed349>] ? panic+0x86/0x141
[ 3592.870225] [<ffffffff81011673>] ? apic_timer_interrupt+0x13/0x20
[ 3592.870778] [<ffffffff811afbdc>] ? dummycon_dummy+0x0/0x3
[ 3592.871250] [<ffffffff81014a37>] ? oops_end+0x64/0xb4
[ 3592.871694] [<ffffffff81014a7a>] ? oops_end+0xa7/0xb4
[ 3592.872150] [<ffffffff810322b8>] ? no_context+0x1e9/0x1f8
[ 3592.872626] [<ffffffff8103246d>] ? __bad_area_nosemaphore+0x1a6/0x1ca
[ 3592.873185] [<ffffffff8106807c>] ? up+0xe/0x36
[ 3592.873576] [<ffffffff8104e219>] ? release_console_sem+0x17e/0x1af
[ 3592.874125] [<ffffffff81024d72>] ? lapic_next_event+0x18/0x1d
[ 3592.874642] [<ffffffff812ef595>] ? page_fault+0x25/0x30
[ 3592.875103] [<ffffffffa01147c4>] ? sym_int_sir+0x62f/0x14e0 [sym53c8xx]
[ 3592.875678] [<ffffffff81195301>] ? __memcpy_toio+0x9/0x19
[ 3592.876162] [<ffffffffa01164ed>] ? sym_interrupt+0x46c/0x6a3 [sym53c8xx]
[ 3592.876748] [<ffffffff8103fea0>] ? update_curr+0xa6/0x147
[ 3592.877224] [<ffffffffa010fbde>] ? sym53c8xx_intr+0x43/0x6a [sym53c8xx]
[ 3592.877800] [<ffffffff81093bfc>] ? handle_IRQ_event+0x58/0x126
[ 3592.878319] [<ffffffff810954e2>] ? handle_fasteoi_irq+0x7d/0xb5
[ 3592.878848] [<ffffffff81013957>] ? handle_irq+0x17/0x1d
[ 3592.879305] [<ffffffff81012fb1>] ? do_IRQ+0x57/0xb6
[ 3592.879744] [<ffffffff810114d3>] ? ret_from_intr+0x0/0x11
[ 3592.880237] [<ffffffff81053903>] ? __do_softirq+0x6e/0x19f
[ 3592.880723] [<ffffffff8106fa87>] ? tick_dev_program_event+0x2d/0x95
[ 3592.881284] [<ffffffff81011cac>] ? call_softirq+0x1c/0x30
[ 3592.881762] [<ffffffff81013903>] ? do_softirq+0x3f/0x7c
[ 3592.882230] [<ffffffff810537e1>] ? irq_exit+0x36/0x76
[ 3592.882691] [<ffffffff81025837>] ? smp_apic_timer_interrupt+0x87/0x95
[ 3592.883258] [<ffffffff81011673>] ? apic_timer_interrupt+0x13/0x20
[ 3592.883795] <EOI> [<ffffffff8118e009>] ? delay_tsc+0x0/0x73
[ 3592.884319] [<ffffffffa010f900>] ? sym_eh_handler+0x22e/0x2e2 [sym53c8xx]
[ 3592.884917] [<ffffffffa008e5de>] ? scsi_try_bus_reset+0x50/0xd9 [scsi_mod]
[ 3592.885522] [<ffffffffa008f565>] ? scsi_eh_ready_devs+0x50c/0x781 [scsi_mod]
[ 3592.886152] [<ffffffffa008fd6b>] ? scsi_error_handler+0x3c1/0x5b5 [scsi_mod]
[ 3592.886789] [<ffffffffa008f9aa>] ? scsi_error_handler+0x0/0x5b5 [scsi_mod]
[ 3592.887398] [<ffffffff81064789>] ? kthread+0x79/0x81
[ 3592.887836] [<ffffffff81011baa>] ? child_rip+0xa/0x20
[ 3592.888290] [<ffffffff81064710>] ? kthread+0x0/0x81
[ 3592.888721] [<ffffffff81011ba0>] ? child_rip+0x0/0x20
Unfortunatelly I have no idea how to reproduce the problem.
Log from /var/log/libvirt/qemu/
lsi_scsi: error: Unimplemented message 0x0c
What is more I had 7 vm running. Today four of them crashed at the same time.
The rest survived with something like this in syslog:
[651330.816043] sd 0:0:0:0: [sda] ABORT operation started
[651335.860027] sd 0:0:0:0: ABORT operation timed-out.
[651335.860600] sd 0:0:0:0: [sda] ABORT operation started
[651337.019355] sd 0:0:0:0: ABORT operation complete.
[651337.038506] sd 0:0:0:0: [sda] ABORT operation started
[651337.039100] sd 0:0:0:0: ABORT operation failed.
[651337.039624] sd 0:0:0:0: [sda] ABORT operation started
[651337.040303] sd 0:0:0:0: ABORT operation failed.
[651337.040834] sd 0:0:0:0: [sda] ABORT operation started
[651337.041417] sd 0:0:0:0: ABORT operation failed.
[651337.041949] sd 0:0:0:0: [sda] ABORT operation started
[651337.042534] sd 0:0:0:0: ABORT operation failed.
[651337.043072] sd 0:0:0:0: [sda] DEVICE RESET operation started
[651337.043834] scsi target0:0:0: control msgout: c.
[651337.520075] scsi target0:0:0: has been reset
[651337.521726] sd 0:0:0:0: DEVICE RESET operation complete.
[651337.522495] sd 0:0:0:0: M_REJECT received (0:0).
It looks like the problem is in host system and has influence on all machines
at the same time. I have found the same pattern in syslog on machines which
crashed. It was 3 days before crash. There is no information in host log files
at all. Is this possible that eucalyptus (1.6.2) caused this? With 1.6.1 I
didin't have these problems. Eucalyptus runs kvm (0.12 and 0.11) with commands:
/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin HOME=/root
USER=root LOGNAME=root /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 512 -smp
1,sockets=1,cores=1,threads=1 -name i-35B80630 -uuid
7e9b2fc1-9a9d-7114-3cb4-f4fdb3d51a3a -nographic -nodefaults -chardev
socket,id=monitor,path=/var/lib/libvirt/qemu/i-35B80630.monitor,server,nowait
-mon chardev=monitor,mode=readline -rtc base=utc -boot c -kernel
/var/lib/eucalyptus/instances/winnie/i-35B80630/kernel -initrd
/var/lib/eucalyptus/instances/winnie/i-35B80630/ramdisk -append root=/dev/sda1
console=ttyS0 -device lsi,id=scsi0,bus=pci.0,addr=0x5 -drive
file=/var/lib/eucalyptus/instances/winnie/i-35B80630/disk,if=none,id=drive-scsi0-0-0,boot=on
-device scsi-disk,bus=scsi0.0,scsi-id=0,drive=drive-scsi0-0-0,id=scsi0-0-0
-device e1000,vlan=0,id=net0,mac=d0:0d:35:b8:06:30,bus=pci.0,addr=0x4 -net
tap,fd=43,vlan=0,name=hostnet0 -chardev
file,id=serial0,path=/var/lib/eucalyptus/instances/winnie/i-35B80630/console.log
-device isa-serial,chardev=serial0 -usb -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3
/usr/bin/kvm -S -M pc-0.11 -enable-kvm -m 512 -smp 1 -name i-492407F3 -uuid
b2dc266e-a62a-4e13-3847-f9104eba4135 -nographic -monitor
unix:/var/lib/libvirt/qemu/i-492407F3.monitor,server,nowait -boot c -kernel
/var/lib/eucalyptus/instances/admin/i-492407F3/kernel -initrd
/var/lib/eucalyptus/instances/admin/i-492407F3/ramdisk -append root=/dev/sda1
console=ttyS0 -drive
file=/var/lib/eucalyptus/instances/admin/i-492407F3/disk,if=scsi,bus=0,unit=0,boot=on
-net nic,macaddr=d0:0d:49:24:07:f3,vlan=0,model=e1000,name=net0 -net
tap,fd=118,vlan=0,name=hostnet0 -serial
file:/var/lib/eucalyptus/instances/admin/i-492407F3/console.log -parallel none
-usb -vga none -balloon virtio
I can give the access to vm.