Re: [Qemu-devel] sda abort with virtio-scsi
From: Hannes Reinecke
Subject: Re: [Qemu-devel] sda abort with virtio-scsi
Date: Thu, 4 Feb 2016 07:59:58 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.0
On 02/04/2016 12:19 AM, Paolo Bonzini wrote:
>
>
> On 03/02/2016 22:46, Jim Minter wrote:
>> I am hitting the following VM lockup issue running a VM with latest
>> RHEL7 kernel on a host also running latest RHEL7 kernel. FWIW I'm using
>> virtio-scsi because I want to use discard=unmap. I ran the VM as follows:
>>
>> /usr/libexec/qemu-kvm -nodefaults \
>> -cpu host \
>> -smp 4 \
>> -m 8192 \
>> -drive discard=unmap,file=vm.qcow2,id=disk1,if=none,cache=unsafe \
>> -device virtio-scsi-pci \
>> -device scsi-disk,drive=disk1 \
>> -netdev bridge,id=net0,br=br0 \
>> -device virtio-net-pci,netdev=net0,mac=$(utils/random-mac.py) \
>> -chardev socket,id=chan0,path=/tmp/rhev.sock,server,nowait \
>> -chardev socket,id=chan1,path=/tmp/qemu.sock,server,nowait \
>> -monitor unix:tmp/vm.sock,server,nowait \
>> -device virtio-serial-pci \
>> -device virtserialport,chardev=chan0,name=com.redhat.rhevm.vdsm \
>> -device virtserialport,chardev=chan1,name=org.qemu.guest_agent.0 \
>> -device cirrus-vga \
>> -vnc none \
>> -usbdevice tablet
>>
>> The host was busyish at the time, but not excessively (IMO). Nothing
>> untoward in the host's kernel log; host storage subsystem is fine. I
>> didn't get any qemu logs this time around, but I will when the issue
>> next recurs. The VM's full kernel log is attached; here are the
>> highlights:
>
> Hannes, were you going to send a patch to disable time outs?
>
Rah. Didn't I do it already?
Seems like I didn't; will be doing so shortly.
>>
>> INFO: rcu_sched detected stalls on CPUs/tasks: { 3} (detected by 2, t=60002 jiffies, g=5253, c=5252, q=0)
>> sending NMI to all CPUs:
>> NMI backtrace for cpu 1
>> CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.10.0-327.4.5.el7.x86_64 #1
>> Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
>> task: ffff88023417d080 ti: ffff8802341a4000 task.ti: ffff8802341a4000
>> RIP: 0010:[<ffffffff81058e96>] [<ffffffff81058e96>] native_safe_halt+0x6/0x10
>> RSP: 0018:ffff8802341a7e98 EFLAGS: 00000286
>> RAX: 00000000ffffffed RBX: ffff8802341a4000 RCX: 0100000000000000
>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000046
>> RBP: ffff8802341a7e98 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
>> R13: ffff8802341a4000 R14: ffff8802341a4000 R15: 0000000000000000
>> FS: 0000000000000000(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00007f4978587008 CR3: 000000003645e000 CR4: 00000000003407e0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Stack:
>> ffff8802341a7eb8 ffffffff8101dbcf ffff8802341a4000 ffffffff81a68260
>> ffff8802341a7ec8 ffffffff8101e4d6 ffff8802341a7f20 ffffffff810d62e5
>> ffff8802341a7fd8 ffff8802341a4000 2581685d70de192c 7ba58fdb3a3bc8d4
>> Call Trace:
>> [<ffffffff8101dbcf>] default_idle+0x1f/0xc0
>> [<ffffffff8101e4d6>] arch_cpu_idle+0x26/0x30
>> [<ffffffff810d62e5>] cpu_startup_entry+0x245/0x290
>> [<ffffffff810475fa>] start_secondary+0x1ba/0x230
>> Code: 00 00 00 00 00 55 48 89 e5 fa 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 66 0f 1f 84
>> NMI backtrace for cpu 0
>
> This is the NMI watchdog firing; the CPU got stuck for 20 seconds. The
> issue was not a busy host, but a busy storage (could it be a network
> partition if the disk was hosted on NFS???)
>
> Firing the NMI watchdog is fixed in more recent QEMU, which has
> asynchronous cancellation, assuming you're running RHEL's QEMU 1.5.3
> (try /usr/libexec/qemu-kvm --version, or rpm -qf /usr/libexec/qemu-kvm).
>
Actually, you still cannot do _real_ async cancellation of I/O; the
Linux AIO subsystem implements io_cancel(), but the cancellation
just aborts the (internal) waitqueue element, not the I/O itself.
Cheers,
Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
address@hidden +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)