Re: [Qemu-devel] [Qemu-block] Guest unresponsive after Virtqueue size ex
From: Fernando Casas Schössow
Subject: Re: [Qemu-devel] [Qemu-block] Guest unresponsive after Virtqueue size exceeded error
Date: Mon, 11 Feb 2019 09:48:21 +0000

Thanks for looking into this Stefan.
I rebuilt QEMU with the new patch and have a few guests running on the new
build: two of them use virtio-scsi and another one uses virtio-blk. Now I'm
waiting for any of them to crash.
I also configured libvirt to include guest memory in the QEMU core dumps, since
I understood you will want to look at both (the QEMU dump and the guest memory
dump).
I will reply to this thread once I have any news.
Kind regards.
Fernando
On Mon, Feb 11, 2019 at 4:17 AM, Stefan Hajnoczi <address@hidden> wrote:
On Wed, Feb 06, 2019 at 04:47:19PM +0000, Fernando Casas Schössow wrote:
I could also repro the same with virtio-scsi on the same guest a couple of
hours later:

2019-02-06 07:10:37.672+0000: starting up libvirt version: 4.10.0,
qemu version: 3.1.0, kernel: 4.19.18-0-vanilla, hostname: vmsvr01.homenet.local
LC_ALL=C
PATH=/bin:/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin
HOME=/root USER=root QEMU_AUDIO_DRV=spice /home/fernando/qemu-system-x86_64
-name guest=DOCKER01,debug-threads=on -S -object
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-32-DOCKER01/master-key.aes
-machine pc-i440fx-3.1,accel=kvm,usb=off,dump-guest-core=off -cpu
IvyBridge,ss=on,vmx=on,pcid=on,hypervisor=on,arat=on,tsc_adjust=on,umip=on,xsaveopt=on
-drive
file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on
-drive
file=/var/lib/libvirt/qemu/nvram/DOCKER01_VARS.fd,if=pflash,format=raw,unit=1
-m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid
4705b146-3b14-4c20-923c-42105d47e7fc -no-user-config -nodefaults -chardev
socket,id=charmonitor,fd=46,server,nowait -mon
chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global
kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global
PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device
ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 -device
ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4
-device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1
-device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2
-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x6 -device
virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive
file=/storage/storage-ssd-vms/virtual_machines_ssd/docker01.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,aio=threads
-device
scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,write-cache=on
-netdev tap,fd=48,id=hostnet0,vhost=on,vhostfd=50 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:1c:af:ce,bus=pci.0,addr=0x3
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0
-chardev socket,id=charchannel0,fd=51,server,nowait -device
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
-chardev spicevmc,id=charchannel1,name=vdagent -device
virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0
-spice port=5904,addr=127.0.0.1,disable-ticketing,seamless-migration=on
-device
qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2
-chardev spicevmc,id=charredir0,name=usbredir -device
usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev
spicevmc,id=charredir1,name=usbredir -device
usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -object
rng-random,id=objrng0,filename=/dev/random -device
virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x8 -sandbox
on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg
timestamp=on
2019-02-06 07:10:37.672+0000: Domain id=32 is tainted: high-privileges
char device redirected to /dev/pts/5 (label charserial0)
vdev 0x5585456ef6b0 ("virtio-scsi")
vq 0x5585456f90a0 (idx 2)
inuse 128 vring.num 128
2019-02-06T13:00:46.942424Z qemu-system-x86_64: Virtqueue size exceeded

I'm open to any tests or suggestions that can move the investigation forward and
find the cause of this issue.

Thanks for collecting the data!

The fact that both virtio-blk and virtio-scsi failed suggests it's not a
virtqueue element leak in the virtio-blk or virtio-scsi device emulation code.

The hung task error messages from inside the guest are a consequence of QEMU
hitting the "Virtqueue size exceeded" error. QEMU refuses to process further
requests after the error, causing tasks inside the guest to get stuck on I/O.

I don't have a good theory regarding the root cause. Two ideas:

1. The guest is corrupting the vring or submitting more requests than will fit
   into the ring. Somewhat unlikely because it happens with both Windows and
   Linux guests.

2. QEMU's virtqueue code is buggy, maybe the memory region cache which is used
   for fast guest RAM accesses.

Here is an expanded version of the debug patch which might help identify which
of these scenarios is likely. Sorry, it requires running the guest again!

This time let's make QEMU dump core so both QEMU state and guest RAM are
captured for further debugging. That way it will be possible to extract more
information
using gdb without rerunning.

Stefan

---
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index a1ff647a66..28d89fcbcb 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -866,6 +866,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
         return NULL;
     }
     rcu_read_lock();
+    uint16_t old_shadow_avail_idx = vq->shadow_avail_idx;
     if (virtio_queue_empty_rcu(vq)) {
         goto done;
     }
@@ -879,6 +880,12 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     max = vq->vring.num;
     if (vq->inuse >= vq->vring.num) {
+        fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name);
+        fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq));
+        fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num);
+        fprintf(stderr, "old_shadow_avail_idx %u last_avail_idx %u avail_idx %u\n", old_shadow_avail_idx, vq->last_avail_idx, vq->shadow_avail_idx);
+        fprintf(stderr, "avail %#" HWADDR_PRIx " avail_idx (cache bypassed) %u\n", vq->vring.avail, virtio_lduw_phys(vdev, vq->vring.avail + offsetof(VRingAvail, idx)));
+        fprintf(stderr, "used_idx %u\n", vq->used_idx);
+        abort(); /* <--- core dump! */
         virtio_error(vdev, "Virtqueue size exceeded");
         goto done;
     }
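
For readers following the thread, the check that the patch instruments can be
modelled in a few lines of C. This is only a toy sketch under stated
assumptions, not QEMU code: struct toy_vq, toy_pending(), toy_pop() and
toy_push() are hypothetical names, and the real VirtQueue has many more fields
and reads its indices from guest memory. It shows how the in-flight counter
(inuse) can reach the ring size when more entries are published than completed,
and why the device then stops processing requests.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for a virtqueue. Field names echo the
 * debug patch, but this is not the real QEMU VirtQueue structure. */
struct toy_vq {
    uint16_t num;             /* ring size (vring.num in the patch output) */
    uint16_t inuse;           /* elements popped but not yet completed */
    uint16_t last_avail_idx;  /* next available entry the device will pop */
    uint16_t avail_idx;       /* index published by the guest driver */
    bool broken;              /* set on error; no further requests are popped */
};

/* Entries published by the guest but not yet popped by the device.
 * Both indices are free-running 16-bit counters, so the difference is
 * taken modulo 65536 and stays correct across wrap-around. */
uint16_t toy_pending(const struct toy_vq *vq)
{
    return (uint16_t)(vq->avail_idx - vq->last_avail_idx);
}

/* Pop one element. Returns true on success, false if the queue is empty,
 * already broken, or the in-flight count would exceed the ring size. */
bool toy_pop(struct toy_vq *vq)
{
    if (vq->broken || toy_pending(vq) == 0) {
        return false;
    }
    if (vq->inuse >= vq->num) {
        /* Models the condition behind "Virtqueue size exceeded". */
        fprintf(stderr, "inuse %u vring.num %u\n",
                (unsigned)vq->inuse, (unsigned)vq->num);
        vq->broken = true;
        return false;
    }
    vq->last_avail_idx++;
    vq->inuse++;
    return true;
}

/* Complete one in-flight element, freeing a slot in the ring. */
void toy_push(struct toy_vq *vq)
{
    if (vq->inuse > 0) {
        vq->inuse--;
    }
}
```

In this model a well-behaved driver never publishes a new entry until an
earlier one has been completed, so inuse cannot reach num while entries are
still pending. If toy_pop() does trip the check, either the published index is
wrong (idea 1 above) or the device's view of the indices is wrong (idea 2),
which is what the extra fields printed by the patch are meant to help
distinguish.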