On 20/02/19 17:58, Stefan Hajnoczi wrote:
On Mon, Feb 18, 2019 at 07:21:25AM +0000, Fernando Casas Schössow wrote:
It took a few days but last night the problem was reproduced. This is the information from the log: vdev 0x55f261d940f0 ("virtio-blk") vq 0x55f261d9ee40 (idx 0) inuse 128 vring.num 128 old_shadow_avail_idx 58874 last_avail_idx 58625 avail_idx 58874
avail 0x3d87a800 avail_idx (cache bypassed) 58625
Hi Paolo, Are you aware of any recent MemoryRegionCache issues? The avail_idx value 58874 was read via the cache while a non-cached read produces 58625! I suspect that 58625 is correct since the vring is already full and the driver wouldn't bump avail_idx any
further until requests complete. Fernando also hits this issue with virtio-scsi so it's not a virtio_blk.ko driver bug or a virtio-blk device emulation issue.
No, I am not aware of any issues. How can I get the core dump (and the corresponding executable to get the symbols)? Alternatively, it should be enough to print the vq->vring.caches->avail.mrs from the debugger. Also, one possibility is to add in vring_avail_idx
an assertion like assert(vq->shadow_availa_idx == virtio_lduw_phys(vdev, vq->vring.avail + offsetof(VRingAvail, idx))); and try to catch the error earlier. Paolo
A QEMU core dump is available for debugging. Here is the patch that produced this debug output: diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index a1ff647a66..28d89fcbcb 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -866,6
+866,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz) return NULL; } rcu_read_lock(); + uint16_t old_shadow_avail_idx = vq->shadow_avail_idx; if (virtio_queue_empty_rcu(vq)) { goto done; } @@ -879,6 +880,12 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
max = vq->vring.num; if (vq->inuse >= vq->vring.num) { + fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name); + fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq)); + fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num);
+ fprintf(stderr, "old_shadow_avail_idx %u last_avail_idx %u avail_idx %u\n", old_shadow_avail_idx, vq->last_avail_idx, vq->shadow_avail_idx); + fprintf(stderr, "avail %#" HWADDR_PRIx " avail_idx (cache bypassed) %u\n", vq->vring.avail, virtio_lduw_phys(vdev,
vq->vring.avail + offsetof(VRingAvail, idx))); + fprintf(stderr, "used_idx %u\n", vq->used_idx); + abort(); /* <--- core dump! */ virtio_error(vdev, "Virtqueue size exceeded"); goto done; } Stefan
used_idx 58497 2019-02-18 03:20:08.605+0000: shutting down, reason=crashed The dump file, including guest memory, was generated successfully (after gzip the file is around 492MB). I switched the guest now to virtio-scsi to get the information and
dump with this setup as well. How should we proceed? Thanks.