qemu-block

From: Fernando Casas Schössow
Subject: Re: [Qemu-block] [Qemu-devel] Guest unresponsive after Virtqueue size exceeded error
Date: Thu, 31 Jan 2019 11:32:32 +0000

Hi,

Sorry for resurrecting this thread after so long, but I just upgraded the host to QEMU 3.1 and libvirt 4.10 and I'm still facing this problem.
At the moment I cannot use virtio disks (neither virtio-blk nor virtio-scsi) with my guests without hitting this issue, so as a workaround I'm using emulated SATA storage, which is not ideal but is perfectly stable.

Do you have any suggestions on how I can make progress troubleshooting this?
QEMU is not crashing, so I don't have any dumps that can be analyzed. The guest is just "stuck" and all I can do is destroy it and start it again.
It's really frustrating that after all this time I still haven't found the cause of this issue, so any ideas are welcome.

Thanks.

Fernando


From: Fernando Casas Schössow <address@hidden>
Sent: Saturday, June 24, 2017 10:34 AM
To: Ladi Prosek
Cc: address@hidden
Subject: Re: [Qemu-devel] Guest unresponsive after Virtqueue size exceeded error
 
Hi Ladi,

After running for about 15 hours, two different guests (one Windows, one Linux) crashed about an hour apart, both with the same error in the QEMU log: "Virtqueue size exceeded".
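Both failures were only noticed after the fact. As a minimal, hypothetical sketch (not something used in this thread), the per-guest QEMU logs could be scanned for that exact message so a stuck guest is spotted, and gdbserver attached, as early as possible. It assumes libvirt's usual log location, /var/log/libvirt/qemu/<guest>.log, which may differ with your distribution or libvirt configuration; the function name `find_failed_guests` is illustrative only.

```python
from pathlib import Path
from typing import Dict, List

# The message QEMU wrote to the guest log in the failures described above.
ERROR_MARKER = "Virtqueue size exceeded"

# Assumption: libvirt's usual per-guest QEMU log directory. Adjust it if your
# distribution or libvirt configuration logs somewhere else.
DEFAULT_LOG_DIR = Path("/var/log/libvirt/qemu")


def find_failed_guests(log_dir: Path = DEFAULT_LOG_DIR) -> Dict[str, List[str]]:
    """Return {guest_name: [matching lines]} for every <guest>.log in log_dir
    that contains ERROR_MARKER. A missing directory simply yields {}."""
    failed: Dict[str, List[str]] = {}
    for log_file in sorted(log_dir.glob("*.log")):
        try:
            text = log_file.read_text(errors="replace")
        except OSError:  # unreadable log (e.g. permissions): skip it
            continue
        hits = [line for line in text.splitlines() if ERROR_MARKER in line]
        if hits:
            # libvirt names each log after the guest, so the stem is the name.
            failed[log_file.stem] = hits
    return failed


if __name__ == "__main__":
    for guest, lines in find_failed_guests().items():
        print(f"{guest}: {len(lines)} match(es), last: {lines[-1]}")
```

Run periodically (for example from cron), it only reports which guests have logged the error; deciding what to do next, such as attaching gdbserver, is left to the operator.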

The Linux guest was already running on virtio_scsi and without virtio_balloon. :(
I compiled gdbserver and attached it to the QEMU process for this guest, but when I did so I got the following warning from gdbserver:

warning: Cannot call inferior functions, Linux kernel PaX protection forbids return to non-executable pages!

The default Alpine kernel is a grsec kernel. I'm not sure whether this will interfere with debugging, but I suspect it will.
If you need me to replace the grsec kernel with a vanilla one (also available as an option in Alpine), let me know and I will do so.
Otherwise, email me directly so I can share the host:port details for connecting to gdbserver.

Thanks,

Fer 

On Fri, Jun 23, 2017 at 8:29, Fernando Casas Schössow <address@hidden> wrote:
Hi Ladi,

Small update: Memtest86+ ran on the host for more than 54 hours. 8 passes completed and no memory errors were found. For now I think we can assume that the host memory is OK.

I just started all the guests one hour ago. I will monitor them and once one fails I will attach the debugger and let you know.

Thanks.

Fer

On Thu, Jun 22, 2017 at 9:43, Ladi Prosek <address@hidden> wrote:
Hi Fernando,

On Wed, Jun 21, 2017 at 2:19 PM, Fernando Casas Schössow <address@hidden> wrote:

Hi Ladi,

Sorry for the delay in my reply. I will leave the host kernel alone for now, then. For the last 15 hours or so I've been running memtest86+ on the host. So far so good: two passes, no errors. I will try to leave it running for at least another 24 hours and report back the results. Hopefully we can rule out a memory issue at the hardware level.

Regarding KSM, that will be the next thing I disable if guests still crash after removing the balloon device.

About leaving a guest in a failed state for you to debug remotely, that's absolutely an option. We just need to coordinate so I can give you remote access to the host and so on. Let me know if any preparation is needed in advance and which tools you need installed on the host.

I think that gdbserver attached to the QEMU process should be enough. When the VM gets into the broken state, please do something like:

gdbserver --attach host:12345 <QEMU pid>

and let me know the host name and port (12345 in the above example).

Once again, I would like to thank you for all your help and your readiness to assist!

You're very welcome, though I don't think I've done anything helpful so far :)

Cheers,
Fer

On Tue, Jun 20, 2017 at 9:52, Ladi Prosek <address@hidden> wrote:

The host kernel is less likely to be responsible for this, in my opinion. I'd hold off on that for now.

And last but not least, KSM is enabled on the host. Should I disable it?

Could be worth a try.

Following your advice, I will run memtest on the host and report back. Just as a side comment, the host uses ECC memory.

I see. Would it be possible for you, once a guest is in the broken state, to make it available for debugging? For example, by attaching gdb to the QEMU process and letting me poke around it remotely? Thanks!
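The gdbserver command suggested in this thread (gdbserver --attach host:12345 <QEMU pid>) needs the PID of the stuck guest's QEMU process. A minimal Python sketch for looking it up, assuming a Linux host with /proc and that libvirt passed the guest name to QEMU as `-name guest=<name>,...` (or the older `-name <name>`). The helper names are hypothetical, and the script only prints the gdbserver command rather than running it.

```python
import os
import shlex
import sys
from pathlib import Path
from typing import List, Optional


def guest_name_from_cmdline(argv: List[str]) -> Optional[str]:
    """Return the guest name from a QEMU argv, assuming libvirt passed it as
    `-name guest=<name>,...` or the older `-name <name>,...` form.
    Simplified: QEMU's ",," comma escaping is not handled."""
    for i, arg in enumerate(argv[:-1]):
        if arg == "-name":
            value = argv[i + 1]
            for part in value.split(","):
                if part.startswith("guest="):
                    return part[len("guest="):]
            # Older form: the first comma-separated field is the name itself.
            return value.split(",")[0]
    return None


def find_qemu_pid(guest: str, proc: Path = Path("/proc")) -> Optional[int]:
    """Scan a Linux /proc tree for a QEMU process running `guest`."""
    for entry in proc.iterdir():
        if not entry.name.isdigit():  # skip /proc/self, /proc/meminfo, ...
            continue
        try:
            raw = (entry / "cmdline").read_bytes()
        except OSError:  # process exited or cmdline is not readable
            continue
        # /proc/<pid>/cmdline holds the argv as NUL-separated strings.
        argv = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
        if (argv and "qemu" in os.path.basename(argv[0])
                and guest_name_from_cmdline(argv) == guest):
            return int(entry.name)
    return None


def gdbserver_command(pid: int, port: int = 12345) -> List[str]:
    """Build `gdbserver --attach :<port> <pid>`; gdbserver listens on the
    given TCP port and the host part before the colon may be left empty."""
    return ["gdbserver", "--attach", f":{port}", str(pid)]


if __name__ == "__main__":
    if len(sys.argv) == 2:
        pid = find_qemu_pid(sys.argv[1])
        if pid is None:
            print(f"no QEMU process found for guest {sys.argv[1]!r}")
        else:
            # Print rather than run, so the command can be reviewed first.
            print(shlex.join(gdbserver_command(pid)))
    else:
        print(f"usage: {sys.argv[0]} <guest-name>")
```

Printing the command instead of executing it keeps a human in the loop, which matters here because attaching gdbserver pauses the QEMU process and opens a debug port on the host.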
