Re: [PATCH RFC 0/2] Add debug interface to kick/call on purpose

From: Dr. David Alan Gilbert
Subject: Re: [PATCH RFC 0/2] Add debug interface to kick/call on purpose
Date: Mon, 18 Jan 2021 16:59:34 +0000
User-agent: Mutt/1.14.6 (2020-07-11)

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Thu, Jan 14, 2021 at 04:27:28PM -0800, Dongli Zhang wrote:
> > The virtio device/driver (e.g., vhost-scsi and indeed any device including
> > e1000e) may hang due to a lost IRQ or a lost doorbell register kick,
> > e.g.,
> > 
> > https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html
> > 
> > The virtio-net device in the link above was in trouble because the 'kick'
> > was not taking effect (it was missed).
> > 
> > This RFC adds a new debug interface 'DeviceEvent' to DeviceClass to help
> > narrow down whether the issue is due to a lost irq/kick. So far the new
> > interface handles only two events: 'call' and 'kick'. Any device (e.g.,
> > e1000e or vhost-scsi) may implement it (e.g., via eventfd, MSI-X or legacy
> > IRQ).
> > 
> > The 'call' is to inject irq on purpose by admin for a specific device (e.g.,
> > vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell
> > on purpose by admin at QEMU/host side for a specific device.
> 
> I'm really not convinced that we want to give admins the direct ability to
> poke at internals of devices in a running QEMU. It feels like there is way
> too much potential for the admin to make a situation far worse by doing
> the wrong thing here,

We already do have commands to write to an I/O port, and to inject MCEs for
example; is this that much different?

> and people dealing with support tickets will have
> no idea that the admin has been poking internals of the device and broken
> it by doing something wrong.

You could add a one-time log entry to say that this mischievous command
had been used.
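
As a rough sketch of the one-time log idea (the function and variable names
here are made up for illustration and are not existing QEMU code; a real
patch would presumably go through QEMU's own error-reporting helpers
rather than fprintf):

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical names throughout; this is not QEMU code. */

/* Set the first time an admin uses the debug kick/call command. */
static bool debug_event_used;

/*
 * Record that the debug command was used. Returns true if this call
 * emitted the log line, false if the use had already been logged.
 */
static bool debug_event_log_once(FILE *log, const char *dev_id,
                                 const char *event)
{
    if (debug_event_used) {
        return false;
    }
    debug_event_used = true;
    fprintf(log, "warning: debug event '%s' injected on device '%s'; "
                 "later device misbehaviour may be caused by this\n",
            event, dev_id);
    return true;
}
```

The flag is per-process here, so the entry appears once no matter how many
devices are poked; a per-device flag would be the other obvious choice if
support needs to know which device was touched.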

> You pointed to a bug where this could conceivably be useful, but
> that's a one-time issue and should not be a common occurrence that justifies
> making an official public API to poke at devices forever more IMHO.

I think where it might be practically useful is if you were debugging a
hung customer's VM and need to find a way to get it to move again.
That's something I'm not familiar with on the virtio side;
mst - is this useful from a virtio side?


> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK