Re: [PATCH RFC 0/2] Add debug interface to kick/call on purpose

From: Dongli Zhang
Subject: Re: [PATCH RFC 0/2] Add debug interface to kick/call on purpose
Date: Tue, 19 Jan 2021 14:11:28 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.6.1

On 1/18/21 8:59 AM, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (berrange@redhat.com) wrote:
>> On Thu, Jan 14, 2021 at 04:27:28PM -0800, Dongli Zhang wrote:
>>> The virtio device/driver (e.g., vhost-scsi and indeed any device including
>>> e1000e) may hang due to the loss of an IRQ or the loss of a doorbell register
>>> kick, e.g.,
>>> https://urldefense.com/v3/__https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html__;!!GqivPVa7Brio!K_zaQzJhlvPjRZe9efEtyX8vB6fMlKQeNy_RGz7oPp9k76pC8zarG1nSs1SFSL2xI1g$
>>> The virtio-net device in the above link was in trouble because the 'kick'
>>> did not take effect (it was missed).
>>> This RFC adds a new debug interface 'DeviceEvent' to DeviceClass to help
>>> narrow down whether the issue is due to the loss of an IRQ/kick. So far the
>>> new interface handles only two events: 'call' and 'kick'. Any device (e.g.,
>>> e1000e or vhost-scsi) may implement it (e.g., via eventfd, MSI-X or legacy
>>> IRQ).
>>> The 'call' lets the admin deliberately inject an IRQ from QEMU/host into the
>>> VM for a specific device (e.g., vhost-scsi), while the 'kick' lets the admin
>>> deliberately kick the doorbell on the QEMU/host side for a specific device.
>> I'm really not convinced that we want to give admins the direct ability to
>> poke at internals of devices in a running QEMU. It feels like there is way
>> too much potential for the admin to make a situation far worse by doing
>> the wrong thing here,
> We already have commands to write to an ioport and to inject MCEs, for
> example; is this that much different?
>> and people dealing with support tickets will have
>> no idea that the admin has been poking internals of the device and broken
>> it by doing something wrong.
> You could add a one-time log entry to say that this mischievous command
> had been used.
>> You pointed to a bug that was hit where this could conceivably be useful,
>> but that's a one-time issue and should not be a common occurrence that
>> justifies making an official public API to poke at devices forevermore IMHO.
> I think where it might be practically useful is if you were debugging a
> hung customer's VM and needed to find a way to get it to move again.
> That's something I'm not familiar with on the virtio side;
> mst - is this useful from a virtio side?
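The RFC notes that a device may implement 'call'/'kick' via eventfd. With vhost, both notification paths are eventfd-backed (the kick fd is signalled on a doorbell write, the call fd is signalled to raise the guest interrupt), so injecting either event by hand amounts to writing to the corresponding eventfd. The following is a minimal, Linux-only sketch of that mechanism under that assumption; `inject_event` and `consume_events` are hypothetical helper names, not QEMU APIs, and this is not the code from the RFC patches.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical helper: signal an eventfd once, the way a manual
 * 'kick' (doorbell) or 'call' (interrupt) could be injected when the
 * notification path is backed by an eventfd (an assumption, see above).
 * eventfd writes must be exactly 8 bytes; the value is added to the counter.
 */
static int inject_event(int efd)
{
    uint64_t one = 1;
    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Hypothetical consumer side (e.g. a backend worker polling the kick fd):
 * read returns the accumulated counter and resets it to zero. With
 * EFD_NONBLOCK, reading an unsignalled eventfd fails with EAGAIN, which
 * is reported here as -1 ("no event pending", i.e. what a lost kick looks like).
 */
static int64_t consume_events(int efd)
{
    uint64_t count = 0;
    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return -1;
    return (int64_t)count;
}
```

Usage: create the fd with `eventfd(0, EFD_NONBLOCK)`; until something calls `inject_event()`, `consume_events()` returns -1. Note that eventfd signals coalesce: two injected kicks before the consumer reads are observed as a single read returning 2, which is why an extra, deliberately injected kick is harmless to a well-behaved backend that simply re-scans its queue.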

BTW, the Linux kernel blk-mq has a similar idea/interface. Running the command
below will 'run' the block IO queue on purpose.

echo "kick" > /sys/kernel/debug/block/sda/state

It is helpful for diagnosis if we suspect that the IO stall is due to an unknown
race in which a 'run' of the queue is missed.

Dongli Zhang

> Dave
>> Regards,
>> Daniel
>> -- 