qemu-devel

Re: [Qemu-devel] [patch v5 11/12] vfio: device may stuck in D3 when doin


From: Alex Williamson
Subject: Re: [Qemu-devel] [patch v5 11/12] vfio: device may stuck in D3 when doing aer recovery
Date: Thu, 24 Mar 2016 20:22:55 -0600

On Fri, 25 Mar 2016 09:38:09 +0800
Chen Fan <address@hidden> wrote:

> On 03/25/2016 06:54 AM, Alex Williamson wrote:
> > On Wed, 23 Mar 2016 18:12:06 +0800
> > Cao jin <address@hidden> wrote:
> >  
> >> From: Chen Fan <address@hidden>
> >>
> >> When an AER error occurs on a physical device, the device may not
> >> be back in D0 for a short time. If we recover the device too quickly,
> >> it may get stuck in D3 when we force its power state to D0, so we
> >> may need to wait briefly before injecting the error into the guest.
> >>
> >> Signed-off-by: Chen Fan <address@hidden>
> >> ---
> >>   hw/vfio/pci.c | 3 +++
> >>   1 file changed, 3 insertions(+)
> >>
> >> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> >> index 25fc095..5216e7f 100644
> >> --- a/hw/vfio/pci.c
> >> +++ b/hw/vfio/pci.c
> >> @@ -2658,6 +2658,9 @@ static void vfio_err_notifier_handler(void *opaque)
> >>           msg.severity = isfatal ? PCI_ERR_ROOT_CMD_FATAL_EN :
> >>                                    PCI_ERR_ROOT_CMD_NONFATAL_EN;
> >>   
> >> +        /* wait a bit to ensure aer device is ready */
> >> +        usleep(2 * 1000);  
> > Where does this number come from?  Why would the device be in D3?  I
> > don't understand this at all.  
> Hi Alex,
>
>      When I tested the code in my environment, I found that when I used
> the aer-inject module to inject a fake AER error into the device on the
> host, QEMU would intermittently print the message "vfio: Unable to power
> on device, stuck in D3". When I used gdb to debug vfio_pci_pre_reset, the
> problem did not appear, so I assumed it was some kind of timing race and
> added a sleep of 2ms (double the reset time of 1ms) to make sure the
> device is ready. The root cause probably still needs to be investigated
> more deeply.

Yes, it sounds like you need to investigate this further.  The delay is
arbitrary and perhaps suggests a race that needs to be fixed
correctly.  Thanks,

Alex
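
For illustration only, one way to avoid an arbitrary fixed delay like the
usleep(2 * 1000) in the patch is to poll the device's power state with a
bounded timeout and fail explicitly if D0 is never reached. The sketch below
is not from the thread and is not QEMU code: wait_for_d0(), pmcsr_read_fn and
the fake device are hypothetical names, and the timeout and poll interval are
placeholder parameters. It only removes the magic number; it does not identify
or fix the underlying race, which still has to be found.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* PowerState field (bits 1:0) of the PCI PM control/status register. */
#define PM_CTRL_STATE_MASK 0x0003u
#define PM_STATE_D0        0x0000u
#define PM_STATE_D3HOT     0x0003u

/* Hypothetical accessor: returns the device's current PMCSR value. */
typedef uint16_t (*pmcsr_read_fn)(void *opaque);

/* Sleep for the given number of microseconds. */
static void sleep_us(unsigned int us)
{
    struct timespec ts = { .tv_sec = us / 1000000u,
                           .tv_nsec = (long)(us % 1000000u) * 1000L };
    nanosleep(&ts, NULL);
}

/*
 * Poll until the device reports D0 or the time budget is used up.
 * Returns 0 once D0 is observed, -1 if the device is still not in D0
 * after roughly timeout_us microseconds of waiting.
 */
static int wait_for_d0(pmcsr_read_fn read_pmcsr, void *opaque,
                       unsigned int timeout_us, unsigned int poll_us)
{
    unsigned int waited = 0;

    assert(read_pmcsr != NULL && poll_us > 0);

    for (;;) {
        if ((read_pmcsr(opaque) & PM_CTRL_STATE_MASK) == PM_STATE_D0) {
            return 0;
        }
        if (waited >= timeout_us) {
            return -1;
        }
        sleep_us(poll_us);
        waited += poll_us;
    }
}

/* Fake device for the example: reports D3hot for N reads, then D0. */
struct fake_dev {
    int reads_until_d0;
};

static uint16_t fake_read_pmcsr(void *opaque)
{
    struct fake_dev *dev = opaque;

    if (dev->reads_until_d0 > 0) {
        dev->reads_until_d0--;
        return PM_STATE_D3HOT;
    }
    return PM_STATE_D0;
}
```

A caller would pass its own register accessor in place of fake_read_pmcsr and
treat a -1 return as a real error rather than continuing silently.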


