qemu-devel

Re: [Qemu-devel] [patch v6 11/12] vfio: register aer resume notification


From: Zhou Jie
Subject: Re: [Qemu-devel] [patch v6 11/12] vfio: register aer resume notification handler for aer resume
Date: Wed, 11 May 2016 11:11:39 +0800
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.0

Hi, Alex
    What do you think about the following solution?
    1. Detect support for resume notification.
       If the host vfio driver does not have resume notifier flags,
       directly fail to boot up the VM when aer is enabled.
    2. Immediately notify the VM on error detected.
    3. Stall any access to the device until resume is signaled.
       Disable mmaps, drop writes, return -1 for reads.
    4. Delay the guest directed bus reset.
       Don't reset bus in vfio_pci_reset function.
    5. Wait for resume notification.
       If we don't get the resume notification from the host after
       some timeout, we would abort the guest directed bus reset
       altogether and make the device disappear,
       initiating an unplug of the device to prevent it from further
       interacting with the VM.
    6. After getting the resume notification.
       Reset the bus.
       This is the second bus reset, because the host did a bus reset already.
       But as you said we shouldn't necessarily design the API that
       strictly around the current behavior of the Linux AER handler.

Sincerely,
Zhou Jie

On 2016/5/7 0:39, Alex Williamson wrote:
On Fri, 6 May 2016 09:38:41 +0800
Chen Fan <address@hidden> wrote:

On 04/26/2016 10:48 PM, Alex Williamson wrote:
On Tue, 26 Apr 2016 11:39:02 +0800
Chen Fan<address@hidden>  wrote:

On 04/14/2016 09:02 AM, Chen Fan wrote:
On 04/12/2016 05:38 AM, Alex Williamson wrote:
On Tue, 5 Apr 2016 19:42:02 +0800
Cao jin<address@hidden>  wrote:

From: Chen Fan<address@hidden>

To support aer recovery, the host and guest would run the same aer
recovery code, which would do a secondary bus reset if the error
is fatal. The aer recovery process is:
    1. error_detected
    2. reset_link (if fatal)
    3. slot_reset/mmio_enabled
    4. resume

This indicates that the host will do a secondary bus reset to reset
the physical devices under the bus in step 2, which would leave the
devices in D3 status for a short time. But in qemu, we register
an error detected handler, which would be invoked when the host broadcasts
the error-detected event in step 1. To avoid the guest doing
reset_link while the host does reset_link at the same time, which may
cause a fatal error, we introduce a resume notifier to ensure the host
reset has completed before doing the guest aer injection.
Why is it safe to continue running the VM between the error detected
notification and the resume notification?  We're just pushing back the
point at which we inject the AER into the guest, potentially negating
any benefit by allowing the VM to consume bad data.  Shouldn't we
instead be immediately notifying the VM on error detected, but stalling
any access to the device until resume is signaled?  How do we know that
resume will ever be signaled?  We have both the problem that we may be
running on an older kernel that won't support a resume notification and
the problem that seeing a resume notification depends on the host being
able to successfully complete a link reset after fatal error. We can
detect support for resume notification, but we still need a strategy
for never receiving it.  Thanks,
That makes sense, but I haven't come up with a good idea. Do you have
any idea, Alex?
I don't know that there are any good solutions here.  We need to
respond to the current error notifier interrupt and not regress from
our support there.  I think that means that if we want to switch from a
simple halt-on-error to a mechanism for the guest to handle recovery,
we need to disable access to the device between being notified that the
error occurred and being notified to resume.  We can do that by
disabling mmaps to the device and preventing access via the slow path
handlers.  I don't know what the best solution is for preventing access:
do we block and pause the VM, or do we drop writes and return -1 for
reads?  That's something that needs to be determined.  We also need to
inject the AER into the VM at the point we're notified of an error
because the VM needs to know as soon as possible to stop using the
device or trusting any data from it.  The next coordination point would
be something like the resume notifier that you've added and there are
numerous questions around the interaction of that with the guest
handling.  Clearly we can't do a guest directed bus reset until we get
the resume notifier, so do we block that execution path in QEMU until
the resume notification is received?  What happens if we don't get that
notification?  Is there any way that we can rely on the host having
done a bus reset to the point where we don't need to act on the guest
directed reset?  These are all things that need to be figured out.
Thanks,
Maybe we can simply pause the vcpus and keep the VM from accessing
the device, and add two flags in VFIO_DEVICE_GET_INFO to query
whether the vfio pci driver has a resume notifier.
If it does not have resume notifier flags, we can directly fail to boot
up the VM when aer is enabled.

We can already tell if a resume interrupt is supported between the IRQ
count in vfio_device_info and a probe with vfio_irq_info.  What would
additional flags in vfio_device_info tell us beyond a resume interrupt
being supported?  Is pausing the VM acceptable from a service guarantee
perspective to users?  A bus reset can take a full second and I imagine
deeper PCI hierarchies can push that out depending on what level the
error occurs.  A second of downtime may be enough to trigger failovers
to other systems.  If we were to disable mmaps when a fault occurs, we
could trap any further device access, drop writes, return -1 for
reads.  This seems reasonable since we've already notified the VM that
the device had a fault.  The synchronization point seems like when the
guest tries to do a bus reset, we need to block that until we get the
resume notification from the host.  Perhaps if that doesn't occur after
some timeout, we would abort the guest directed bus reset altogether
and make the device disappear, perhaps even initiating an unplug of the
device to prevent it from further interacting with the VM.

Otherwise, we should wait for the resume notifier to arrive before
restarting the cpu. About the problem of the duplicated bus reset by host
and guest, I think qemu can decide whether a bus reset is needed on the
guest according to whether the error is fatal or non-fatal. I think it's
not critical and could be resolved later.

The vfio error interrupt doesn't signal non-fatal errors afaik.  I'm
also not sure we have a guarantee that the host has performed a bus
reset; we shouldn't necessarily design the API that strictly around the
current behavior of the Linux AER handler.  So I don't know that
there's any practical way to avoid duplicate bus resets between host
and guest recovery.  Thanks,

Alex

