qemu-devel

Re: [RFC] adding a generic QAPI event for failed device hotunplug


From: Igor Mammedov
Subject: Re: [RFC] adding a generic QAPI event for failed device hotunplug
Date: Tue, 23 Mar 2021 14:06:36 +0100

On Tue, 23 Mar 2021 14:33:28 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Mon, Mar 22, 2021 at 01:06:53PM +0100, Paolo Bonzini wrote:
> > On 22/03/21 07:39, David Gibson wrote:  
> > > > QEMU doesn't really keep track of "in flight" unplug requests, and as
> > > > long as that's the case, its timeout event will have the same issue.
> > > Not generically, maybe.  In the PAPR code we effectively do, by means
> > > of the 'unplug_requested' boolean in the DRC structure.  Maybe that's
> > > a mistake, but at the time I couldn't see how else to handle things.  
> > 
> > No, that's good.  x86 also tracks it in some registers that are accessible
> > from the ACPI firmware.  See "PCI slot removal notification" in
> > docs/specs/acpi_pci_hotplug.txt.
> >   
> > > Currently we will resolve all "in flight" requests at machine reset
> > > time, effectively completing those requests.  Does that differ from
> > > x86 behaviour?  
> > 
> > IIRC on x86 the requests are instead cancelled, but I'm not 100%
> > sure.  
> 
> Ah... we'd better check that and try to make ppc consistent with
> whatever it does.
> 

Sorry for being late to the discussion. I can't say much about all the possible
ways to unplug a PCI device (aside from it being a complicated mess), but
hopefully I can shed some light on the state/behavior of the ACPI based methods.

* x86 - ACPI based PCI hotplug
 Its sole existence was dictated by Windows not supporting SHPC (conventional
 PCI), and it looks like, 'thanks' to buggy Windows drivers, we will have to use
 it for PCI-E as well (Julia is working on it).
 The HW registers described in docs/specs/acpi_pci_hotplug.txt are our own
 invention; they help to raise the standard ACPI 'device check' and 'eject
 request' events when the guest executes AML bytecode. Potentially the guest
 could report plug/unplug progress (including failure/completion) via the ACPI
 _OST method, but given my experience with how the Windows PCI core has worked
 so far, it may not use it (honestly, I haven't tried to explore the
 possibility, due to lack of interest in it).
 
 Regarding unplug - on device_del QEMU raises an SCI interrupt; after that the
 process is asynchronous. When the ACPI interpreter gets the SCI, it sends the
 respective _EJ0 event to the devices mentioned in the PCI_DOWN_BASE register.
 After getting the event, the guest OS may decide to eject the PCI device
 (which clears the device's bit in PCI_DOWN_BASE) or refuse to do so. QEMU does
 no progress tracking for the failure case, and the device's bit in
 PCI_DOWN_BASE stays set. On the next device_(add|del) (for any PCI device) the
 guest will see it again and will retry the removal.
 Also, if the guest reboots with any bits set in PCI_DOWN_BASE, the respective
 devices will be deleted on the QEMU side.
 There is no other way to cancel a removal request in PCI_DOWN_BASE, aside from
 the device being ejected explicitly at the guest's request or implicitly on
 reboot.
 IMHO:
     The sticky nature of PCI_(UP|DOWN)_BASE is more trouble than help, but
     it's been there since SeaBIOS times, so it's ABI we are stuck with. If I
     were re-implementing it now, I would use a one-shot event that's cleared
     once the guest has read it and, if possible, implement _OST status
     reporting (if it works on Windows).
 As it stands now, once device_del is issued, the user can't know when the PCI
 device will be removed. No timeout will help with that.
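
 To make the sticky semantics above concrete, here is a toy model in Python.
 It is only a sketch of the behavior as described in this mail, not QEMU code:
 the class and method names are made up, slots are plain integers, and the
 guest's decision is driven by hand.

```python
class StickyPciDownModel:
    """Toy model of sticky removal bits; hypothetical, not QEMU code."""

    def __init__(self, plugged_slots):
        self.plugged = set(plugged_slots)  # slots that hold a device
        self.down = set()                  # pending removal bits (sticky)

    def device_del(self, slot):
        """Request removal; return the slots the guest is asked to eject."""
        if slot in self.plugged:
            self.down.add(slot)
        # The SCI makes the guest look at all pending bits, old ones included.
        return sorted(self.down)

    def device_add(self, slot):
        """Plug a device; the SCI also re-exposes stale removal bits."""
        self.plugged.add(slot)
        return sorted(self.down)

    def guest_eject(self, slot):
        """Guest agreed to eject: device is removed and its bit is cleared."""
        if slot in self.down:
            self.down.discard(slot)
            self.plugged.discard(slot)

    def reboot(self):
        """Guest reboot: devices with pending bits are deleted."""
        self.plugged -= self.down
        self.down.clear()


m = StickyPciDownModel({1, 2})
m.device_del(1)          # guest refuses: the bit for slot 1 stays set
retry = m.device_add(3)  # unrelated hotplug: guest is asked about slot 1 again
m.reboot()               # slot 1 is removed implicitly on reboot
```

 In this model nothing ever tells the caller of device_del() whether or when
 slot 1 goes away, which is the point made above about timeouts.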
 
* ACPI CPU/Memory hotplug
 Events triggered by device_del are one-shot; the guest may then report
 progress to QEMU using the _OST method (qapi_event_send_acpi_device_ost)
 (I know that libvirt was aware of it, but I don't recall what it does with
 it). So QEMU might send an '_UNPLUG_ERROR' event to the user if the guest
 decides so. But instead of duplicating all possible events from the spec,
 QEMU passes the _OST arguments [1] as is, for the user to interpret as
 described by the standard.
 Though I'd say _OST is not 100% reliable: depending on the Windows or Linux
 kernel version used, the guest might skip reporting some events. I don't
 recall the exact state at the time I was testing it, so I'd call status
 reporting support 'best effort'.
 Also, it doesn't feature the pending removal on reboot that our ACPI PCI
 hotplug code has.
 So with a well-behaving guest the user will get notified about failure or
 device removal (when the guest is able to run its code); for broken guests
 I'm more inclined to say 'use a fixed guest' to get sane behavior.
 The policy for the user is to retry on failure (there are no bad side effects
 on retry).
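
 As a rough illustration of "user interprets the _OST arguments", a management
 client could classify the ACPI_DEVICE_OST QMP event along these lines. This is
 a sketch under assumptions: the event layout follows my reading of the QAPI
 schema, the source/status values (0x3 and 0x103 for ejection, 0 for success,
 0x83 for ejection in progress) should be double-checked against [1], and the
 function name, verdict strings and the 'dimm1' id are hypothetical.

```python
import json

# Source events that concern ejection (assumed from the _OST section of [1]).
EJECT_SOURCES = {0x3, 0x103}
STATUS_SUCCESS = 0x0
STATUS_EJECT_PENDING = 0x83


def classify_ost(event):
    """Return (device, verdict) for an ACPI_DEVICE_OST event, else None."""
    if event.get("event") != "ACPI_DEVICE_OST":
        return None
    info = event["data"]["info"]
    device = info.get("device", info["slot"])  # the 'device' id may be absent
    if info["source"] not in EJECT_SOURCES:
        return device, "not-an-eject-report"
    if info["status"] == STATUS_SUCCESS:
        return device, "success"
    if info["status"] == STATUS_EJECT_PENDING:
        return device, "pending"
    return device, "failed"  # anything else: guest refused, retry later


# Hypothetical event: the guest answers an eject request with status 0x81.
sample = json.loads("""
{"event": "ACPI_DEVICE_OST",
 "data": {"info": {"device": "dimm1", "slot": "0", "slot-type": "DIMM",
                   "source": 3, "status": 129}}}
""")
verdict = classify_ost(sample)
```

 A 'failed' verdict maps to the retry policy above; since reporting is best
 effort, the absence of any event must not be read as success or failure.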

I think that any kind of timeout here is inherently racy. In the async
hot[un]plug use case, all the user has to do is sufficiently over-commit the
host (or run it nested) for the guest to respond later than whatever timeout
we pick. So it's just a question of how long it will take for a user to come
back with a bug report.

* As far as I'm aware, the mentioned 'pending_deleted_event' is there to make
  the transparent failover magic happen (CCing Jens; Michael might also know
  how it works)

* SHPC & PCI-E have their own set of unplug quirks, which I know little about,
  but Julia worked with Michael on fixing PCI-E bugs (mostly related to how
  Windows drivers handle unplug; some are not possible to fix, hence the
  decision to add ACPI based hotplug to Q35 as a workaround).
  So they might know the specifics.

[1] ACPI spec: _OST (OSPM Status Indication)



