On Thu, Oct 22, 2020 at 5:33 PM Michael S. Tsirkin <
mst@redhat.com> wrote:
On Thu, Oct 22, 2020 at 05:10:43PM +0300, Marcel Apfelbaum wrote:
>
>
> On Thu, Oct 22, 2020 at 5:01 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Oct 22, 2020 at 04:55:10PM +0300, Marcel Apfelbaum wrote:
> > Hi David, Michael,
> >
> > On Thu, Oct 22, 2020 at 3:56 PM David Gibson <dgibson@redhat.com> wrote:
> >
> > On Thu, 22 Oct 2020 08:06:55 -0400
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >
> > > On Thu, Oct 22, 2020 at 02:40:26PM +0300, Marcel Apfelbaum wrote:
> > > > From: Marcel Apfelbaum <marcel@redhat.com>
> > > >
> > > > During PCIe Root Port's transition from Power-Off to Power-ON (or
> > vice-versa)
> > > > the "Slot Control Register" has the "Power Indicator Control"
> > > > set to "Blinking" expressing a "power transition" mode.
> > > >
> > > > Any hotplug operation during the "power transition" mode is not
> > permitted
> > > > or at least not expected by the Guest OS leading to strange
> failures.
> > > >
> > > > Detect and refuse hotplug operations in such case.
> > > >
> > > > Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> > > > ---
> > > > hw/pci/pcie.c | 7 +++++++
> > > > 1 file changed, 7 insertions(+)
> > > >
> > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > > > index 5b48bae0f6..2fe5c1473f 100644
> > > > --- a/hw/pci/pcie.c
> > > > +++ b/hw/pci/pcie.c
> > > > @@ -410,6 +410,7 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler
> > *hotplug_dev, DeviceState *dev,
> > > > PCIDevice *hotplug_pdev = PCI_DEVICE(hotplug_dev);
> > > > uint8_t *exp_cap = hotplug_pdev->config + hotplug_pdev->
> > exp.exp_cap;
> > > > uint32_t sltcap = pci_get_word(exp_cap + PCI_EXP_SLTCAP);
> > > > + uint32_t sltctl = pci_get_word(exp_cap + PCI_EXP_SLTCTL);
> > > >
> > > > /* Check if hot-plug is disabled on the slot */
> > > > if (dev->hotplugged && (sltcap & PCI_EXP_SLTCAP_HPC) == 0) {
> > > > @@ -418,6 +419,12 @@ void pcie_cap_slot_pre_plug_cb
> (HotplugHandler
> > *hotplug_dev, DeviceState *dev,
> > > > return;
> > > > }
> > > >
> > > > + if ((sltctl & PCI_EXP_SLTCTL_PIC) ==
> PCI_EXP_SLTCTL_PWR_IND_BLINK)
> > {
> > > > + error_setg(errp, "Hot-plug failed: %s is in Power
> Transition",
> > > > + DEVICE(hotplug_pdev)->id);
> > > > + return;
> > > > + }
> > > > +
> > > > pcie_cap_slot_plug_common(PCI_DEVICE(hotplug_dev), dev,
> errp);
> > > > }
> > >
> > > Probably the only way to handle for existing machine types.
> >
> >
> > I agree
> >
> >
> > > For new ones, can't we queue it in host memory somewhere?
> >
> >
> >
> > I am not sure I understand what will be the flow.
> > - The user asks for a hotplug operation.
> > - QEMU deferred operation.
> > After that the operation may still fail, how would the user know if the
> > operation
> > succeeded or not?
>
>
> How can it fail? It's just a button press ...
>
>
>
> Currently we have "Hotplug unsupported."
> With this change we have "Guest/System not ready"
Hotplug unsupported is not an error that can trigger with
a well behaved management such as libvirt.
>
>
> >
> >
> > I'm not actually convinced we can't do that even for existing machine
> > types.
> >
> >
> > Is a Guest visible change, I don't think we can do it.
> >
> >
> > So I'm a bit hesitant to suggest going ahead with this without
> > looking a bit closer at whether we can implement a wait-for-ready in
> > qemu, rather than forcing every user of qemu (human or machine) to do
> > so.
> >
> >
> > While I agree it is a pain from the usability point of view, hotplug
> operations
> > are allowed to fail. This is not more than a corner case, ensuring the
> right
> > response (gracefully erroring out) may be enough.
> >
> > Thanks,
> > Marcel
> >
>
>
> I don't think they ever failed in the past so management is unlikely
> to handle the failure by retrying ...
>
>
> That would require some management handling, yes.
> But even without a "retry", failing is better than strange OS behavior.
>
> Trying a better alternative like deferring the operation for new machines
> would make sense, however is out of the scope of this patch
Expand the scope please. The scope should be "solve a problem xx" not
"solve a problem xx by doing abc".
The scope is detecting a hotplug error early instead
passing to the Guest OS a hotplug operation that we know it will fail.
> that simply
> detects the error leaving us in a slightly better state than today.
>
> Thanks,
> Marcel
Not applying a patch is the only tool we maintainers have to influence
people to solve the problem fully.
That's why I'm not inclined to apply
"slightly better" patches generally.
The patch is a proposal following some offline discussions on this matter.
I personally see the value of it versus what we have today.
Thanks,
Marcel
>
>
> >
> >
> >
> > --
> > David Gibson <dgibson@redhat.com>
> > Principal Software Engineer, Virtualization, Red Hat
> >
>
>