qemu-devel

Re: an issue for device hot-unplug


From: Yu Zhang
Subject: Re: an issue for device hot-unplug
Date: Mon, 3 Apr 2023 18:59:33 +0200

Dear Laurent,

Thank you for your quick reply. We use qemu-7.1, but the issue is reproducible with QEMU from v6.2 up to the recent v8.0 release candidates.
I found that it was introduced by commit 9323f892b39 (between v6.2.0-rc2 and v6.2.0-rc3).

If it doesn't break anything else, it suffices to remove the line below from acpi_pcihp_device_unplug_request_cb():

    pdev->qdev.pending_deleted_event = true;

but you may have a reason to keep it. First of all, I'll open a bug in the bug tracker and let you know.
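
To make the role of that flag concrete, here is a minimal toy model in C. It is not QEMU code: ToyDevice and the toy_* functions are hypothetical names made up for illustration, and the sketch assumes that the "already in the process of unplug" error comes from a check on the same pending_deleted_event flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a hot-pluggable device; not QEMU's DeviceState. */
typedef struct ToyDevice {
    const char *id;
    bool pending_deleted_event; /* true while an unplug request is outstanding */
} ToyDevice;

/* Models the unplug-request callback: mark the request as pending. */
static void toy_unplug_request(ToyDevice *dev)
{
    assert(dev != NULL);
    dev->pending_deleted_event = true;
}

/* Models the guest completing the eject: the pending flag is cleared. */
static void toy_guest_eject(ToyDevice *dev)
{
    assert(dev != NULL);
    dev->pending_deleted_event = false;
}

/*
 * Models device_del (assumed behaviour): refuse a new request while the
 * flag is set, otherwise forward the request to the unplug callback.
 * Returns 0 on success, -1 with a message written to err on failure.
 */
static int toy_device_del(ToyDevice *dev, char *err, size_t errlen)
{
    assert(dev != NULL);
    if (dev->pending_deleted_event) {
        snprintf(err, errlen,
                 "Device %s is already in the process of unplug", dev->id);
        return -1;
    }
    toy_unplug_request(dev);
    return 0;
}
```

In this model, a first toy_device_del() succeeds and sets the flag, and every further call fails with the error above until toy_guest_eject() clears it. Without the assignment in toy_unplug_request(), repeated requests would no longer be rejected, but nothing could tell whether an unplug is still in flight either.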

Best regards,
Yu Zhang

On Mon, Apr 3, 2023 at 6:32 PM Laurent Vivier <lvivier@redhat.com> wrote:
Hi Yu,

please open a bug in the bug tracker:

https://gitlab.com/qemu/qemu/-/issues

It's easier to track the problem.

What is the version of QEMU you are using?
Could you provide QEMU command line?

Thanks,
Laurent


On 4/3/23 15:24, Yu Zhang wrote:
> Dear Laurent,
>
> recently we ran into an issue with the following error:
>
> command '{ "execute": "device_del", "arguments": { "id": "virtio-diskX" } }' for VM "id"
> failed ({ "return": {"class": "GenericError", "desc": "Device virtio-diskX is already in
> the process of unplug"} }).
>
> The issue is reproducible. With a delay of a few seconds before the hot-unplug, it
> works fine.
>
> After some digging, we found that commit 9323f892b39 may have introduced the issue.
> ------------------
>      failover: fix unplug pending detection
>
>      Failover needs to detect the end of the PCI unplug to start migration
>      after the VFIO card has been unplugged.
>
>      To do that, a flag is set in pcie_cap_slot_unplug_request_cb() and reset in
>      pcie_unplug_device().
>
>      But since
>          17858a169508 ("hw/acpi/ich9: Set ACPI PCI hot-plug as default on Q35")
>      we have switched to ACPI unplug and these functions are not called anymore
>      and the flag not set. So failover migration is not able to detect if card
>      is really unplugged and acts as it's done as soon as it's started. So it
>      doesn't wait the end of the unplug to start the migration. We don't see any
>      problem when we test that because ACPI unplug is faster than PCIe native
>      hotplug and when the migration really starts the unplug operation is
>      already done.
>
>      See c000a9bd06ea ("pci: mark device having guest unplug request pending")
>          a99c4da9fc2a ("pci: mark devices partially unplugged")
>
>      Signed-off-by: Laurent Vivier <lvivier@redhat.com>
>      Reviewed-by: Ani Sinha <ani@anisinha.ca>
>      Message-Id: <20211118133225.324937-4-lvivier@redhat.com>
>      Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ------------------
> The purpose of the commit is to detect the end of the PCI device hot-unplug. However, we
> find the error confusing. How is it possible that a disk "is already in the process of
> unplug" during the first hot-unplug attempt? As far as I know, libvirt also encountered
> the issue, but simply ignored it:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1878659
>
> Hence the question: should we have the line below in acpi_pcihp_device_unplug_request_cb()?
>
>     pdev->qdev.pending_deleted_event = true;
>
> It would be great if you as the author could give us a few hints.
>
> Thank you very much for your reply!
>
> Sincerely,
>
> Yu Zhang @ Compute Platform IONOS
> 03.04.2023

