qemu-devel

Re: an issue for device hot-unplug


From: Laurent Vivier
Subject: Re: an issue for device hot-unplug
Date: Mon, 3 Apr 2023 18:32:03 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.8.0

Hi Yu,

please open a bug in the bug tracker:

https://gitlab.com/qemu/qemu/-/issues

It's easier to track the problem.

Which version of QEMU are you using?
Could you provide the QEMU command line?

Thanks,
Laurent


On 4/3/23 15:24, Yu Zhang wrote:
Dear Laurent,

recently we ran into an issue with the following error:

command '{ "execute": "device_del", "arguments": { "id": "virtio-diskX" } }' for VM "id" failed ({ "return": {"class": "GenericError", "desc": "Device virtio-diskX is already in the process of unplug"} }).

The issue is reproducible. If we add a delay of a few seconds before the hot-unplug, the hot-unplug works fine.

After some digging, we found that commit 9323f892b39 may be causing the issue:
------------------
     failover: fix unplug pending detection

     Failover needs to detect the end of the PCI unplug to start migration
     after the VFIO card has been unplugged.

     To do that, a flag is set in pcie_cap_slot_unplug_request_cb() and reset in
     pcie_unplug_device().

     But since
         17858a169508 ("hw/acpi/ich9: Set ACPI PCI hot-plug as default on Q35")
     we have switched to ACPI unplug and these functions are not called anymore
     and the flag not set. So failover migration is not able to detect if card
     is really unplugged and acts as it's done as soon as it's started. So it
     doesn't wait the end of the unplug to start the migration. We don't see any
     problem when we test that because ACPI unplug is faster than PCIe native
     hotplug and when the migration really starts the unplug operation is
     already done.

     See c000a9bd06ea ("pci: mark device having guest unplug request pending")
         a99c4da9fc2a ("pci: mark devices partially unplugged")

     Signed-off-by: Laurent Vivier <lvivier@redhat.com>
     Reviewed-by: Ani Sinha <ani@anisinha.ca>
     Message-Id: <20211118133225.324937-4-lvivier@redhat.com>
     Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
     Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
------------------
Its purpose is to detect the end of PCI device hot-unplug. However, we find the error confusing: how can a disk be "already in the process of unplug" on the very first hot-unplug attempt? As far as I know, libvirt has also run into this issue, but they simply ignored it:

https://bugzilla.redhat.com/show_bug.cgi?id=1878659

So our question is: should acpi_pcihp_device_unplug_request_cb() contain the line below?

    pdev->qdev.pending_deleted_event = true;

It would be great if you, as the author, could give us a few hints.

Thank you very much for your reply!

Sincerely,

Yu Zhang @ Compute Platform IONOS
03.04.2023