RE: [question] VFIO Device Migration: The vCPU may be paused during vfio
From: Tian, Kevin
Subject: RE: [question] VFIO Device Migration: The vCPU may be paused during vfio device DMA in iommu nested stage mode && vSVA
Date: Fri, 24 Sep 2021 06:47:48 +0000
> From: Kunkun Jiang <jiangkunkun@huawei.com>
> Sent: Friday, September 24, 2021 2:19 PM
>
> Hi all,
>
> I encountered a problem in a vfio device migration test. The
> vCPU may be paused during vfio-pci DMA in iommu nested
> stage mode && vSVA. This may lead to migration failure and
> other problems related to device hardware and driver
> implementation.
>
> It may be a bit early to discuss this issue; after all, the iommu
> nested stage mode and vSVA are not yet mature. But judging
> from the current implementation, we will definitely encounter
> this problem in the future.
Yes, this is a known limitation in supporting migration with vSVA.
>
> This is how vSVA currently handles a translation fault
> in iommu nested stage mode (taking SMMU as an example):
>
> guest os       4.handle translation fault     5.send CMD_RESUME to vSMMU
>
> qemu           3.inject fault into guest os   6.deliver response to host os
> (vfio/vsmmu)
>
> host os        2.notify the qemu              7.send CMD_RESUME to SMMU
> (vfio/smmu)
>
> SMMU           1.address translation fault    8.retry or terminate
>
> The order is 1--->8.
>
> Currently, qemu may pause the vCPU at any step. It is possible to
> pause the vCPU at steps 1-5, that is, in the middle of a DMA. This
> may lead to migration failure and other problems related to device
> hardware and driver implementation. For example, the device status
> cannot be changed from RUNNING && SAVING to SAVING,
> because the device DMA is not over.
>
> As far as I can see, the vCPU should not be paused during a device
> IO process, such as DMA. However, currently live migration
> does not pay attention to the state of vfio device when pausing
> the vCPU. And if the vCPU is not paused, the vfio device is
> always running. This looks like a *deadlock*.
Basically this requires:
1) stopping the vCPU after stopping the device (this sequence could be
enabled selectively for vSVA);
2) when stopping the device, having the driver block new requests
from the vCPU (queuing them on a pending list) and then drain all
in-flight requests, including faults;
* blocking new requests in turn requires switching the cmd portal
from the fast path to the slow trap-emulation path before stopping
the device;
3) saving the pending requests in the vm image and replaying them
after the vm is resumed;
* finally, disabling blocking by switching the cmd portal back to
the fast path;
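
The ordering above can be illustrated with a small toy model. This is only a sketch under assumptions: the Device class, its method names and the request strings are all hypothetical, and nothing here corresponds to real QEMU, VFIO or SMMU interfaces. It only shows the order of operations: trap the portal, drain, stop the device, pause the vCPU, save the pending list, then replay it on the destination.

```python
from collections import deque


class Device:
    """Hypothetical toy model of an assigned device with a command portal."""

    def __init__(self):
        self.fast_path = True     # True: vCPU submits directly to the device
        self.pending = deque()    # requests held back while the portal is trapped
        self.in_flight = deque()  # requests (including faults) being processed
        self.running = True

    def submit(self, req):
        # On the fast path the request reaches the device directly; on the
        # slow trap-emulation path it is queued and not issued.
        if self.fast_path:
            self.in_flight.append(req)
        else:
            self.pending.append(req)

    def begin_stop(self):
        # Switch the portal to the trap-emulation path so new requests block.
        self.fast_path = False

    def stop(self):
        # Drain everything in flight (the model just completes it),
        # then mark the device as stopped.
        drained = len(self.in_flight)
        self.in_flight.clear()
        self.running = False
        return drained

    def save(self):
        # The pending list becomes part of the saved vm image.
        return list(self.pending)

    def restore(self, saved):
        # Replay the saved requests, then switch back to the fast path.
        self.running = True
        self.in_flight.extend(saved)
        self.pending.clear()
        self.fast_path = True


def demo():
    src = Device()
    src.submit("dma-1")
    src.submit("page-fault-1")
    src.begin_stop()            # portal is now trapped
    src.submit("dma-2")         # vCPU is still running; request is held
    drained = src.stop()        # device stops first
    # Only at this point would the vCPU be paused.
    image = src.save()
    dst = Device()
    dst.restore(image)          # replay on the destination
    return drained, image, list(dst.in_flight), dst.fast_path
```

In this model the request submitted after begin_stop() never reaches the device on the source side; it travels in the saved image and is issued only after restore.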
>
> Do you have any ideas to solve this problem?
> Looking forward to your reply.
>
We verified that the above flow works in our internal POC.
Thanks
Kevin