qemu-devel

From: Lan, Tianyu
Subject: Re: [Qemu-devel] [RFC PATCH V2 0/3] IXGBE/VFIO: Add live migration support for SRIOV NIC
Date: Tue, 1 Dec 2015 23:04:31 +0800
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0



On 12/1/2015 12:07 AM, Alexander Duyck wrote:
> They can only be corrected if the underlying assumptions are correct
> and they aren't.  Your solution would have never worked correctly.
> The problem is you assume you can keep the device running when you are
> migrating and you simply cannot.  At some point you will always have
> to stop the device in order to complete the migration, and you cannot
> stop it before you have stopped your page tracking mechanism.  So
> unless the platform has an IOMMU that is somehow taking part in the
> dirty page tracking you will not be able to stop the guest and then
> the device, it will have to be the device and then the guest.
>
>> Doing suspend and resume() may help to do migration easily but some
>> devices requires low service down time. Especially network and I got
>> that some cloud company promised less than 500ms network service downtime.
> Honestly focusing on the downtime is getting the cart ahead of the
> horse.  First you need to be able to do this without corrupting system
> memory and regardless of the state of the device.  You haven't even
> gotten to that state yet.  Last I knew the device had to be up in
> order for your migration to even work.

I think the issue is that the content of an RX packet delivered to the stack may change during migration, because that piece of memory won't be migrated to the new machine. This may confuse applications or the stack. The current dummy write solution ensures that the content of a packet won't change after the dummy write, although the content may not be the received data if migration happens before that point. We can recheck the content via the checksum or CRC in the protocol after the dummy write to ensure the content is what the VF received. I think the stack already does such checks, and the packet will be dropped if it fails the check.
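
To make the ordering concrete, here is a rough model of "dummy write first, then recheck the checksum". It is only a sketch in Python, not driver code: the function names (internet_checksum, dummy_write, accept_packet) are made up for illustration, and the RFC 1071 style 16-bit checksum is just one example of a protocol-level check the stack might apply.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return (~total) & 0xFFFF


def dummy_write(buf: bytearray) -> None:
    """Store the first byte back unchanged.

    Assumption: in a real guest this CPU store is what marks the page
    dirty so that it gets migrated. Here it only models that step.
    """
    if buf:
        buf[0] = buf[0]


def accept_packet(buf: bytearray, expected_checksum: int) -> bool:
    """Dummy-write the buffer first, then verify its content.

    Returns False when the buffer does not hold the data the sender
    checksummed, for example because the DMA'd content was lost.
    """
    dummy_write(buf)
    return internet_checksum(bytes(buf)) == expected_checksum
```

In this model a buffer whose DMA'd content was lost (for example, all zeros on the destination) fails the check and would be dropped, while an intact buffer passes.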

Another way is to tell QEMU about all the memory the driver is using and let QEMU migrate that memory after stopping the VCPU and the device. This seems safe, but the implementation may be complex.
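
As a rough illustration of that second idea, the bookkeeping could look something like the sketch below. This is not an existing QEMU or driver interface: DmaRegion, DmaRegionRegistry, and the 4 KiB page size are all assumptions made for the example. The driver would register each region it hands to the device, and after the VCPU and the device are stopped the list of pages would be copied unconditionally.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DmaRegion:
    """A guest-physical range the driver has handed to the device (hypothetical)."""
    guest_phys_addr: int
    length: int


class DmaRegionRegistry:
    """Hypothetical record of the memory the driver reports to the hypervisor."""

    def __init__(self) -> None:
        self._regions: List[DmaRegion] = []

    def register(self, region: DmaRegion) -> None:
        if region.length <= 0 or region.guest_phys_addr < 0:
            raise ValueError("region must have a non-negative address and a positive length")
        self._regions.append(region)

    def pages_to_migrate(self, page_size: int = 4096) -> List[int]:
        """Page frame numbers to copy once the VCPU and device are stopped."""
        pages = set()
        for r in self._regions:
            first = r.guest_phys_addr // page_size
            last = (r.guest_phys_addr + r.length - 1) // page_size
            pages.update(range(first, last + 1))
        return sorted(pages)
```

Overlapping regions collapse to a single page entry, so each page is copied once no matter how many buffers share it.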


