qemu-devel

Re: [Qemu-devel] live migration vs device assignment (motivation)


From: Yang Zhang
Subject: Re: [Qemu-devel] live migration vs device assignment (motivation)
Date: Thu, 10 Dec 2015 21:07:32 +0800
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0

On 2015/12/10 19:41, Dr. David Alan Gilbert wrote:
* Yang Zhang (address@hidden) wrote:
On 2015/12/10 18:18, Dr. David Alan Gilbert wrote:
* Lan, Tianyu (address@hidden) wrote:
On 12/8/2015 12:50 AM, Michael S. Tsirkin wrote:
I thought about what this is doing at the high level, and I do see some
value in what you are trying to do, but I also think we need to clarify
the motivation a bit more.  What you are saying is not really what the
patches are doing.

And with that clearer understanding of the motivation in mind (assuming
it actually captures a real need), I would also like to suggest some
changes.

Motivation:
Most current solutions for migration with a passthrough device are based
on PCI hotplug, but that has side effects and doesn't work for all
devices.

For NIC devices:
The PCI hotplug solution can work around network device migration
by switching between the VF and the PF.

But switching network interfaces introduces service downtime.

I tested the service downtime by putting the VF and the PV interface
into a bonded interface and pinging the bonded interface while plugging
and unplugging the VF.
1) About 100ms when adding the VF
2) About 30ms when removing the VF
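
For reference, the plug/unplug half of such a test can be driven from the
host over QMP; the Python sketch below is purely illustrative. The socket
path, the device id "hostvf0" and the host BDF are made-up placeholders,
and it assumes the guest-side bond is already configured:

import json
import socket

# Assumes QEMU was started with something like:
#   -qmp unix:/tmp/qmp.sock,server,nowait
QMP_SOCK = "/tmp/qmp.sock"

sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
sock.connect(QMP_SOCK)
reader = sock.makefile("r")

def qmp(cmd, **args):
    """Send one QMP command and return the first non-event reply."""
    req = {"execute": cmd}
    if args:
        req["arguments"] = args
    sock.sendall((json.dumps(req) + "\r\n").encode())
    while True:
        msg = json.loads(reader.readline())
        if "event" not in msg:          # skip asynchronous QMP events
            return msg

reader.readline()                       # discard the QMP greeting
qmp("qmp_capabilities")                 # leave capabilities negotiation

# Hot-remove the assigned VF before migration starts ("hostvf0" is the id
# the device was originally added with)...
qmp("device_del", id="hostvf0")

# ...and hot-add a VF again on the destination once migration completes.
# "0000:03:10.0" is a placeholder host BDF for the VF.
qmp("device_add", driver="vfio-pci", host="0000:03:10.0", id="hostvf0")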

It also requires the guest to do the switching configuration. This is
hard for our customers to manage and deploy. To maintain PV performance
during migration, the host also needs to assign a VF to the PV device,
which affects scalability.

These factors block SR-IOV NIC passthrough usage in cloud services and
OPNFV, which require high network performance and stability.

Right, I'll agree that it's hard to do migration of a VM which uses
an SR-IOV device; and while I think it should be possible to bond a virtio
device to a VF for networking and then hotplug the SR-IOV device, I agree
it's hard to manage.

For other kinds of devices, this approach is hard to make work.
We are also adding migration support for the QAT (QuickAssist Technology)
device.

QAT device use case introduction:
Server, networking, big data, and storage applications use QuickAssist
Technology to offload servers from handling compute-intensive operations,
such as:
1) Symmetric cryptography functions including cipher operations and
authentication operations
2) Public key functions including RSA, Diffie-Hellman, and elliptic curve
cryptography
3) Compression and decompression functions including DEFLATE and LZS

PCI hotplug will not work for such devices during migration, and these
operations will fail when the device is unplugged.

I don't understand that QAT argument; if the device is purely an offload
engine for performance, then why can't you fall back to doing the
same operations in the VM or in QEMU if the card is unavailable?
The tricky bit is dealing with outstanding operations.
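
Just to illustrate that fallback idea for the compression case: if the
offload path reports the card as gone, the same DEFLATE operation can be
done in software. This is only a sketch under that assumption -
qat_deflate() below is a hypothetical wrapper, only the zlib part is
real, and it doesn't address the outstanding-operations problem:

import zlib

class OffloadUnavailable(Exception):
    """Raised by the (hypothetical) hardware path when the card is gone."""

def qat_deflate(data):
    # Placeholder for a hardware-offloaded DEFLATE request; here the card
    # is always treated as unavailable, e.g. because it was hot-unplugged.
    raise OffloadUnavailable()

def deflate(data):
    """Compress via the accelerator if present, else fall back to software."""
    try:
        return qat_deflate(data)
    except OffloadUnavailable:
        # Software fallback: plain zlib DEFLATE in the VM (or in QEMU).
        return zlib.compress(data)

print(len(deflate(b"some payload" * 100)))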

So we are trying to implement a new solution which really migrates
device state to the target machine and won't affect the user during
migration, keeping service downtime low.

Right, that's a good aim - the only question is how to do it.

It looks like this is always going to need some device-specific code;
the question I see is whether that's in:
     1) qemu
     2) the host kernel
     3) the guest kernel driver

The objections to this series seem to be that it needs changes to (3);
I can see the worry that the guest kernel driver might not get a chance
to run during the right time in migration and it's painful having to
change every guest driver (although your change is small).

My question is: at what stage of the migration process do you expect to
tell the guest kernel driver to do this?

     If you do it at the start of the migration, and quiesce the device,
     the migration might take a long time (say 30 minutes) - are you
     intending the device to be quiesced for this long? And where are
     you going to send the traffic?
     If you are, then do you need to do it via this PCI trick, or could
     you just quiesce the device via something higher level?

     Or are you intending to do it just near the end of the migration?
     But then how do we know how long it will take the guest driver to
     respond?

Ideally, we would be able to leave the guest driver unmodified, but that
requires the hypervisor or QEMU to be aware of the device, which means we
may need a driver in the hypervisor or QEMU to handle the device on behalf
of the guest driver.

Can you answer the question of when you use your code -
    at the start of migration or
    just before the end?

Tianyu can answer this question. In my initial design, I preferred to put
more of the modifications in the hypervisor and QEMU, and the only
involvement from the guest driver was how to restore the state after
migration. But I don't know the later implementation, since I have left
Intel.


It would be great if we could avoid changing the guest; but at least your
guest driver changes don't actually seem to be that hardware-specific;
could your changes actually be moved to the generic PCI level so they
could be made to work for lots of drivers?

It is impossible to use one common solution for all devices unless the
PCIe spec documents it clearly, and I think one day it will be there. But
before that, we need some workarounds in the guest driver to make it work,
even if it looks ugly.

Dave


--
Dr. David Alan Gilbert / address@hidden / Manchester, UK



--
best regards
yang


