[Qemu-devel] [RFC 0/1] Add live migration support to the PVRDMA device


From: Sukrit Bhatnagar
Subject: [Qemu-devel] [RFC 0/1] Add live migration support to the PVRDMA device
Date: Fri, 21 Jun 2019 20:15:40 +0530

Hi,

I am a GSoC participant, trying to implement live migration for the
pvrdma device with help from my mentors Marcel and Yuval.
My current task is to save and load the various addresses that the
device uses for DMA mapping. We will be adding the device state to
live migration incrementally. As the first step in the implementation,
we are performing migration to the same host. This spares us many
complexities at this stage, such as GID changes, and we will address
migration across hosts later, once same-host migration works.

Currently, the save and load logic uses SaveVMHandlers, which is the
legacy way; it will be ported to VMStateDescription once the
existing issues are solved.
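
For reference, the hookup currently looks roughly like this (a minimal
sketch, not the exact patch; it assumes migration/register.h and the
pvrdma_save()/pvrdma_load() handlers sketched further below):

  static SaveVMHandlers savevm_pvrdma = {
      .save_state = pvrdma_save,
      .load_state = pvrdma_load,
  };

  /* registered once from pvrdma_realize() */
  register_savevm_live(DEVICE(dev), "pvrdma", -1, 1, &savevm_pvrdma, dev);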

This RFC is meant to request suggestions on the things that are
working, and help with the things that are not.


What is working:

* The pvrdma device is initialized in a VM, its GID entry is added on
  the host, and rc_pingpong is successful between two such VMs. This
  is when libvirt is used to launch the VMs.

* The dma, cmd_slot_dma and resp_slot_dma addresses are saved at the
  source and loaded properly at the destination upon migration. That is,
  the values loaded at the dest during migration are the same as the
  ones saved.

  `dma` is provided by the guest driver when it writes to BAR1 and is
  stored in dev->dsr_info.dma. The DSR is created by mapping this
  address. `cmd_slot_dma` and `resp_slot_dma` are the dma addresses of
  the command and response buffers, respectively, which are provided by
  the guest through the DSR.
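
  As a minimal sketch of the save side (field names as above; the actual
  code is in patch 1/1):

    static void pvrdma_save(QEMUFile *f, void *opaque)
    {
        PVRDMADev *dev = opaque;   /* opaque passed at registration */

        /* DSR address the guest driver wrote to BAR1 */
        qemu_put_be64(f, dev->dsr_info.dma);
        /* cmd/resp slot addresses, taken from the mapped DSR */
        qemu_put_be64(f, dev->dsr_info.dsr->cmd_slot_dma);
        qemu_put_be64(f, dev->dsr_info.dsr->resp_slot_dma);
    }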

* The DSR is successfully (re)mapped at the dest using the dma address
  loaded from migration.
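
  And a minimal sketch of the corresponding load side (same field names;
  error handling simplified, the actual code is in patch 1/1):

    static int pvrdma_load(QEMUFile *f, void *opaque, int version_id)
    {
        PVRDMADev *dev = opaque;
        PCIDevice *pci_dev = PCI_DEVICE(dev);
        uint64_t cmd_slot_dma, resp_slot_dma;

        /* same order as in pvrdma_save() */
        dev->dsr_info.dma = qemu_get_be64(f);
        cmd_slot_dma = qemu_get_be64(f);
        resp_slot_dma = qemu_get_be64(f);

        /* remap the DSR at the restored address; this part works */
        dev->dsr_info.dsr = rdma_pci_dma_map(pci_dev, dev->dsr_info.dma,
                                             sizeof(*dev->dsr_info.dsr));
        if (!dev->dsr_info.dsr) {
            return -ENOMEM;
        }

        /* sanity check: the migrated guest RAM already holds the same
         * slot addresses in the DSR */
        if (cmd_slot_dma != dev->dsr_info.dsr->cmd_slot_dma ||
            resp_slot_dma != dev->dsr_info.dsr->resp_slot_dma) {
            return -EINVAL;
        }

        /* remapping the cmd and resp slots at these addresses is the
         * step that currently fails (see the next section) */
        return 0;
    }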


What is not working:

* In the pvrdma_load() logic, mapping the DSR succeeds at the dest,
  but mapping the cmd and resp slots fails.
  rdma_pci_dma_map() eventually calls address_space_map(). Inside the
  latter, a global BounceBuffer (bounce) is checked to see whether it is
  in use (via the atomic_xchg() primitive).
  At the dest it is in use, so the dma remapping fails there, which
  fails the whole migration process. Essentially, I am looking for a
  way to remap a guest physical address after a live migration (to the
  same host). Any tips on avoiding the BounceBuffer would also be great.

  I have also tried unmapping the cmd and resp slots at the source before
  saving the dma addresses in pvrdma_save(), but the mapping fails anyway.
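
  For reference, the check in address_space_map() (exec.c) that refuses
  the mapping is roughly the following; once the single global bounce
  buffer is marked in use, any further indirect mapping request fails:

    if (!memory_access_is_direct(mr, is_write)) {
        if (atomic_xchg(&bounce.in_use, true)) {
            *plen = 0;
            return NULL;
        }
        /* ... otherwise set up the bounce buffer for this mapping ... */
    }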

* It seems that vmxnet3 migration itself is not working properly, at least
  for me. The pvrdma device depends on it: vmxnet3 is function 0 and pvrdma
  is function 1. This happens even with a build of unmodified code from
  the master branch.
  After migration, network connectivity is lost at the destination.
  Things are fine at the source before migration.
  This is the command I am using at src:

  sudo /home/skrtbhtngr/qemu/build/x86_64-softmmu/qemu-system-x86_64 \
    -enable-kvm \
    -m 2G -smp cpus=2 \
    -hda /home/skrtbhtngr/fedora.img \
    -netdev tap,id=hostnet0 \
    -device vmxnet3,netdev=hostnet0,id=net0,mac=52:54:00:99:ff:bc \
    -monitor telnet:127.0.0.1:4444,server,nowait \
    -trace events=/home/skrtbhtngr/trace-events \
    -vnc 0.0.0.0:0
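
  The dest uses the same command line (with the monitor and VNC ports
  changed so both VMs can run on the same host), plus an -incoming
  option; the migration is then triggered from the src monitor. Roughly
  (the port number here is arbitrary):

    # dest: same command as above, plus
    -incoming tcp:127.0.0.1:5555

    # then, on the src monitor (telnet 127.0.0.1 4444):
    migrate -d tcp:127.0.0.1:5555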

  Currently, I am trying same-host migration for testing purposes,
  without the pvrdma device. Two tap interfaces, for src and dest, were
  created successfully on the host. Kernel logs:
  ...
  br0: port 2(tap0) entered forwarding state
  ...
  br0: port 3(tap1) entered forwarding state

  tcpdump at the dest reports only outgoing ARP packets, which ask
  for the gateway: "ARP, Request who-has _gateway tell guest1".

  I also tried user (slirp) as the network backend, but no luck.

  I tried git bisect as well, starting from a known-working commit (given
  by Marcel), but it turns out to be very old and I hit build errors one
  after another.

  Please note that e1000 live migration is working fine in the same setup.

* Since we are aiming at same-host migration first, I cannot use
  libvirt, as it does not allow this. Currently, I am running the
  VMs using qemu-system commands. But libvirt is needed to add the GID
  entry of the guest device on the host. I am looking for a workaround,
  if that is possible at all.
  I started a thread about this on libvirt-users a few days ago:
  https://www.redhat.com/archives/libvirt-users/2019-June/msg00011.html


Sukrit Bhatnagar (1):
  hw/pvrdma: Add live migration support

 hw/rdma/vmw/pvrdma_main.c | 56 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

-- 
2.21.0



