Re: [RFC v2 11/13] vdpa: add vdpa net migration state notifier

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC v2 11/13] vdpa: add vdpa net migration state notifier

From:	Si-Wei Liu
Subject:	Re: [RFC v2 11/13] vdpa: add vdpa net migration state notifier
Date:	Wed, 1 Feb 2023 17:52:47 -0800
User-agent:	Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.6.1



On 1/12/2023 9:24 AM, Eugenio Pérez wrote:

This allows net to restart the device backend to configure SVQ on it.

Ideally, these changes should not be net specific. However, the vdpa net
backend is the one with enough knowledge to configure everything because
of some reasons:
* Queues might need to be shadowed or not depending on its kind (control
   vs data).
* Queues need to share the same map translations (iova tree).

Because of that it is cleaner to restart the whole net backend and
configure again as expected, similar to how vhost-kernel moves between
userspace and passthrough.

If more kinds of devices need dynamic switching to SVQ we can create a
callback struct like VhostOps and move most of the code there.
VhostOps cannot be reused since all vdpa backend share them, and to
personalize just for networking would be too heavy.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
  net/vhost-vdpa.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++++
  1 file changed, 84 insertions(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 5d7ad6e4d7..f38532b1df 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -26,6 +26,8 @@
  #include <err.h>
  #include "standard-headers/linux/virtio_net.h"
  #include "monitor/monitor.h"
+#include "migration/migration.h"
+#include "migration/misc.h"
  #include "migration/blocker.h"
  #include "hw/virtio/vhost.h"

@@ -33,6 +35,7 @@

  typedef struct VhostVDPAState {
      NetClientState nc;
      struct vhost_vdpa vhost_vdpa;
+    Notifier migration_state;
      Error *migration_blocker;
      VHostNetState *vhost_net;

@@ -243,10 +246,86 @@ static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)

      return DO_UPCAST(VhostVDPAState, nc, nc0);
  }

+static void vhost_vdpa_net_log_global_enable(VhostVDPAState *s, bool enable)

+{
+    struct vhost_vdpa *v = &s->vhost_vdpa;
+    VirtIONet *n;
+    VirtIODevice *vdev;
+    int data_queue_pairs, cvq, r;
+    NetClientState *peer;
+
+    /* We are only called on the first data vqs and only if x-svq is not set */
+    if (s->vhost_vdpa.shadow_vqs_enabled == enable) {
+        return;
+    }
+
+    vdev = v->dev->vdev;
+    n = VIRTIO_NET(vdev);
+    if (!n->vhost_started) {
+        return;
+    }
+
+    if (enable) {
+        ioctl(v->device_fd, VHOST_VDPA_SUSPEND);
+    }
+    data_queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
+    cvq = virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ) ?
+                                  n->max_ncs - n->max_queue_pairs : 0;
+    vhost_net_stop(vdev, n->nic->ncs, data_queue_pairs, cvq);
+
+    peer = s->nc.peer;
+    for (int i = 0; i < data_queue_pairs + cvq; i++) {
+        VhostVDPAState *vdpa_state;
+        NetClientState *nc;
+
+        if (i < data_queue_pairs) {
+            nc = qemu_get_peer(peer, i);
+        } else {
+            nc = qemu_get_peer(peer, n->max_queue_pairs);
+        }
+
+        vdpa_state = DO_UPCAST(VhostVDPAState, nc, nc);
+        vdpa_state->vhost_vdpa.shadow_data = enable;
+
+        if (i < data_queue_pairs) {
+            /* Do not override CVQ shadow_vqs_enabled */
+            vdpa_state->vhost_vdpa.shadow_vqs_enabled = enable;
+        }
+    }
+
+    r = vhost_net_start(vdev, n->nic->ncs, data_queue_pairs, cvq);

As the first revision, this method (vhost_net_stop followed byvhost_net_start) should be fine for software vhost-vdpa backend for e.g.vp_vdpa and vdpa_sim_net. However, I would like to get your attentionthat this method implies substantial blackout time for mode switching onreal hardware - get a full cycle of device reset of getting memorymappings torn down, unpin & repin same set of pages, and set up newmapping would take very significant amount of time, especially for alarge VM. Maybe we can do:

1) replace reset with the RESUME feature that was just added to thevhost-vdpa ioctls in kernel2) add new vdpa ioctls to allow iova range rebound to new virtualaddress for QEMU's shadow vq or back to device's vq3) use a light-weighted sequence of suspend+rebind+resume to switch modeon the fly instead of getting through the whole reset+restart cycle

I suspect the same idea could even be used to address high livemigration downtime seen on hardware vdpa device. What do you think?


Thanks,
-Siwei

+    if (unlikely(r < 0)) {
+        error_report("unable to start vhost net: %s(%d)", g_strerror(-r), -r);
+    }
+}
+
+static void vdpa_net_migration_state_notifier(Notifier *notifier, void *data)
+{
+    MigrationState *migration = data;
+    VhostVDPAState *s = container_of(notifier, VhostVDPAState,
+                                     migration_state);
+
+    switch (migration->state) {
+    case MIGRATION_STATUS_SETUP:
+        vhost_vdpa_net_log_global_enable(s, true);
+        return;
+
+    case MIGRATION_STATUS_CANCELLING:
+    case MIGRATION_STATUS_CANCELLED:
+    case MIGRATION_STATUS_FAILED:
+        vhost_vdpa_net_log_global_enable(s, false);
+        return;
+    };
+}
+
  static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
  {
      struct vhost_vdpa *v = &s->vhost_vdpa;

+ if (v->feature_log) {

+        add_migration_state_change_notifier(&s->migration_state);
+    }
+
      if (v->shadow_vqs_enabled) {
          v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
                                             v->iova_range.last);
@@ -280,6 +359,10 @@ static void vhost_vdpa_net_client_stop(NetClientState *nc)

assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);+ if (s->vhost_vdpa.index == 0 && s->vhost_vdpa.feature_log) {

+        remove_migration_state_change_notifier(&s->migration_state);
+    }
+
      dev = s->vhost_vdpa.dev;
      if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
          g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
@@ -767,6 +850,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState 
*peer,
      s->vhost_vdpa.device_fd = vdpa_device_fd;
      s->vhost_vdpa.index = queue_pair_index;
      s->always_svq = svq;
+    s->migration_state.notify = vdpa_net_migration_state_notifier;
      s->vhost_vdpa.shadow_vqs_enabled = svq;
      s->vhost_vdpa.iova_range = iova_range;
      s->vhost_vdpa.shadow_data = svq;

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [RFC v2 11/13] vdpa: add vdpa net migration state notifier, Si-Wei Liu <=
- Re: [RFC v2 11/13] vdpa: add vdpa net migration state notifier, Eugenio Perez Martin, 2023/02/02
  - Re: [RFC v2 11/13] vdpa: add vdpa net migration state notifier, Si-Wei Liu, 2023/02/03
    - Re: [RFC v2 11/13] vdpa: add vdpa net migration state notifier, Eugenio Perez Martin, 2023/02/13
    - Re: [RFC v2 11/13] vdpa: add vdpa net migration state notifier, Si-Wei Liu, 2023/02/13
    - Re: [RFC v2 11/13] vdpa: add vdpa net migration state notifier, Eugenio Perez Martin, 2023/02/14
- RE: [RFC v2 11/13] vdpa: add vdpa net migration state notifier, Eli Cohen, 2023/02/12

Prev by Date: Re: [PATCH 02/14] linux-user/sparc: Tidy syscall trap
Next by Date: Re: [PATCH v5 0/3] riscv_load_fdt() semantics change
Previous by thread: Re: [RFC v2 00/13] Dinamycally switch to vhost shadow virtqueues at vdpa net migration
Next by thread: Re: [RFC v2 11/13] vdpa: add vdpa net migration state notifier
Index(es):
- Date
- Thread