qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC v2 11/13] vdpa: add vdpa net migration state notifier


From: Eugenio Perez Martin
Subject: Re: [RFC v2 11/13] vdpa: add vdpa net migration state notifier
Date: Thu, 2 Feb 2023 16:28:13 +0100

On Thu, Feb 2, 2023 at 2:53 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
>
>
> On 1/12/2023 9:24 AM, Eugenio Pérez wrote:
> > This allows net to restart the device backend to configure SVQ on it.
> >
> > Ideally, these changes should not be net specific. However, the vdpa net
> > backend is the one with enough knowledge to configure everything because
> > of some reasons:
> > * Queues might need to be shadowed or not depending on its kind (control
> >    vs data).
> > * Queues need to share the same map translations (iova tree).
> >
> > Because of that it is cleaner to restart the whole net backend and
> > configure again as expected, similar to how vhost-kernel moves between
> > userspace and passthrough.
> >
> > If more kinds of devices need dynamic switching to SVQ we can create a
> > callback struct like VhostOps and move most of the code there.
> > VhostOps cannot be reused since all vdpa backend share them, and to
> > personalize just for networking would be too heavy.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   net/vhost-vdpa.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++++
> >   1 file changed, 84 insertions(+)
> >
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index 5d7ad6e4d7..f38532b1df 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -26,6 +26,8 @@
> >   #include <err.h>
> >   #include "standard-headers/linux/virtio_net.h"
> >   #include "monitor/monitor.h"
> > +#include "migration/migration.h"
> > +#include "migration/misc.h"
> >   #include "migration/blocker.h"
> >   #include "hw/virtio/vhost.h"
> >
> > @@ -33,6 +35,7 @@
> >   typedef struct VhostVDPAState {
> >       NetClientState nc;
> >       struct vhost_vdpa vhost_vdpa;
> > +    Notifier migration_state;
> >       Error *migration_blocker;
> >       VHostNetState *vhost_net;
> >
> > @@ -243,10 +246,86 @@ static VhostVDPAState 
> > *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
> >       return DO_UPCAST(VhostVDPAState, nc, nc0);
> >   }
> >
> > +static void vhost_vdpa_net_log_global_enable(VhostVDPAState *s, bool 
> > enable)
> > +{
> > +    struct vhost_vdpa *v = &s->vhost_vdpa;
> > +    VirtIONet *n;
> > +    VirtIODevice *vdev;
> > +    int data_queue_pairs, cvq, r;
> > +    NetClientState *peer;
> > +
> > +    /* We are only called on the first data vqs and only if x-svq is not 
> > set */
> > +    if (s->vhost_vdpa.shadow_vqs_enabled == enable) {
> > +        return;
> > +    }
> > +
> > +    vdev = v->dev->vdev;
> > +    n = VIRTIO_NET(vdev);
> > +    if (!n->vhost_started) {
> > +        return;
> > +    }
> > +
> > +    if (enable) {
> > +        ioctl(v->device_fd, VHOST_VDPA_SUSPEND);
> > +    }
> > +    data_queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
> > +    cvq = virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ) ?
> > +                                  n->max_ncs - n->max_queue_pairs : 0;
> > +    vhost_net_stop(vdev, n->nic->ncs, data_queue_pairs, cvq);
> > +
> > +    peer = s->nc.peer;
> > +    for (int i = 0; i < data_queue_pairs + cvq; i++) {
> > +        VhostVDPAState *vdpa_state;
> > +        NetClientState *nc;
> > +
> > +        if (i < data_queue_pairs) {
> > +            nc = qemu_get_peer(peer, i);
> > +        } else {
> > +            nc = qemu_get_peer(peer, n->max_queue_pairs);
> > +        }
> > +
> > +        vdpa_state = DO_UPCAST(VhostVDPAState, nc, nc);
> > +        vdpa_state->vhost_vdpa.shadow_data = enable;
> > +
> > +        if (i < data_queue_pairs) {
> > +            /* Do not override CVQ shadow_vqs_enabled */
> > +            vdpa_state->vhost_vdpa.shadow_vqs_enabled = enable;
> > +        }
> > +    }
> > +
> > +    r = vhost_net_start(vdev, n->nic->ncs, data_queue_pairs, cvq);
> As the first revision, this method (vhost_net_stop followed by
> vhost_net_start) should be fine for software vhost-vdpa backend for e.g.
> vp_vdpa and vdpa_sim_net. However, I would like to get your attention
> that this method implies substantial blackout time for mode switching on
> real hardware - get a full cycle of device reset of getting memory
> mappings torn down, unpin & repin same set of pages, and set up new
> mapping would take very significant amount of time, especially for a
> large VM. Maybe we can do:
>

Right, I think this is something that deserves optimization in the future.

Note that we must replace the mappings anyway, with all passthrough
queues stopped. This is because SVQ vrings are not in the guest space.
The pin can be skipped though, I think that's a low hand fruit here.

If any, we can track guest's IOVA and add SVQ vrings in a hole. If
guest's IOVA layout changes, we can translate it then to a new
location. That way we only need one map operation in the worst case.
I'm omitting the lookup time here, but I still should be worth it.

But as you mention I think it is not worth complicating this series
and we can think about it on top. We can start building it on top of
your suggestions for sure.

> 1) replace reset with the RESUME feature that was just added to the
> vhost-vdpa ioctls in kernel

We cannot change vring addresses just with a SUSPEND / RESUME.

We could do it with the VIRTIO_F_RING_RESET feature though. Would it
be advantageous to the device?

> 2) add new vdpa ioctls to allow iova range rebound to new virtual
> address for QEMU's shadow vq or back to device's vq

Actually, if the device supports ASID we can allocate ASID 1 to that
purpose. At this moment only CVQ vrings and control buffers are there
when the device is passthrough.

But this doesn't solve the problem if we need to send all SVQ
translation to the device on-chip IOMMU, doesn't it? We must clear all
of it and send the new one to the device anyway.

> 3) use a light-weighted sequence of suspend+rebind+resume to switch mode
> on the fly instead of getting through the whole reset+restart cycle
>

I think this is the same as 1, isn't it?

> I suspect the same idea could even be used to address high live
> migration downtime seen on hardware vdpa device. What do you think?
>

I think this is a great start for sure! Some questions:
a) Is the time on reprogramming on-chip IOMMU comparable to program
regular IOMMU? If it is the case it should be easier to find vdpa
devices with support for _F_RESET soon.
b) Not to merge on master, but it is possible to add an artificial
delay on vdpa_sim that simulates the properties of the delay of IOMMU?
In that line, have you observed if it is linear with the size of the
memory, with the number of maps, other factors..?

Thanks!

> Thanks,
> -Siwei
>
> > +    if (unlikely(r < 0)) {
> > +        error_report("unable to start vhost net: %s(%d)", g_strerror(-r), 
> > -r);
> > +    }
> > +}
> > +
> > +static void vdpa_net_migration_state_notifier(Notifier *notifier, void 
> > *data)
> > +{
> > +    MigrationState *migration = data;
> > +    VhostVDPAState *s = container_of(notifier, VhostVDPAState,
> > +                                     migration_state);
> > +
> > +    switch (migration->state) {
> > +    case MIGRATION_STATUS_SETUP:
> > +        vhost_vdpa_net_log_global_enable(s, true);
> > +        return;
> > +
> > +    case MIGRATION_STATUS_CANCELLING:
> > +    case MIGRATION_STATUS_CANCELLED:
> > +    case MIGRATION_STATUS_FAILED:
> > +        vhost_vdpa_net_log_global_enable(s, false);
> > +        return;
> > +    };
> > +}
> > +
> >   static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
> >   {
> >       struct vhost_vdpa *v = &s->vhost_vdpa;
> >
> > +    if (v->feature_log) {
> > +        add_migration_state_change_notifier(&s->migration_state);
> > +    }
> > +
> >       if (v->shadow_vqs_enabled) {
> >           v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
> >                                              v->iova_range.last);
> > @@ -280,6 +359,10 @@ static void vhost_vdpa_net_client_stop(NetClientState 
> > *nc)
> >
> >       assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> >
> > +    if (s->vhost_vdpa.index == 0 && s->vhost_vdpa.feature_log) {
> > +        remove_migration_state_change_notifier(&s->migration_state);
> > +    }
> > +
> >       dev = s->vhost_vdpa.dev;
> >       if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
> >           g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> > @@ -767,6 +850,7 @@ static NetClientState 
> > *net_vhost_vdpa_init(NetClientState *peer,
> >       s->vhost_vdpa.device_fd = vdpa_device_fd;
> >       s->vhost_vdpa.index = queue_pair_index;
> >       s->always_svq = svq;
> > +    s->migration_state.notify = vdpa_net_migration_state_notifier;
> >       s->vhost_vdpa.shadow_vqs_enabled = svq;
> >       s->vhost_vdpa.iova_range = iova_range;
> >       s->vhost_vdpa.shadow_data = svq;
>




reply via email to

[Prev in Thread] Current Thread [Next in Thread]