qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support


From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
Date: Thu, 30 May 2019 14:09:42 -0400

On Thu, May 30, 2019 at 07:00:23PM +0100, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (address@hidden) wrote:
> > On Thu, May 30, 2019 at 04:56:45PM +0200, Jens Freimann wrote:
> > > Hi David,
> > > 
> > > sorry for the  delayed reply.
> > > 
> > > On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:
> > > > On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan Gilbert wrote:
> > > > > * Jens Freimann (address@hidden) wrote:
> > > > > > +static void virtio_net_primary_plug_timer(void *opaque);
> > > > > > +
> > > > > >  static void virtio_net_set_link_status(NetClientState *nc)
> > > > > >  {
> > > > > >      VirtIONet *n = qemu_get_nic_opaque(nc);
> > > > > > @@ -786,6 +796,14 @@ static void 
> > > > > > virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
> > > > > >      } else {
> > > > > >          memset(n->vlans, 0xff, MAX_VLAN >> 3);
> > > > > >      }
> > > > > > +
> > > > > > +    if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) {
> > > > > > +        atomic_set(&n->primary_should_be_hidden, false);
> > > > > > +        if (n->primary_device_timer)
> > > > > > +            timer_mod(n->primary_device_timer,
> > > > > > +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> > > > > > +                4000);
> > > > > > +    }
> > > > > 
> > > > > What's this magic timer constant and why?
> > > 
> > > To be honest it's a leftover from previous versions (before I took
> > > over) of the patches and I'm not sure why the timer is there.
> > > I removed it and so far see no reason to keep it.
> > > 
> > > > > 
> > > > > >  }
> > > > > >
> > > > > >  static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
> > > > > > @@ -2626,6 +2644,87 @@ void virtio_net_set_netclient_name(VirtIONet 
> > > > > > *n, const char *name,
> > > > > >      n->netclient_type = g_strdup(type);
> > > > > >  }
> > > > > >
> > > > > > +static void virtio_net_primary_plug_timer(void *opaque)
> > > > > > +{
> > > > > > +    VirtIONet *n = opaque;
> > > > > > +    Error *err = NULL;
> > > > > > +
> > > > > > +    if (n->primary_device_dict)
> > > > > > +        n->primary_device_opts = 
> > > > > > qemu_opts_from_qdict(qemu_find_opts("device"),
> > > > > > +            n->primary_device_dict, &err);
> > > > > > +    if (n->primary_device_opts) {
> > > > > > +        n->primary_dev = qdev_device_add(n->primary_device_opts, 
> > > > > > &err);
> > > > > > +        error_setg(&err, "virtio_net: couldn't plug in primary 
> > > > > > device");
> > > > > > +        return;
> > > > > > +    }
> > > > > > +    if (!n->primary_device_dict && err) {
> > > > > > +        if (n->primary_device_timer) {
> > > > > > +            timer_mod(n->primary_device_timer,
> > > > > > +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> > > > > > +                100);
> > > > > 
> > > > > same here.
> > > 
> > > see above
> > > 
> > > > > 
> > > > > 
> > > > > > +        }
> > > > > > +    }
> > > > > > +}
> > > > > > +
> > > > > > +static void virtio_net_handle_migration_primary(VirtIONet *n,
> > > > > > +                                                MigrationState *s)
> > > > > > +{
> > > > > > +    Error *err = NULL;
> > > > > > +    bool should_be_hidden = 
> > > > > > atomic_read(&n->primary_should_be_hidden);
> > > > > > +
> > > > > > +    n->primary_dev = qdev_find_recursive(sysbus_get_default(),
> > > > > > +            n->primary_device_id);
> > > > > > +    if (!n->primary_dev) {
> > > > > > +        error_setg(&err, "virtio_net: couldn't find primary 
> > > > > > device");
> > > > > 
> > > > > There's something broken with the error handling in this function - 
> > > > > the
> > > > > 'err' never goes anywhere - I don't think it ever gets printed or
> > > > > reported or stops the migration.
> > > 
> > > yes, I'll fix it.
> > > 
> > > > > > +    }
> > > > > > +    if (migration_in_setup(s) && !should_be_hidden && 
> > > > > > n->primary_dev) {
> > > > > > +        qdev_unplug(n->primary_dev, &err);
> > > > > 
> > > > > Not knowing unplug well; can you just explain - is that device hard
> > > > > unplugged and it's gone by the time this function returns or is it 
> > > > > still
> > > > > hanging around for some indeterminate time?
> > > 
> > > Qemu will trigger an unplug request via pcie attention button in which 
> > > case
> > > there could be a delay by the guest operating system. We could give it 
> > > some
> > > amount of time and if nothing happens try surpise removal or handle the
> > > error otherwise.
> > > 
> > > 
> > > regards,
> > > Jens
> > 
> > That's a subject for another day. Let's get the basic thing
> > working.
> 
> Well no, we need to know this thing isn't going to hang in the migration
> setup phase, or if it does how we recover.


This thing is *supposed* to be stuck in migration startup phase
if guest is malicious.

If migration does not progress management needs
a way to detect this and cancel.

Some more documentation about how this is supposed to happen
would be helpful.

>  This patch series is very
> odd precisely because it's trying to do the unplug itself in the
> migration phase rather than let the management layer do it - so unless
> it's nailed down how to make sure that's really really bullet proof
> then we've got to go back and ask the question about whether we should
> really fix it so it can be done by the management layer.
> 
> Dave

management already said they can't because files get closed and
resources freed on unplug and so they might not be able to re-add device
on migration failure. We do it in migration because that is
where failures can happen and we can recover.

> > -- 
> > MST
> --
> Dr. David Alan Gilbert / address@hidden / Manchester, UK



reply via email to

[Prev in Thread] Current Thread [Next in Thread]