qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support


From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
Date: Thu, 30 May 2019 19:06:29 -0400

On Thu, May 30, 2019 at 03:22:10PM -0300, Eduardo Habkost wrote:
> On Thu, May 30, 2019 at 02:09:42PM -0400, Michael S. Tsirkin wrote:
> > On Thu, May 30, 2019 at 07:00:23PM +0100, Dr. David Alan Gilbert wrote:
> > > * Michael S. Tsirkin (address@hidden) wrote:
> > > > On Thu, May 30, 2019 at 04:56:45PM +0200, Jens Freimann wrote:
> > > > > Hi David,
> > > > > 
> > > > > sorry for the  delayed reply.
> > > > > 
> > > > > On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:
> > > > > > On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan Gilbert 
> > > > > > wrote:
> > > > > > > * Jens Freimann (address@hidden) wrote:
> > > > > > > > +static void virtio_net_primary_plug_timer(void *opaque);
> > > > > > > > +
> > > > > > > >  static void virtio_net_set_link_status(NetClientState *nc)
> > > > > > > >  {
> > > > > > > >      VirtIONet *n = qemu_get_nic_opaque(nc);
> > > > > > > > @@ -786,6 +796,14 @@ static void 
> > > > > > > > virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
> > > > > > > >      } else {
> > > > > > > >          memset(n->vlans, 0xff, MAX_VLAN >> 3);
> > > > > > > >      }
> > > > > > > > +
> > > > > > > > +    if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) {
> > > > > > > > +        atomic_set(&n->primary_should_be_hidden, false);
> > > > > > > > +        if (n->primary_device_timer)
> > > > > > > > +            timer_mod(n->primary_device_timer,
> > > > > > > > +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> > > > > > > > +                4000);
> > > > > > > > +    }
> > > > > > > 
> > > > > > > What's this magic timer constant and why?
> > > > > 
> > > > > To be honest it's a leftover from previous versions (before I took
> > > > > over) of the patches and I'm not sure why the timer is there.
> > > > > I removed it and so far see no reason to keep it.
> > > > > 
> > > > > > > 
> > > > > > > >  }
> > > > > > > >
> > > > > > > >  static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
> > > > > > > > @@ -2626,6 +2644,87 @@ void 
> > > > > > > > virtio_net_set_netclient_name(VirtIONet *n, const char *name,
> > > > > > > >      n->netclient_type = g_strdup(type);
> > > > > > > >  }
> > > > > > > >
> > > > > > > > +static void virtio_net_primary_plug_timer(void *opaque)
> > > > > > > > +{
> > > > > > > > +    VirtIONet *n = opaque;
> > > > > > > > +    Error *err = NULL;
> > > > > > > > +
> > > > > > > > +    if (n->primary_device_dict)
> > > > > > > > +        n->primary_device_opts = 
> > > > > > > > qemu_opts_from_qdict(qemu_find_opts("device"),
> > > > > > > > +            n->primary_device_dict, &err);
> > > > > > > > +    if (n->primary_device_opts) {
> > > > > > > > +        n->primary_dev = 
> > > > > > > > qdev_device_add(n->primary_device_opts, &err);
> > > > > > > > +        error_setg(&err, "virtio_net: couldn't plug in primary 
> > > > > > > > device");
> > > > > > > > +        return;
> > > > > > > > +    }
> > > > > > > > +    if (!n->primary_device_dict && err) {
> > > > > > > > +        if (n->primary_device_timer) {
> > > > > > > > +            timer_mod(n->primary_device_timer,
> > > > > > > > +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> > > > > > > > +                100);
> > > > > > > 
> > > > > > > same here.
> > > > > 
> > > > > see above
> > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > > +        }
> > > > > > > > +    }
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static void virtio_net_handle_migration_primary(VirtIONet *n,
> > > > > > > > +                                                MigrationState 
> > > > > > > > *s)
> > > > > > > > +{
> > > > > > > > +    Error *err = NULL;
> > > > > > > > +    bool should_be_hidden = 
> > > > > > > > atomic_read(&n->primary_should_be_hidden);
> > > > > > > > +
> > > > > > > > +    n->primary_dev = qdev_find_recursive(sysbus_get_default(),
> > > > > > > > +            n->primary_device_id);
> > > > > > > > +    if (!n->primary_dev) {
> > > > > > > > +        error_setg(&err, "virtio_net: couldn't find primary 
> > > > > > > > device");
> > > > > > > 
> > > > > > > There's something broken with the error handling in this function 
> > > > > > > - the
> > > > > > > 'err' never goes anywhere - I don't think it ever gets printed or
> > > > > > > reported or stops the migration.
> > > > > 
> > > > > yes, I'll fix it.
> > > > > 
> > > > > > > > +    }
> > > > > > > > +    if (migration_in_setup(s) && !should_be_hidden && 
> > > > > > > > n->primary_dev) {
> > > > > > > > +        qdev_unplug(n->primary_dev, &err);
> > > > > > > 
> > > > > > > Not knowing unplug well; can you just explain - is that device 
> > > > > > > hard
> > > > > > > unplugged and it's gone by the time this function returns or is 
> > > > > > > it still
> > > > > > > hanging around for some indeterminate time?
> > > > > 
> > > > > Qemu will trigger an unplug request via pcie attention button in 
> > > > > which case
> > > > > there could be a delay by the guest operating system. We could give 
> > > > > it some
> > > > > amount of time and if nothing happens try surpise removal or handle 
> > > > > the
> > > > > error otherwise.
> > > > > 
> > > > > 
> > > > > regards,
> > > > > Jens
> > > > 
> > > > That's a subject for another day. Let's get the basic thing
> > > > working.
> > > 
> > > Well no, we need to know this thing isn't going to hang in the migration
> > > setup phase, or if it does how we recover.
> > 
> > 
> > This thing is *supposed* to be stuck in migration startup phase
> > if guest is malicious.
> > 
> > If migration does not progress management needs
> > a way to detect this and cancel.
> > 
> > Some more documentation about how this is supposed to happen
> > would be helpful.
> 
> Do we have confirmation from libvirt developers that this would
> be a reasonable API for them?
> 
> 
> > >  This patch series is very
> > > odd precisely because it's trying to do the unplug itself in the
> > > migration phase rather than let the management layer do it - so unless
> > > it's nailed down how to make sure that's really really bullet proof
> > > then we've got to go back and ask the question about whether we should
> > > really fix it so it can be done by the management layer.
> > > 
> > > Dave
> > 
> > management already said they can't because files get closed and
> > resources freed on unplug and so they might not be able to re-add device
> > on migration failure. We do it in migration because that is
> > where failures can happen and we can recover.
> 
> We are capable of providing an API to libvirt where files won't
> get closed when a device is unplugged, if necessary.
> 
> This might become necessary if libvirt or management software
> developers tell us the interface we are providing is not going to
> work for them.
> 
> -- 
> Eduardo

Yes. It's just lots of extremely low level interfaces
and all rather pointless.

And down the road extensions like surprise removal support will make it
all cleaner and more transparent. Floating things up to libvirt means
all these low level details will require more and more hacks.

-- 
MST



reply via email to

[Prev in Thread] Current Thread [Next in Thread]