qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support


From: Eduardo Habkost
Subject: Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
Date: Thu, 30 May 2019 15:22:10 -0300
User-agent: Mutt/1.11.3 (2019-02-01)

On Thu, May 30, 2019 at 02:09:42PM -0400, Michael S. Tsirkin wrote:
> On Thu, May 30, 2019 at 07:00:23PM +0100, Dr. David Alan Gilbert wrote:
> > * Michael S. Tsirkin (address@hidden) wrote:
> > > On Thu, May 30, 2019 at 04:56:45PM +0200, Jens Freimann wrote:
> > > > Hi David,
> > > > 
> > > > sorry for the  delayed reply.
> > > > 
> > > > On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:
> > > > > On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan Gilbert 
> > > > > wrote:
> > > > > > * Jens Freimann (address@hidden) wrote:
> > > > > > > +static void virtio_net_primary_plug_timer(void *opaque);
> > > > > > > +
> > > > > > >  static void virtio_net_set_link_status(NetClientState *nc)
> > > > > > >  {
> > > > > > >      VirtIONet *n = qemu_get_nic_opaque(nc);
> > > > > > > @@ -786,6 +796,14 @@ static void 
> > > > > > > virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
> > > > > > >      } else {
> > > > > > >          memset(n->vlans, 0xff, MAX_VLAN >> 3);
> > > > > > >      }
> > > > > > > +
> > > > > > > +    if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) {
> > > > > > > +        atomic_set(&n->primary_should_be_hidden, false);
> > > > > > > +        if (n->primary_device_timer)
> > > > > > > +            timer_mod(n->primary_device_timer,
> > > > > > > +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> > > > > > > +                4000);
> > > > > > > +    }
> > > > > > 
> > > > > > What's this magic timer constant and why?
> > > > 
> > > > To be honest it's a leftover from previous versions (before I took
> > > > over) of the patches and I'm not sure why the timer is there.
> > > > I removed it and so far see no reason to keep it.
> > > > 
> > > > > > 
> > > > > > >  }
> > > > > > >
> > > > > > >  static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
> > > > > > > @@ -2626,6 +2644,87 @@ void 
> > > > > > > virtio_net_set_netclient_name(VirtIONet *n, const char *name,
> > > > > > >      n->netclient_type = g_strdup(type);
> > > > > > >  }
> > > > > > >
> > > > > > > +static void virtio_net_primary_plug_timer(void *opaque)
> > > > > > > +{
> > > > > > > +    VirtIONet *n = opaque;
> > > > > > > +    Error *err = NULL;
> > > > > > > +
> > > > > > > +    if (n->primary_device_dict)
> > > > > > > +        n->primary_device_opts = 
> > > > > > > qemu_opts_from_qdict(qemu_find_opts("device"),
> > > > > > > +            n->primary_device_dict, &err);
> > > > > > > +    if (n->primary_device_opts) {
> > > > > > > +        n->primary_dev = qdev_device_add(n->primary_device_opts, 
> > > > > > > &err);
> > > > > > > +        error_setg(&err, "virtio_net: couldn't plug in primary 
> > > > > > > device");
> > > > > > > +        return;
> > > > > > > +    }
> > > > > > > +    if (!n->primary_device_dict && err) {
> > > > > > > +        if (n->primary_device_timer) {
> > > > > > > +            timer_mod(n->primary_device_timer,
> > > > > > > +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> > > > > > > +                100);
> > > > > > 
> > > > > > same here.
> > > > 
> > > > see above
> > > > 
> > > > > > 
> > > > > > 
> > > > > > > +        }
> > > > > > > +    }
> > > > > > > +}
> > > > > > > +
> > > > > > > +static void virtio_net_handle_migration_primary(VirtIONet *n,
> > > > > > > +                                                MigrationState 
> > > > > > > *s)
> > > > > > > +{
> > > > > > > +    Error *err = NULL;
> > > > > > > +    bool should_be_hidden = 
> > > > > > > atomic_read(&n->primary_should_be_hidden);
> > > > > > > +
> > > > > > > +    n->primary_dev = qdev_find_recursive(sysbus_get_default(),
> > > > > > > +            n->primary_device_id);
> > > > > > > +    if (!n->primary_dev) {
> > > > > > > +        error_setg(&err, "virtio_net: couldn't find primary 
> > > > > > > device");
> > > > > > 
> > > > > > There's something broken with the error handling in this function - 
> > > > > > the
> > > > > > 'err' never goes anywhere - I don't think it ever gets printed or
> > > > > > reported or stops the migration.
> > > > 
> > > > yes, I'll fix it.
> > > > 
> > > > > > > +    }
> > > > > > > +    if (migration_in_setup(s) && !should_be_hidden && 
> > > > > > > n->primary_dev) {
> > > > > > > +        qdev_unplug(n->primary_dev, &err);
> > > > > > 
> > > > > > Not knowing unplug well; can you just explain - is that device hard
> > > > > > unplugged and it's gone by the time this function returns or is it 
> > > > > > still
> > > > > > hanging around for some indeterminate time?
> > > > 
> > > > Qemu will trigger an unplug request via pcie attention button in which 
> > > > case
> > > > there could be a delay by the guest operating system. We could give it 
> > > > some
> > > > amount of time and if nothing happens try surpise removal or handle the
> > > > error otherwise.
> > > > 
> > > > 
> > > > regards,
> > > > Jens
> > > 
> > > That's a subject for another day. Let's get the basic thing
> > > working.
> > 
> > Well no, we need to know this thing isn't going to hang in the migration
> > setup phase, or if it does how we recover.
> 
> 
> This thing is *supposed* to be stuck in migration startup phase
> if guest is malicious.
> 
> If migration does not progress management needs
> a way to detect this and cancel.
> 
> Some more documentation about how this is supposed to happen
> would be helpful.

Do we have confirmation from libvirt developers that this would
be a reasonable API for them?


> >  This patch series is very
> > odd precisely because it's trying to do the unplug itself in the
> > migration phase rather than let the management layer do it - so unless
> > it's nailed down how to make sure that's really really bullet proof
> > then we've got to go back and ask the question about whether we should
> > really fix it so it can be done by the management layer.
> > 
> > Dave
> 
> management already said they can't because files get closed and
> resources freed on unplug and so they might not be able to re-add device
> on migration failure. We do it in migration because that is
> where failures can happen and we can recover.

We are capable of providing an API to libvirt where files won't
get closed when a device is unplugged, if necessary.

This might become necessary if libvirt or management software
developers tell us the interface we are providing is not going to
work for them.

-- 
Eduardo



reply via email to

[Prev in Thread] Current Thread [Next in Thread]