qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support


From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
Date: Fri, 31 May 2019 09:23:44 +0100
User-agent: Mutt/1.11.4 (2019-03-13)

* Michael S. Tsirkin (address@hidden) wrote:
> On Thu, May 30, 2019 at 08:08:23PM +0100, Dr. David Alan Gilbert wrote:
> > * Michael S. Tsirkin (address@hidden) wrote:
> > > On Thu, May 30, 2019 at 07:00:23PM +0100, Dr. David Alan Gilbert wrote:
> > > > * Michael S. Tsirkin (address@hidden) wrote:
> > > > > On Thu, May 30, 2019 at 04:56:45PM +0200, Jens Freimann wrote:
> > > > > > Hi David,
> > > > > > 
> > > > > > sorry for the  delayed reply.
> > > > > > 
> > > > > > On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:
> > > > > > > On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan Gilbert 
> > > > > > > wrote:
> > > > > > > > * Jens Freimann (address@hidden) wrote:
> > > > > > > > > +static void virtio_net_primary_plug_timer(void *opaque);
> > > > > > > > > +
> > > > > > > > >  static void virtio_net_set_link_status(NetClientState *nc)
> > > > > > > > >  {
> > > > > > > > >      VirtIONet *n = qemu_get_nic_opaque(nc);
> > > > > > > > > @@ -786,6 +796,14 @@ static void 
> > > > > > > > > virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
> > > > > > > > >      } else {
> > > > > > > > >          memset(n->vlans, 0xff, MAX_VLAN >> 3);
> > > > > > > > >      }
> > > > > > > > > +
> > > > > > > > > +    if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) {
> > > > > > > > > +        atomic_set(&n->primary_should_be_hidden, false);
> > > > > > > > > +        if (n->primary_device_timer)
> > > > > > > > > +            timer_mod(n->primary_device_timer,
> > > > > > > > > +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> > > > > > > > > +                4000);
> > > > > > > > > +    }
> > > > > > > > 
> > > > > > > > What's this magic timer constant and why?
> > > > > > 
> > > > > > To be honest it's a leftover from previous versions (before I took
> > > > > > over) of the patches and I'm not sure why the timer is there.
> > > > > > I removed it and so far see no reason to keep it.
> > > > > > 
> > > > > > > > 
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > >  static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t 
> > > > > > > > > cmd,
> > > > > > > > > @@ -2626,6 +2644,87 @@ void 
> > > > > > > > > virtio_net_set_netclient_name(VirtIONet *n, const char *name,
> > > > > > > > >      n->netclient_type = g_strdup(type);
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > +static void virtio_net_primary_plug_timer(void *opaque)
> > > > > > > > > +{
> > > > > > > > > +    VirtIONet *n = opaque;
> > > > > > > > > +    Error *err = NULL;
> > > > > > > > > +
> > > > > > > > > +    if (n->primary_device_dict)
> > > > > > > > > +        n->primary_device_opts = 
> > > > > > > > > qemu_opts_from_qdict(qemu_find_opts("device"),
> > > > > > > > > +            n->primary_device_dict, &err);
> > > > > > > > > +    if (n->primary_device_opts) {
> > > > > > > > > +        n->primary_dev = 
> > > > > > > > > qdev_device_add(n->primary_device_opts, &err);
> > > > > > > > > +        error_setg(&err, "virtio_net: couldn't plug in 
> > > > > > > > > primary device");
> > > > > > > > > +        return;
> > > > > > > > > +    }
> > > > > > > > > +    if (!n->primary_device_dict && err) {
> > > > > > > > > +        if (n->primary_device_timer) {
> > > > > > > > > +            timer_mod(n->primary_device_timer,
> > > > > > > > > +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> > > > > > > > > +                100);
> > > > > > > > 
> > > > > > > > same here.
> > > > > > 
> > > > > > see above
> > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > > +        }
> > > > > > > > > +    }
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +static void virtio_net_handle_migration_primary(VirtIONet *n,
> > > > > > > > > +                                                
> > > > > > > > > MigrationState *s)
> > > > > > > > > +{
> > > > > > > > > +    Error *err = NULL;
> > > > > > > > > +    bool should_be_hidden = 
> > > > > > > > > atomic_read(&n->primary_should_be_hidden);
> > > > > > > > > +
> > > > > > > > > +    n->primary_dev = 
> > > > > > > > > qdev_find_recursive(sysbus_get_default(),
> > > > > > > > > +            n->primary_device_id);
> > > > > > > > > +    if (!n->primary_dev) {
> > > > > > > > > +        error_setg(&err, "virtio_net: couldn't find primary 
> > > > > > > > > device");
> > > > > > > > 
> > > > > > > > There's something broken with the error handling in this 
> > > > > > > > function - the
> > > > > > > > 'err' never goes anywhere - I don't think it ever gets printed 
> > > > > > > > or
> > > > > > > > reported or stops the migration.
> > > > > > 
> > > > > > yes, I'll fix it.
> > > > > > 
> > > > > > > > > +    }
> > > > > > > > > +    if (migration_in_setup(s) && !should_be_hidden && 
> > > > > > > > > n->primary_dev) {
> > > > > > > > > +        qdev_unplug(n->primary_dev, &err);
> > > > > > > > 
> > > > > > > > Not knowing unplug well; can you just explain - is that device 
> > > > > > > > hard
> > > > > > > > unplugged and it's gone by the time this function returns or is 
> > > > > > > > it still
> > > > > > > > hanging around for some indeterminate time?
> > > > > > 
> > > > > > Qemu will trigger an unplug request via pcie attention button in 
> > > > > > which case
> > > > > > there could be a delay by the guest operating system. We could give 
> > > > > > it some
> > > > > > amount of time and if nothing happens try surpise removal or handle 
> > > > > > the
> > > > > > error otherwise.
> > > > > > 
> > > > > > 
> > > > > > regards,
> > > > > > Jens
> > > > > 
> > > > > That's a subject for another day. Let's get the basic thing
> > > > > working.
> > > > 
> > > > Well no, we need to know this thing isn't going to hang in the migration
> > > > setup phase, or if it does how we recover.
> > > 
> > > 
> > > This thing is *supposed* to be stuck in migration startup phase
> > > if guest is malicious.
> > > 
> > > If migration does not progress management needs
> > > a way to detect this and cancel.
> > > 
> > > Some more documentation about how this is supposed to happen
> > > would be helpful.
> > 
> > I want to see that first; because I want to convinced it's just a
> > documentation problem and that we actually really have a method of
> > recovering.
> > 
> > > >  This patch series is very
> > > > odd precisely because it's trying to do the unplug itself in the
> > > > migration phase rather than let the management layer do it - so unless
> > > > it's nailed down how to make sure that's really really bullet proof
> > > > then we've got to go back and ask the question about whether we should
> > > > really fix it so it can be done by the management layer.
> > > > 
> > > > Dave
> > > 
> > > management already said they can't because files get closed and
> > > resources freed on unplug and so they might not be able to re-add device
> > > on migration failure. We do it in migration because that is
> > > where failures can happen and we can recover.
> > 
> > I find this explanation confusing - I can kind of see where it's coming
> > from, but we've got a pretty clear separation between a NIC and the
> > netdev that backs it; those files and resources should be associated
> > with the netdev and not the NIC.  So does hot-removing the NIC really
> > clean up the netdev?  (I guess maybe this is a different in vfio
> > which is the problem)
> > 
> > Dave
> 
> what we are removing is the VFIO device.
> Nothing to do with nic or netdev.

OK, but at the same time why can't we hold open the VFIOs devices
resources in a comparable way - i.e. don't really let qemu let go of
it even when the guest has unplugged it?

Dave

> > > > > -- 
> > > > > MST
> > > > --
> > > > Dr. David Alan Gilbert / address@hidden / Manchester, UK
> > --
> > Dr. David Alan Gilbert / address@hidden / Manchester, UK
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK



reply via email to

[Prev in Thread] Current Thread [Next in Thread]