qemu-devel

Re: [PATCH 0/5] Introduce 'yank' oob qmp command to recover from hanging


From: Daniel P . Berrangé
Subject: Re: [PATCH 0/5] Introduce 'yank' oob qmp command to recover from hanging qemu
Date: Wed, 13 May 2020 12:26:30 +0100
User-agent: Mutt/1.13.4 (2020-02-15)

On Wed, May 13, 2020 at 01:13:20PM +0200, Kevin Wolf wrote:
> On 13.05.2020 at 12:53, Dr. David Alan Gilbert wrote:
> > * Kevin Wolf (address@hidden) wrote:
> > > On 12.05.2020 at 11:43, Daniel P. Berrangé wrote:
> > > > On Tue, May 12, 2020 at 11:32:06AM +0200, Lukas Straub wrote:
> > > > > On Mon, 11 May 2020 16:46:45 +0100
> > > > > "Dr. David Alan Gilbert" <address@hidden> wrote:
> > > > > 
> > > > > > > * Daniel P. Berrangé (address@hidden) wrote:
> > > > > > > > ...
> > > > > > > > That way if QEMU does get stuck, you can start by tearing down the
> > > > > > > > least disruptive channel, e.g. try tearing down the migration
> > > > > > > > connection first (which shouldn't negatively impact the guest), and
> > > > > > > > only if that doesn't work, move on to tearing down the NBD
> > > > > > > > connection (which risks data loss).
> > > > > > 
> > > > > > I wonder if a different way would be to make all network connections
> > > > > > register with yank, but then make yank take a list of connections to
> > > > > > shutdown(2).
> > > > > 
> > > > > > Good idea. We could name the connections (/yank callbacks) in the
> > > > > > form "nbd:<node-name>", "chardev:<chardev-name>" and "migration"
> > > > > > (and add "netdev:...", etc. in the future). Then make yank take a
> > > > > > list of connection names as you suggest and silently ignore
> > > > > > connections that don't exist. And maybe even add a 'query-yank' oob
> > > > > > command returning a list of registered connections so the management
> > > > > > application can do pattern matching if it wants.
> > > 
> > > I'm generally not a big fan of silently ignoring things. Is there a
> > > specific requirement to do it in this case, or can management
> > > applications be expected to know which connections exist?
> > > 
> > > > Yes, that would make the yank command much more flexible in how it can
> > > > be used.
> > > > 
> > > > As an alternative to using formatted strings like this, it could be
> > > > modelled more explicitly in QAPI
> > > > 
> > > >   { 'struct':  'YankChannels',
> > > >     'data': { 'chardev': [ 'string' ],
> > > >               'nbd': ['string'],
> > > > >               'migration': 'bool' } }
> > > > 
> > > > In this example, 'chardev' would accept a list of chardev IDs which
> > > > have it enabled, 'nbd' would accept a list of block node IDs which
> > > > have it enabled, and migration is a singleton on/off.
> > > 
> > > Of course, it also means that the yank code needs to know about every
> > > single object that supports the operation, whereas if you only have
> > > strings, the objects could keep registering their connection with a
> > > generic function like yank_register_function() in this version.
> > > 
> > > I'm not sure if the additional complexity is worth the benefits.
> > 
> > I tend to agree; although we do have to ensure we either use an existing
> > naming scheme (e.g. QOM object names?) or make sure we've got a well
> > defined list of prefixes.
> 
> Not everything that has a network connection is a QOM object (in fact,
> neither migration nor chardev nor nbd are QOM objects).
> 
> I guess it would be nice to have a single namespace for everything in
> QEMU, but the reality is that we have a few separate ones. As long as we
> consistently add a prefix that identifies the namespace in question, I
> think that would work.
> 
> This means that if we're using node-name to identify the NBD connection,
> the namespace should be 'block' rather than 'nbd'.
> 
> One more thing to consider is, what if a single object has multiple
> connections? In the case of node-names, we have a limited set of allowed
> characters, so we can use one of the remaining characters as a separator
> and then suffix a counter. In other places, the identifier isn't
> restricted, so suffixing doesn't work. Maybe prefixing does, but it
> would have to be there from the beginning then.
> 
> And another thing: Do we really want to document this as limited to
> network connections? Another common cause of hangs is when you have
> image files on an NFS mount and the connection goes away. Of course, in
> the end this is still networking, but inside of QEMU it looks like
> accessing any other file. I'm not sure that we'll allow yanking access
> to image files anytime soon, but it might not hurt to keep it at the
> back of our mind as a potential option we might want the design to
> allow.

Are you referring to the in-kernel NFS client hangs here? AFAIK, it is
impossible to do anything to get out of those hangs from userspace, because
the thread is stuck in an uninterruptible sleep in kernel space.

If using the in-QEMU NFS client, then there is a network connection that
can be yanked just like the NBD client.
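
To make "yanked" concrete, here is a minimal sketch of the idea, assuming
each connection is a plain socket registered under a namespaced name such as
"block:<node-name>" or "migration". The names yank_register(),
yank_by_name() and the fixed-size table are hypothetical illustrations, not
the API from the patch series, and this version reports unknown names
instead of silently ignoring them.

```c
#include <string.h>
#include <sys/socket.h>

#define MAX_YANK 16

/* Hypothetical registry entry: a namespaced name and the socket behind it. */
struct yank_entry {
    char name[64];
    int fd;
};

static struct yank_entry yank_table[MAX_YANK];
static int yank_count;

/* Register a connection under a name such as "block:nbd0" or "migration".
 * Returns 0 on success, -1 if the table is full or the name is too long. */
static int yank_register(const char *name, int fd)
{
    if (yank_count >= MAX_YANK ||
        strlen(name) >= sizeof(yank_table[0].name)) {
        return -1;
    }
    strcpy(yank_table[yank_count].name, name);
    yank_table[yank_count].fd = fd;
    yank_count++;
    return 0;
}

/* Shut down the named connection with shutdown(2), so that any thread
 * blocked in read() or write() on that socket returns instead of hanging.
 * Returns -1 if the name is unknown or shutdown(2) fails. */
static int yank_by_name(const char *name)
{
    for (int i = 0; i < yank_count; i++) {
        if (strcmp(yank_table[i].name, name) == 0) {
            return shutdown(yank_table[i].fd, SHUT_RDWR);
        }
    }
    return -1;
}
```

After yank_by_name() succeeds on a stream socket, a subsequent read() on it
returns end-of-file rather than blocking. Nothing comparable helps with the
in-kernel NFS client, since there is no socket in userspace to shut down.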

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
