From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [RFC v2 0/8] monitor: allow per-monitor thread
Date: Thu, 7 Sep 2017 12:31:04 +0100
User-agent: Mutt/1.8.3 (2017-05-23)

* Markus Armbruster (address@hidden) wrote:
> "Dr. David Alan Gilbert" <address@hidden> writes:
> 
> > * Daniel P. Berrange (address@hidden) wrote:
> >> On Thu, Sep 07, 2017 at 04:13:41PM +0800, Peter Xu wrote:
> >> > On Wed, Sep 06, 2017 at 12:54:28PM +0100, Daniel P. Berrange wrote:
> >> > > On Wed, Sep 06, 2017 at 12:31:58PM +0100, Dr. David Alan Gilbert wrote:
> >> > > > * Daniel P. Berrange (address@hidden) wrote:
> >> > > > > This does imply that you need monitor I/O processing that is
> >> > > > > separate from the command execution thread, but I see no need
> >> > > > > for all commands to suddenly become async. Just allowing
> >> > > > > interleaved replies is sufficient from the POV of the protocol
> >> > > > > definition. This interleaving is easy to handle from the client
> >> > > > > POV - it just requires a unique 'serial' in the request by the
> >> > > > > client, that is copied into the reply by QEMU.
> >> > > > 
> >> > > > OK, so for that we can just take Marc-André's syntax and call it 'id':
> >> > > >   https://lists.gnu.org/archive/html/qemu-devel/2017-01/msg03634.html
> >> > > > 
> >> > > > then it's up to the caller to ensure those ids are unique.
> >> > > 
> >> > > Libvirt has in fact generated a unique 'id' for every monitor command
> >> > > since day 1 of supporting QMP.
> >> > > 
> >> > > > I do worry about two things:
> >> > > >   a) With this the caller doesn't really know which commands could be
> >> > > >   run in parallel - for example, if we've got a recovery command that's
> >> > > >   executed by this non-locking thread, that's OK; we expect that
> >> > > >   to be doable in parallel.  If in the future, though, we do
> >> > > >   what you initially suggested and have a bunch of commands get
> >> > > >   routed to the migration thread (say), then those would suddenly
> >> > > >   operate in parallel with other commands that were previously
> >> > > >   synchronous.
> >> > > 
> >> > > We could still have an opt-in for async commands, e.g. default to executing
> >> > > all commands in the main thread unless the client issues an explicit
> >> > > "make it async" command to switch to allowing the migration thread to
> >> > > process them async.
> >> > > 
> >> > >  { "execute": "qmp_allow_async",
> >> > >    "data": { "commands": [
> >> > >        "migrate_cancel"
> >> > >    ] } }
> >> > > 
> >> > > 
> >> > >  { "return": { "commands": [
> >> > >        "migrate_cancel"
> >> > >    ] } }
> >> > > 
> >> > > The server response contains the subset of commands from the request
> >> > > for which async is supported.
> >> > > 
> >> > > That gives good negotiation ability going forward as we incrementally
> >> > > support async on more commands.
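A sketch of how that negotiation could look when the client asks for more
commands than the server supports - the reply acknowledges only the subset
it can run async.  Command names are illustrative ('migrate-recover' here
is hypothetical), following the proposed syntax above:

  { "execute": "qmp_allow_async",
    "data": { "commands": [
        "migrate_cancel",
        "migrate-recover"
    ] } }

  { "return": { "commands": [
        "migrate_cancel"
    ] } }

The client would then know that 'migrate-recover' is still handled
synchronously by the main thread.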
> >> > 
> >> > I think this goes back to the discussion on which design we'd like to
> >> > choose.  IMHO the whole async idea plus the per-command id is indeed
> >> > cleaner and nicer, and I believe it can benefit not only libvirt,
> >> > but also other QMP users.  The problem is, I have no idea how long
> >> > it'll take for us to have such a feature - I believe it will require
> >> > both QEMU and libvirt to support it.  And it would be a pity if
> >> > postcopy recovery could not work merely because we cannot guarantee a
> >> > stable monitor.
> >> 
> >> This is not a blocker for having the postcopy recovery feature merged.
> >> It merely means that in a situation where the mainloop is blocked
> >> we can't recover; in other situations we'll be able to recover
> >> fine. Sure, it would be nice to fix that problem too, but I don't
> >> see it as a blocker.
> >
> > It's probably OK to merge the recovery code before the monitor code;
> > but I don't think it's something you'd want to tell users about -
> > a 'postcopy recovery that only works rarely' isn't much use.
> 
> "Rarely"?  Are main loop hangs *that* common?
> 
> Can we quantify the problem to help gauge urgency?

Not really; it depends on workload and behaviour.  The people who worry
about postcopy recovery actually care about it working in almost every
case.
So I'm OK with adding the recovery code; it's just not something we should
be shouting about to users until we have the monitor fixed.

Dave

--
Dr. David Alan Gilbert / address@hidden / Manchester, UK


