Re: [Qemu-devel] [RFC v2 0/8] monitor: allow per-monitor thread


From: Daniel P. Berrange
Subject: Re: [Qemu-devel] [RFC v2 0/8] monitor: allow per-monitor thread
Date: Wed, 6 Sep 2017 12:06:29 +0100
User-agent: Mutt/1.8.3 (2017-05-23)

On Wed, Sep 06, 2017 at 11:57:05AM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrange (address@hidden) wrote:
> > On Wed, Sep 06, 2017 at 11:48:51AM +0100, Dr. David Alan Gilbert wrote:
> > > * Daniel P. Berrange (address@hidden) wrote:
> > > > On Wed, Sep 06, 2017 at 10:48:46AM +0100, Dr. David Alan Gilbert wrote:
> > > > > * Daniel P. Berrange (address@hidden) wrote:
> > > > > > On Wed, Aug 23, 2017 at 02:51:03PM +0800, Peter Xu wrote:
> > > > > > > v2:
> > > > > > > - fixed "make check" error that patchew reported
> > > > > > > - moved the thread_join earlier in monitor_data_destroy(), before
> > > > > > >   resources are released
> > > > > > > - added one new patch (current patch 3) that fixes a nasty race
> > > > > > >   condition with IOWatchPoll.  Please see the commit message for
> > > > > > >   more information.
> > > > > > > - added a g_main_context_wakeup() to make sure the separate loop
> > > > > > >   thread can always be kicked when we want to destroy the
> > > > > > >   per-monitor threads.
> > > > > > > - added one new patch (current patch 8) to introduce a migration
> > > > > > >   mgmt lock for migrate_incoming.
> > > > > > > 
> > > > > > > This is extended work for migration postcopy recovery.  This
> > > > > > > series is tested with the following series to make sure it solves
> > > > > > > the monitor hang problem that we have encountered for postcopy
> > > > > > > recovery:
> > > > > > > 
> > > > > > >   [RFC 00/29] Migration: postcopy failure recovery
> > > > > > >   [RFC 0/6] migration: re-use migrate_incoming for postcopy recovery
> > > > > > > 
> > > > > > > The root problem is that monitor commands are all handled in the
> > > > > > > main loop thread now, no matter how many monitors we specify.  If
> > > > > > > the main loop thread hangs for some reason, all monitors will be
> > > > > > > stuck.  It works in the reverse direction as well: if any one
> > > > > > > monitor hangs, it will hang the main loop, and with it the rest
> > > > > > > of the monitors (if there are any).
> > > > > > > 
> > > > > > > That affects postcopy recovery, since recovery requires user
> > > > > > > input on the destination side.  If the monitors hang, the
> > > > > > > destination VM dies and we lose even the hope of a final
> > > > > > > recovery.
> > > > > > > 
> > > > > > > So sometimes we need to make sure that at least one of the
> > > > > > > monitors stays alive.
> > > > > > > 
> > > > > > > The whole idea of this series is that instead of handling monitor
> > > > > > > commands all in the main loop thread, we do it separately in
> > > > > > > per-monitor threads.  Then, even if the main loop thread hangs at
> > > > > > > any point for any reason, the per-monitor threads can still
> > > > > > > survive.  Further, we add a hint in QMP/HMP to show whether a
> > > > > > > command can be executed without the BQL; if so, we avoid taking
> > > > > > > the BQL when running that command.  That greatly reduces
> > > > > > > contention on the BQL.  Currently the only user of that new
> > > > > > > parameter (for now I call it "without-bql") is the
> > > > > > > "migrate-incoming" command, which is the only command that can
> > > > > > > rescue a paused postcopy migration.
> > > > > > > 
> > > > > > > However, even with this series, it does not mean that per-monitor
> > > > > > > threads will never hang.  One example is that we can still run
> > > > > > > "info cpus" in a per-monitor thread during a paused postcopy (in
> > > > > > > that state, page faults are never handled, and "info cpus" will
> > > > > > > never return, since it tries to sync every vcpu).  So to make
> > > > > > > sure it does not hang, we not only need the per-monitor thread;
> > > > > > > the user should also be careful about how to use it.
> > > > > > > 
> > > > > > > For postcopy recovery, we may need a dedicated monitor channel
> > > > > > > for recovery.  In other words, a destination VM that supports
> > > > > > > postcopy recovery would possibly need:
> > > > > > > 
> > > > > > >   -qmp MAIN_CHANNEL -qmp RECOVERY_CHANNEL
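> > > > > > > 
> > > > > > > For illustration only (socket paths made up), the two channels
> > > > > > > could be concrete QMP sockets:
> > > > > > > 
> > > > > > >   -qmp unix:/tmp/qmp-main.sock,server,nowait \
> > > > > > >   -qmp unix:/tmp/qmp-recovery.sock,server,nowait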
> > > > > > 
> > > > > > I think this is a really horrible thing to expose to management
> > > > > > applications.  They should not need to be aware of the fact that
> > > > > > QEMU is buggy and thus requires that certain commands be run on
> > > > > > different monitors to work around the bug.
> > > > > 
> > > > > It's unfortunately baked in way too deep to fix in the near term; the
> > > > > BQL is just too contagious and we have a fundamental design of running
> > > > > all the main IO emulation in one thread.
> > > > > 
> > > > > > I'd much prefer to see the problem described handled transparently
> > > > > > inside QEMU.  One approach is to have a dedicated thread in QEMU
> > > > > > responsible for all monitor I/O.  This thread should never actually
> > > > > > execute monitor commands, though; it would simply parse the command
> > > > > > request and put data onto a queue of pending commands, so it could
> > > > > > never hang.  The command queue could be processed by the main
> > > > > > thread, or by another thread that is interested, e.g. the migration
> > > > > > thread could process any queued commands related to migration
> > > > > > directly.
> > > > > 
> > > > > That requires a change in the current API to allow async command
> > > > > completion (OK, that is something Marc-Andre's world has) so that
> > > > > from the one connection you can have multiple outstanding commands.
> > > > > Hmm, unless....
> > > > > 
> > > > > We've also got problems that some commands don't like being run
> > > > > outside of the main thread (see Fam's reply on the 21st pointing out
> > > > > that a lot of block commands would assert).
> > > > > 
> > > > > I think the way to move to what you describe would be:
> > > > >   a) A separate thread for monitor IO.
> > > > >      This seems a separate problem.
> > > > >      How hard is that?  Will all the current IO mechanisms used
> > > > >      for monitors just work if we run them in a separate thread?
> > > > >      What about mux?
> > > > > 
> > > > >   b) Initially all commands get dispatched to the main thread,
> > > > >      so nothing changes about the API.
> > > > > 
> > > > >   c) We create a new thread for the lock-free commands, and route
> > > > >      lock-free commands down it.
> > > > > 
> > > > >   d) We start with a rule that on any one monitor connection we
> > > > >      don't allow you to start a command until the previous one
> > > > >      has finished.
> > > > > 
> > > > > (d) lets us avoid any API changes, while still allowing lock-free
> > > > > stuff on a separate connection, like Peter's world.
> > > > > We can drop (d) once we have a way of doing async commands.
> > > > > We can add dispatching to more threads once someone describes
> > > > > what they want from those threads.
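> > > > > 
> > > > > A tiny sketch of the routing in (c), with a made-up flag name and
> > > > > queues (QEMU's real dispatch machinery looks different):
> > > > > 
> > > > >   #include <glib.h>
> > > > > 
> > > > >   typedef struct { unsigned flags; /* ... */ } MonCommand;
> > > > >   #define MON_CMD_NO_BQL 0x1   /* hypothetical "without-bql" marker */
> > > > > 
> > > > >   extern GAsyncQueue *main_loop_queue;  /* drained under the BQL */
> > > > >   extern GAsyncQueue *lock_free_queue;  /* drained by its own thread */
> > > > > 
> > > > >   static void mon_route_command(MonCommand *cmd)
> > > > >   {
> > > > >       if (cmd->flags & MON_CMD_NO_BQL) {
> > > > >           /* e.g. migrate-incoming during postcopy recovery */
> > > > >           g_async_queue_push(lock_free_queue, cmd);
> > > > >       } else {
> > > > >           g_async_queue_push(main_loop_queue, cmd);
> > > > >       }
> > > > >   }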
> > > > > 
> > > > > Does that work for you, Dan?
> > > > 
> > > > It would *provided* that we do (c) for the commands Peter wants for
> > > > this migration series.  IOW, I don't want logic in libvirt that
> > > > either needs to add a 2nd monitor server, or open a 2nd monitor
> > > > connection, to deal with migration post-copy recovery in some
> > > > versions of QEMU.  So whatever is needed to make post-copy recovery
> > > > work has to be done for (c).
> > > 
> > > But then doesn't that mean you're requiring us to break (d) and change
> > > the QMP interface to libvirt so it can do async stuff?
> > 
> > Depends on your definition of break - I'm assuming there's either a way
> > to opt in to an async mode for existing commands in (c), or that async
> > commands would be added in parallel with the existing sync commands.
> > IOW, it's not an API breakage - it's an opt-in extension of existing
> > functionality.
> 
> But you'd need to use async commands for every command you issued, to
> avoid blocking the I/O thread, so that you could then issue the recovery
> commands.

I don't see why that has to be the case. To issue an async command, all
that is needed is for command replies to be allowed to be sent out of
order.

IOW, if command A is blocking and command B is async, then we should be
allowed to have the following:

   req A
   req B
   res A
   res B

Or

   req A
   req B
   res B
   res A

Or

   req B
   req A
   res B
   res A

etc.

This does imply that you need monitor I/O processing that is separate from
the command execution thread, but I see no need for all commands to
suddenly become async. Just allowing interleaved replies is sufficient from
the POV of the protocol definition. This interleaving is easy to handle
from the client POV - it just requires the client to put a unique 'serial'
in each request, which QEMU then copies into the reply.
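QMP's existing 'id' field already gives us that plumbing - whatever 'id'
the client sends with a command is copied into the matching reply - so the
interleaved case above could look like this on the wire (command names and
ids are just examples):

   { "execute": "cmd-A", "id": "serial-1" }
   { "execute": "cmd-B", "id": "serial-2" }
   { "return": { ... }, "id": "serial-2" }
   { "return": { ... }, "id": "serial-1" }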

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


