qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC v2 2/8] monitor: allow monitor to create thread to


From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [RFC v2 2/8] monitor: allow monitor to create thread to poll
Date: Tue, 5 Sep 2017 19:58:57 +0100
User-agent: Mutt/1.8.3 (2017-05-23)

* Peter Xu (address@hidden) wrote:
> On Mon, Aug 28, 2017 at 12:11:38PM +0200, Marc-André Lureau wrote:
> > Hi
> > 
> > On Mon, Aug 28, 2017 at 5:05 AM, Peter Xu <address@hidden> wrote:
> > > On Fri, Aug 25, 2017 at 04:07:34PM +0000, Marc-André Lureau wrote:
> > >> On Fri, Aug 25, 2017 at 5:33 PM Dr. David Alan Gilbert <address@hidden>
> > >> wrote:
> > >>
> > >> > * Marc-André Lureau (address@hidden) wrote:
> > >> > > Hi
> > >> > >
> > >> > > On Wed, Aug 23, 2017 at 8:52 AM Peter Xu <address@hidden> wrote:
> > >> > >
> > >> > > > Firstly, introduce Monitor.use_thread, and set it for monitors 
> > >> > > > that are
> > >> > > > using non-mux typed backend chardev.  We only do this for 
> > >> > > > monitors, so
> > >> > > > mux-typed chardevs are not suitable (when it connects to, e.g., 
> > >> > > > serials
> > >> > > > and the monitor together).
> > >> > > >
> > >> > > > When use_thread is set, we create standalone thread to poll the 
> > >> > > > monitor
> > >> > > > events, isolated from the main loop thread.  Here we still need to 
> > >> > > > take
> > >> > > > the BQL before dispatching the tasks since some of the monitor 
> > >> > > > commands
> > >> > > > are not allowed to execute without the protection of BQL.  Then 
> > >> > > > this
> > >> > > > gives us the chance to avoid taking the BQL for some monitor 
> > >> > > > commands
> > >> > in
> > >> > > > the future.
> > >> > > >
> > >> > > > * Why this change?
> > >> > > >
> > >> > > > We need these per-monitor threads to make sure we can have at 
> > >> > > > least one
> > >> > > > monitor that will never stuck (that can receive further monitor
> > >> > > > commands).
> > >> > > >
> > >> > > > * So when will monitors stuck?  And, how do they stuck?
> > >> > > >
> > >> > > > After we have postcopy and remote page faults, it's simple to 
> > >> > > > achieve a
> > >> > > > stuck in the monitor (which is also a stuck in main loop thread):
> > >> > > >
> > >> > > > (1) Monitor deadlock on BQL
> > >> > > >
> > >> > > > As we may know, when postcopy is running on destination VM, the 
> > >> > > > vcpu
> > >> > > > threads can stuck merely any time as long as it tries to access an
> > >> > > > uncopied guest page.  Meanwhile, when the stuck happens, it is 
> > >> > > > possible
> > >> > > > that the vcpu thread is holding the BQL.  If the page fault is not
> > >> > > > handled quickly, you'll find that monitors stop working, which is
> > >> > trying
> > >> > > > to take the BQL.
> > >> > > >
> > >> > > > If the page fault cannot be handled correctly (one case is a paused
> > >> > > > postcopy, when network is temporarily down), monitors will hang
> > >> > > > forever.  Without current patch, that means the main loop hanged.
> > >> > We'll
> > >> > > > never find a way to talk to VM again.
> > >> > > >
> > >> > >
> > >> > > Could the BQL be pushed down to the monitor commands level instead? 
> > >> > > That
> > >> > > way we wouldn't need a seperate thread to solve the hang on commands 
> > >> > > that
> > >> > > do not need BQL.
> > >> >
> > >> > If the main thread is stuck though I don't see how that helps you; you
> > >> > have to be able to run these commands on another thread.
> > >> >
> > >>
> > >> Why would the main thread be stuck? In (1) If the vcpu thread takes the 
> > >> BQL
> > >> and the command doesn't need it, it would work.  In (2),  info cpus
> > >> shouldn't keep the BQL (my qapi-async series would probably help here)
> > >
> > > (Thanks for joining the discussion)
> > >
> > > AFAIK the main thread can be stuck for many reasons.  I have seen one
> > > stack when the VGA code (IIUC) was trying to writting to guest graphic
> > > memory in main loop thread but luckily that guest page is still not
> > > copied yet from source.  As long as the main thread is stuck for any
> > > reason, no chance for monitor commands, even if the commands support
> > > async operations.
> > 
> > If that command becomes async (it probably should, any command doing
> > IO probaly should), then the main loop can keep running.
> 
> The problem is that, it's not blocked at "a command", but a task
> running on the main thread.  The task can access guest memory, and
> when the guest page is not there, the main thread hangs.  Then it
> hangs every monitors, and all other tasks that are bounded to main
> thread.

This is my main reason for believing we need this approach; I don't
see how the async-command solution solves it.

(I don't have anything against the async command stuff, I just don't
think it solves this problem)

Dave

> > 
> > >
> > > So IMHO the only solution is doing these things in separate threads,
> > > rather than all in a single one.
> > 
> > I wouldn't say it's the only solution. I think the monitor can touch
> > many areas that haven't been written with multi-threading in mind. My
> > proposal is probably safer, although I don't know how hard it would be
> > to push the BQL down to QMP commands, and make async existing IO
> > commands. The benefits of this work are quite interesting imho,
> > because a stuck mainloop is basically a stuck qemu, and an additional
> > thread will not solve it...
> > 
> > -- 
> > Marc-André Lureau
> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK



reply via email to

[Prev in Thread] Current Thread [Next in Thread]