[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [RFC v2 2/8] monitor: allow monitor to create thread to
From: |
Dr. David Alan Gilbert |
Subject: |
Re: [Qemu-devel] [RFC v2 2/8] monitor: allow monitor to create thread to poll |
Date: |
Fri, 25 Aug 2017 10:30:42 +0100 |
User-agent: |
Mutt/1.8.3 (2017-05-23) |
* Peter Xu (address@hidden) wrote:
> On Wed, Aug 23, 2017 at 06:35:35PM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (address@hidden) wrote:
> > > Firstly, introduce Monitor.use_thread, and set it for monitors that are
> > > using non-mux typed backend chardev. We only do this for monitors, so
> > > mux-typed chardevs are not suitable (when it connects to, e.g., serials
> > > and the monitor together).
> > >
> > > When use_thread is set, we create standalone thread to poll the monitor
> > > events, isolated from the main loop thread. Here we still need to take
> > > the BQL before dispatching the tasks since some of the monitor commands
> > > are not allowed to execute without the protection of BQL. Then this
> > > gives us the chance to avoid taking the BQL for some monitor commands in
> > > the future.
> > >
> > > * Why this change?
> > >
> > > We need these per-monitor threads to make sure we can have at least one
> > > monitor that will never stuck (that can receive further monitor
> > > commands).
> > >
> > > * So when will monitors stuck? And, how do they stuck?
> >
> > (Minor: 'stuck' is past tense, 'stick' is probably the right word; however
> > 'block' is probably what you actually want)
>
> Yet another English error. Thanks! :-)
That's OK - only minor.
> (I guess "monitors get stuck" should also work?)
Yes.
> >
> > > After we have postcopy and remote page faults, it's simple to achieve a
> > > stuck in the monitor (which is also a stuck in main loop thread):
> > >
> > > (1) Monitor deadlock on BQL
> > >
> > > As we may know, when postcopy is running on destination VM, the vcpu
> > > threads can stuck merely any time as long as it tries to access an
> > > uncopied guest page. Meanwhile, when the stuck happens, it is possible
> > > that the vcpu thread is holding the BQL. If the page fault is not
> > > handled quickly, you'll find that monitors stop working, which is trying
> > > to take the BQL.
> > >
> > > If the page fault cannot be handled correctly (one case is a paused
> > > postcopy, when network is temporarily down), monitors will hang
> > > forever. Without current patch, that means the main loop hanged. We'll
> > > never find a way to talk to VM again.
> > >
> > > (2) Monitor tries to run codes page-faulted vcpus
> > >
> > > The HMP command "info cpus" is one of the good example - it tries to
> > > kick all the vcpus and sync status from them. However, if there is any
> > > vcpu that stuck at an unhandled page fault, it can never achieve the
> > > sync, then the HMP hangs. Again, it hangs the main loop thread as well.
> > >
> > > After either (1) or (2), we can see the deadlock problem:
> > >
> > > - On one hand, if monitor hangs, we cannot do the postcopy recovery,
> > > because postcopy recovery needs user to specify new listening port on
> > > destination monitor.
> > >
> > > - On the other hand, if we cannot recover the paused postcopy, then page
> > > faults cannot be serviced, and the monitors will possibly hang
> > > forever then.
> > >
> > > * How this patch helps?
> > >
> > > - Firstly, we'll have our own thread for each dedicated monitor (or say,
> > > the backend chardev is only used for monitor), so even main loop
> > > thread hangs (it is always possible), this monitor thread may still
> > > survive.
> > >
> > > - Not all monitor commands need the BQL. We can selectively take the
> > > BQL (depends on which command we are running) to avoid waiting on a
> > > page-faulted vcpu thread that has taken the BQL (this will be done in
> > > following up patches).
> > >
> > > Signed-off-by: Peter Xu <address@hidden>
> >
> > A few high level things:
> > a) I think this patch probably wants to split into
> > 1) A patch that decides whether to create a new thread and
> > initialises it
> > 2) One that starts to fix up the locking
>
> Sure.
>
> >
> > b) I think you also need to take the bql around any of the custom
> > completion functions; (maybe in monitor_find_completion ?)
> > since they do things like walk the lists of devices.
>
> Ah, yes. Actually IMHO those completions should be protected by
> smaller locks as well. Considering this only affects HMP, how about
> this: when "without-bql" is set for a command, it should mean that the
> whole command does not need BQL, this should include not only the
> command execution part, but also the command auto completion routine.
> So I take the BQL in the completion only for those whose "without-bql"
> is unset, like the trick played for the command execution part.
>
> For the only command "migrate_incoming", it does not have completion
> routine, so "without-bql=true" still applies.
>
> Or would you prefer I just take the lock unconditionally?
I think either of those would work; no preference.
> >
> > c) As mentioned on irc there's fun to be had with cur_mon and error
> > handling - in my local world I have cur_mon declared as __thread
> > but never got around to thinking aobut what should set it up.
> > There's also 'wavcapture: Convert to error_report' that I posted
> > in March that got rid of some uses of cur_mon in wavcapture.c
> > for error_report.
>
> Yeh. I at least also see a positive ACK from Markus in the other
> thread for per-thread cur_mon, sounds like this is the right way to
> go.
>
> To setup cur_mon, what I can think of is create wrapper for
> pthread_create() in qemu_thread_create(). I see that we have done
> similar thing in util/qemu-thread-win32.c for Windows. With that we
> can setup the cur_mon before going into real thread function but in
> the right context, though we may need one more parameter for current
> qemu_thread_create():
>
> void qemu_thread_create(QemuThread *thread, const char *name,
> void *(*start_routine)(void*),
> void *arg, int mode, Monitor *mon);
>
> Then we can specify monitor for any new thread (default to cur_mon).
> For per-monitor threads, I think we need to pass in that specific mon.
>
> Is this doable?
That would mean changing all the qemu_thread_create calls, but yes
I guess is doable. I'd thought the other way, perhaps you inherit
Monitor except in the case of when the monitor creates threads.
> > But there's some interesting stuff to be checked
> > with where error_reporting goes.
>
> Do you mean the case when e.g. we only have one HMP and that HMP is
> threaded? If so, I guess the error_report()s will be directed mostly
> to stderr.
>
> I believe it'll break some HMP users, but IIUC HMP behavior is allowed
> to be changed and after all people can still catch the error message
> somewhere, though outside that HMP console. So I think it might be ok.
I think if we get cur_mon right then it'll work OK.
> >
> > d) I wonder if it's better to have thread as a flag, so that you have
> > to explicitly ask for a monitor to have it's own thread.
>
> This should be doable. Would a new parameter for "-qmp" and "-hmp"
> suffice?
Yes.
Dave
> --
> Peter Xu
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK
- [Qemu-devel] [RFC v2 0/8] monitor: allow per-monitor thread, Peter Xu, 2017/08/23
- Re: [Qemu-devel] [RFC v2 2/8] monitor: allow monitor to create thread to poll, Marc-André Lureau, 2017/08/25
- Re: [Qemu-devel] [RFC v2 2/8] monitor: allow monitor to create thread to poll, Dr. David Alan Gilbert, 2017/08/25
- Re: [Qemu-devel] [RFC v2 2/8] monitor: allow monitor to create thread to poll, Marc-André Lureau, 2017/08/25
- Re: [Qemu-devel] [RFC v2 2/8] monitor: allow monitor to create thread to poll, Dr. David Alan Gilbert, 2017/08/25
- Re: [Qemu-devel] [RFC v2 2/8] monitor: allow monitor to create thread to poll, Marc-André Lureau, 2017/08/25
- Re: [Qemu-devel] [RFC v2 2/8] monitor: allow monitor to create thread to poll, Dr. David Alan Gilbert, 2017/08/25
- Re: [Qemu-devel] [RFC v2 2/8] monitor: allow monitor to create thread to poll, Marc-André Lureau, 2017/08/26
- Re: [Qemu-devel] [RFC v2 2/8] monitor: allow monitor to create thread to poll, Peter Xu, 2017/08/27
- Re: [Qemu-devel] [RFC v2 2/8] monitor: allow monitor to create thread to poll, Marc-André Lureau, 2017/08/28