qemu-devel

Re: A bug of Monitor Chardev ?


From: Daniel P. Berrangé
Subject: Re: A bug of Monitor Chardev ?
Date: Fri, 21 May 2021 18:07:56 +0100
User-agent: Mutt/2.0.7 (2021-05-04)

On Fri, May 21, 2021 at 08:59:17PM +0400, Marc-André Lureau wrote:
> Hi
> 
> On Fri, May 21, 2021 at 8:56 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> 
> > On Fri, May 21, 2021 at 05:33:46PM +0100, Daniel P. Berrangé wrote:
> > > On Fri, May 21, 2021 at 10:43:36AM -0400, Peter Xu wrote:
> > > >
> > > > I think the original problem was that if qemu_chr_fe_set_handlers() is
> > > > called in the main thread, it can start to race somehow within the
> > > > execution of qemu_chr_fe_set_handlers() right after we switch context at:
> > > >
> > > >     qemu_chr_be_update_read_handlers(s, context);
> > > >
> > > > Then the rest of the code in qemu_chr_fe_set_handlers() will continue to
> > > > run in the main thread for sure, but it should be running with the new
> > > > iothread context, which introduces a race condition.
> > > >
> > > > Running qemu_chr_be_update_read_handlers() in a BH resolves that, because
> > > > then everything runs in the monitor iothread only and is naturally
> > > > serialized.
> > >
> > > The first message in this thread, however, claims that it is *not*
> > > in fact serialized when using the BH.
> > >
> > > > So the new comment looks indeed not fully right, as the chr device
> > > > should indeed be within the main thread context before
> > > > qemu_chr_fe_set_handlers(); it's just that the race may start right
> > > > away, without the BH, when the context switch happens for the chr.
> > >
> > > It sounds like both the comment and the code are potentially wrong.
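
To make that window concrete, the shape of the problem is roughly this
(a simplified model, not the literal chardev code):

    /* Illustrative sketch only: once the read handlers are re-homed to
     * the iothread's GMainContext, callbacks can already be dispatching
     * there while the tail of the setup function is still running in
     * the main thread. */
    #include <glib.h>

    typedef struct {
        GMainContext *context;   /* where I/O watches now fire */
        gpointer handlers;       /* frontend hooks, still being wired up */
    } FakeChardev;

    static void set_handlers(FakeChardev *s, GMainContext *iothread_ctx,
                             gpointer new_handlers)
    {
        /* Step 1: re-home the watches. From here on, the iothread may
         * dispatch callbacks against 's'... */
        s->context = iothread_ctx;

        /* Step 2: ...but we finish wiring up the frontend in the main
         * thread, unsynchronized with those callbacks. That is the race. */
        s->handlers = new_handlers;
    }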
> >
> >
> > I feel like the root-cause problem, which the original code was trying
> > to work around, is that the chardev is "active" from the very moment it
> > is created, regardless of whether the frontend is ready to use it.
> >
> > IIUC, in this case the socket chardev is already listen()ing and
> > accept()ing incoming clients off the network before the monitor
> > has finished configuring its hooks into the chardev. This means
> > that the initial listen()/accept() I/O watches are using the
> > default GMainContext, and the monitor *has* to remove them and
> > put in new watches on the thread-private GMainContext.
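
In glib terms that distinction looks roughly like this (illustrative,
not the chardev code itself; listen_fd, accept_cb and chr are just
placeholders):

    GIOChannel *chan = g_io_channel_unix_new(listen_fd);
    GSource *src = g_io_create_watch(chan, G_IO_IN);
    g_source_set_callback(src, (GSourceFunc)accept_cb, chr, NULL);

    /* Attaching with NULL means the default GMainContext, which is
     * where the chardev's initial watches end up... */
    g_source_attach(src, NULL);

    /* ...whereas the monitor wants them on its thread-private context,
     * hence the remove-and-re-add dance:
     * g_source_attach(src, monitor_iothread_context); */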
> >
> > To eliminate any risk of races, we need to make it possible for the
> > monitor to configure the GMainContext on the chardevs *before* any
> > I/O watches are configured.
> >
> > This in turn suggests that we need to split the chardev initialization
> > into two phases: first, basic chardev creation (object creation, option
> > parsing/sanity checking, socket creation); second, the actual activation,
> > where the I/O watches are added.
> >
> > IOW, qemu_chr_new() is the former and gets run from generic code in
> > the main() method, or in QMP chardev_add. A new 'qemu_chr_activate'
> > method would be called by whatever frontend is using the chardev,
> > after registering a custom GMainContext.
> >
> > This would involve updating every single existing user of chardevs
> > to add a call to qemu_chr_activate, but that's worth it to eliminate
> > the race by design, rather than working around it.
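
Concretely, the split might look like this (qemu_chr_activate() is the
hypothetical new method; the other calls and arguments are only
illustrative):

    /* Phase 1: object creation, option parsing/sanity checking and
     * socket creation, but no I/O watches yet, so no client can be
     * serviced prematurely. */
    Chardev *chr = qemu_chr_new("mon0", "socket,path=/tmp/qmp.sock", NULL);

    /* The frontend registers its private GMainContext first... */
    qemu_chr_fe_set_handlers(&mon->chr, monitor_can_read, monitor_qmp_read,
                             monitor_qmp_event, NULL, mon, iothread_ctx, true);

    /* Phase 2 (hypothetical): only now are the listen()/accept()
     * watches installed, directly on the right context, so there are
     * no default-context watches to migrate and no race window. */
    qemu_chr_activate(chr);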
> >
> 
> 
> What about my earlier suggestion to add a new
> "qemu_chr_be_disable_handlers()" (until update_read_handlers is called
> again to enable them and then set a different context)?

It could probably work, but it still feels like a bit of a hack to me,
because there's still a window between chardev creation and the
qemu_chr_be_disable_handlers() call where new clients can arrive.
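
Schematically (with qemu_chr_be_disable_handlers() as your proposed,
not-yet-existing call):

    chr = qemu_chr_new(...);             /* watches already live; a    */
                                         /* client can connect here... */
    qemu_chr_be_disable_handlers(chr);   /* ...before we turn them off */
    qemu_chr_fe_set_handlers(...);       /* re-enabled on iothread ctx */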

This may not be a problem in the scenario we're in with the monitor
here, because the mainloop isn't running yet IIUC, but for the long
term I think we're better off fixing the general problem rather than
introducing more special-case workarounds.

For example, I think your suggestion would still be racy if we ever
add support for hotplugging additional monitor instances, which has
been talked about recently.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



