bug#36609: 27.0.50; Possible race-condition in threading implementation

bug-gnu-emacs

From:	Eli Zaretskii
Subject:	bug#36609: 27.0.50; Possible race-condition in threading implementation
Date:	Sat, 13 Jul 2019 09:50:02 +0300

> From: Pip Cet <pipcet@gmail.com>
> Date: Fri, 12 Jul 2019 19:30:34 +0000
> Cc: politza@hochschule-trier.de, 36609@debbugs.gnu.org
> 
> > > > We should either release the global lock before the thread exits, or
> > > > defer the acting upon the signal until later.  We cannot disable the
> > > > signal handling altogether because it is entirely legitimate to signal
> > > > another thread, and when we do, that other thread will _always_ be
> > > > inside thread_select.
> > >
> > > Really? What about thread-yield?
> >
> > What about it?
> >
> > You are asking whether, when thread-signal is executed, the thread
> > which we are signaling is necessarily parked inside thread_select?  If
> > so, I don't understand your surprise: only one thread can ever be
> > running, and that is by definition the thread which calls
> > thread-signal.  All the other threads cannot be running, which means
> > they are parked either in thread_select or in sys_mutex_lock called
> > from acquire_global_lock.  Right?
> 
> No, they might also be in the sys_thread_yield syscall, having
> released the global lock but not yet reacquired it:
> 
>   release_global_lock ();
>   sys_thread_yield (); <<<<< here
>   acquire_global_lock (self);

OK, but that, too, means the thread being signaled is not running,
right?  And I still think that a very frequent case, perhaps the most
frequent, is that the thread being signaled is inside thread_select.

> I'm not sure how it's relevant to assert that "that other thread will
> _always_ be inside thread_select".

OK, we've now established that the other thread could also be in
acquire_global_lock or (for a very short time) in sys_thread_yield.

> I have an idea where you might be going with that

I was merely pointing out that we cannot disable the signal handling
as a means to solve the problem.

> but that idea wouldn't work (to release the lock from the signalling
> thread, not the signalled thread that holds it).

Maybe we have a misunderstanding here.  I was talking about this part
of post_acquire_global_lock:

  /* We could have been signaled while waiting to grab the global lock
     for the first time since this thread was created, in which case
     we didn't yet have the opportunity to set up the handlers.  Delay
     raising the signal in that case (it will be actually raised when
     the thread comes here after acquiring the lock the next time).  */
  if (!NILP (current_thread->error_symbol) && handlerlist)
    {
      Lisp_Object sym = current_thread->error_symbol;
      Lisp_Object data = current_thread->error_data;

      current_thread->error_symbol = Qnil;
      current_thread->error_data = Qnil;
      Fsignal (sym, data);
    }

In this part, we have already switched to the thread that has been
signaled, so we are in the signaled thread, not in the signaling
thread.  I meant to release the lock before the call to Fsignal here.
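
To make that ordering concrete, here is a toy model in plain C; it is
not Emacs code.  Every name in it (toy_lock, toy_thread,
toy_post_acquire, toy_run_signaled_thread) is made up for
illustration, a bool stands in for whichever lock is at issue, and
longjmp stands in for the non-local exit that Fsignal performs.  It
only shows the proposed order of operations: clear the pending error,
drop the lock, then raise.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names exist in Emacs.  */
struct toy_lock { bool held; };

struct toy_thread
{
  const char *error_symbol;  /* NULL plays the role of Qnil.  */
  jmp_buf *handler;          /* NULL plays the role of an empty handlerlist.  */
};

static void
toy_lock_acquire (struct toy_lock *l)
{
  assert (!l->held);
  l->held = true;
}

static void
toy_lock_release (struct toy_lock *l)
{
  assert (l->held);
  l->held = false;
}

/* Models the fragment quoted above, with one extra step: the lock is
   dropped before the pending error is raised.  Returns normally only
   when nothing is raised.  */
static void
toy_post_acquire (struct toy_thread *t, struct toy_lock *l)
{
  if (t->error_symbol != NULL && t->handler != NULL)
    {
      t->error_symbol = NULL;
      toy_lock_release (l);      /* The proposed extra step.  */
      longjmp (*t->handler, 1);  /* Stand-in for Fsignal; does not return.  */
    }
}

/* Runs one toy "thread body".  Returns true if the body was left via
   the simulated signal, false if it finished normally.  */
bool
toy_run_signaled_thread (struct toy_thread *t, struct toy_lock *l)
{
  jmp_buf handler;

  if (setjmp (handler) != 0)
    {
      t->handler = NULL;  /* The non-local exit landed here.  */
      return true;
    }

  t->handler = &handler;
  toy_lock_acquire (l);
  toy_post_acquire (t, l);
  toy_lock_release (l);   /* Normal path: no error was pending.  */
  t->handler = NULL;
  return false;
}
```

In this model, calling toy_run_signaled_thread with a pending
error_symbol leaves through the longjmp with the held flag already
cleared; with no pending error it returns by the normal path, which
also releases the lock.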

> > If the problem is with missing events,
> > then which events are those, and what bad things will happen if we
> > miss them?
> 
> All events that glib knows about but Emacs doesn't. For example, a
> glib timeout is apparently used to achieve some kind of scroll effect
> on GTK menus, which is why we call xg_select from xmenu.c.
> 
> I don't know which libraries use glib-based threads, but I think dbus
> does, too.
> 
> I believe, but am not certain, that this also includes X events when
> using GTK. That would explain the frozen sessions.

So is the problem that the Glib context is locked "forever", or is the
problem that it's locked by another Lisp thread, even if this lock is
short-lived?  If the former, then arranging for the release of that
lock when the signaled thread exits would solve the problem, right?
And if the problem is the latter one, then why didn't we hear about
this much earlier?  Can you show the bad effect from missing these
events without signaling a thread?

Thanks.
