bug-gnu-emacs

bug#36609: 27.0.50; Possible race-condition in threading implementation


From: Pip Cet
Subject: bug#36609: 27.0.50; Possible race-condition in threading implementation
Date: Fri, 12 Jul 2019 12:57:51 +0000

On Fri, Jul 12, 2019 at 12:42 PM Eli Zaretskii <address@hidden> wrote:
>
> > From: Pip Cet <address@hidden>
> > Date: Fri, 12 Jul 2019 09:02:22 +0000
> > Cc: address@hidden
> >
> > On Thu, Jul 11, 2019 at 8:52 PM Andreas Politz
> > <address@hidden> wrote:
> > > I think there is a race-condition in the implementation of threads.  I
> > > tried to find a minimal test-case, without success.  Thus, I've attached
> > > a lengthy source-file.  Loading that file should trigger this bug and
> > > may freeze your session.
> >
> > It does here, so I can provide further debugging information if
> > needed.
>
> Thanks, can you provide the info I asked for?

Yes, albeit not right now.

> > At first glance, it appears that xgselect returns abnormally with
> > g_main_context acquired in one thread, and then other threads fail
> > to acquire it and loop endlessly.
>
> If you can describe what causes this to happen, I think we might be
> half-way to a solution.

Here's the backtrace of the abnormal exit I see with the patch attached:


(gdb) bt full
#0  0x00000000006bf987 in release_g_main_context (ptr=0xc1d070) at xgselect.c:36
        context = 0x7fffedf79710
#1  0x0000000000616f03 in do_one_unbind
    (this_binding=0x7fffedf79770, unwinding=true, bindflag=SET_INTERNAL_UNBIND)
    at eval.c:3446
#2  0x0000000000617245 in unbind_to (count=0, value=XIL(0)) at eval.c:3567
        this_binding = {
          kind = SPECPDL_UNWIND_PTR,
          unwind = {
            kind = SPECPDL_UNWIND_PTR,
            func = 0x6bf97b <release_g_main_context>,
            arg = XIL(0xc1d070),
            eval_depth = 0
          },
          unwind_array = {
            kind = SPECPDL_UNWIND_PTR,
            nelts = 7076219,
            array = 0xc1d070
          },
          unwind_ptr = {
            kind = SPECPDL_UNWIND_PTR,
            func = 0x6bf97b <release_g_main_context>,
            arg = 0xc1d070
          },
          unwind_int = {
            kind = SPECPDL_UNWIND_PTR,
            func = 0x6bf97b <release_g_main_context>,
            arg = 12701808
          },
          unwind_excursion = {
            kind = SPECPDL_UNWIND_PTR,
            marker = XIL(0x6bf97b),
            window = XIL(0xc1d070)
          },
          unwind_void = {
            kind = SPECPDL_UNWIND_PTR,
            func = 0x6bf97b <release_g_main_context>
          },
          let = {
            kind = SPECPDL_UNWIND_PTR,
            symbol = XIL(0x6bf97b),
            old_value = XIL(0xc1d070),
            where = XIL(0),
            saved_value = XIL(0xef26a0)
          },
          bt = {
            kind = SPECPDL_UNWIND_PTR,
            debug_on_exit = false,
            function = XIL(0x6bf97b),
            args = 0xc1d070,
            nargs = 0
          }
        }
        quitf = XIL(0)
#3  0x00000000006116df in unwind_to_catch
    (catch=0x7fffd8000c50, type=NONLOCAL_EXIT_SIGNAL, value=XIL(0x14d3653))
    at eval.c:1162
        last_time = false
#4  0x00000000006126d9 in signal_or_quit
    (error_symbol=XIL(0x90), data=XIL(0), keyboard_quit=false) at eval.c:1674
        unwind_data = XIL(0x14d3653)
        conditions = XIL(0x7ffff05d676b)
        string = XIL(0x5f5e77)
        real_error_symbol = XIL(0x90)
        clause = XIL(0x30)
        h = 0x7fffd8000c50
#5  0x00000000006122e9 in Fsignal (error_symbol=XIL(0x90), data=XIL(0))
    at eval.c:1564
#6  0x0000000000698901 in post_acquire_global_lock (self=0xe09db0) at thread.c:115
        sym = XIL(0x90)
        data = XIL(0)
        prev_thread = 0xa745c0 <main_thread>
#7  0x000000000069892b in acquire_global_lock (self=0xe09db0) at thread.c:123
#8  0x0000000000699303 in really_call_select (arg=0x7fffedf79a70) at thread.c:596
        sa = 0x7fffedf79a70
        self = 0xe09db0
        oldset = {
          __val = {0, 0, 7, 0, 80, 140736817269952, 2031, 2080, 18446744073709550952, 32, 343597383808, 4, 0, 472446402655, 511101108348, 0}
        }
#9  0x00000000005e5ee0 in flush_stack_call_func
    (func=0x699239 <really_call_select>, arg=0x7fffedf79a70) at alloc.c:4969
        end = 0x7fffedf79a30
        self = 0xe09db0
        sentry = {
          o = {
            __max_align_ll = 0,
            __max_align_ld = <invalid float value>
          }
        }
#10 0x0000000000699389 in thread_select
    (func=0x419320 <pselect@plt>, max_fds=9, rfds=0x7fffedf79fa0, wfds=0x7fffedf79f20, efds=0x0, timeout=0x7fffedf7a260, sigmask=0x0)
    at thread.c:616
        sa = {
          func = 0x419320 <pselect@plt>,
          max_fds = 9,
          rfds = 0x7fffedf79fa0,
          wfds = 0x7fffedf79f20,
          efds = 0x0,
          timeout = 0x7fffedf7a260,
          sigmask = 0x0,
          result = 1
        }
#11 0x00000000006bfef5 in xg_select
    (fds_lim=9, rfds=0x7fffedf7a300, wfds=0x7fffedf7a280, efds=0x0, timeout=0x7fffedf7a260, sigmask=0x0)
    at xgselect.c:130
        all_rfds = {
          fds_bits = {8, 0 <repeats 15 times>}
        }
        all_wfds = {
          fds_bits = {0 <repeats 16 times>}
        }
        tmo = {
          tv_sec = 0,
          tv_nsec = 0
        }
        tmop = 0x7fffedf7a260
        context = 0xc1d070
        have_wfds = true
        gfds_buf = {{
            fd = 5,
            events = 1,
            revents = 0
          }, {
            fd = 6,
            events = 1,
            revents = 0
          }, {
            fd = 8,
            events = 1,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          } <repeats 125 times>}
        gfds = 0x7fffedf79b10
        gfds_size = 128
        n_gfds = 3
        retval = 0
        our_fds = 0
        max_fds = 8
        context_acquired = true
        i = 3
        nfds = 0
        tmo_in_millisec = -1
        must_free = 0
        need_to_dispatch = false
        count = 3
#12 0x000000000066b757 in wait_reading_process_output
    (time_limit=3, nsecs=0, read_kbd=0, do_display=false, wait_for_cell=XIL(0), wait_proc=0x0, just_wait_proc=0)
    at process.c:5423
        process_skipped = false
        channel = 0
        nfds = 0
        Available = {
          fds_bits = {8, 0 <repeats 15 times>}
        }
        Writeok = {
          fds_bits = {0 <repeats 16 times>}
        }
        check_write = true
        check_delay = 0
        no_avail = false
        xerrno = 0
        proc = XIL(0x7fffedf7a440)
        timeout = {
          tv_sec = 3,
          tv_nsec = 0
        }
        end_time = {
          tv_sec = 1562935633,
          tv_nsec = 911868453
        }
        timer_delay = {
          tv_sec = 0,
          tv_nsec = -1
        }
        got_output_end_time = {
          tv_sec = 0,
          tv_nsec = -1
        }
        wait = TIMEOUT
        got_some_output = -1
        prev_wait_proc_nbytes_read = 0
        retry_for_async = false
        count = 2
        now = {
          tv_sec = 0,
          tv_nsec = -1
        }
#13 0x0000000000429bf6 in Fsleep_for (seconds=make_fixnum(3), milliseconds=XIL(0))
    at dispnew.c:5825
        t = {
          tv_sec = 3,
          tv_nsec = 0
        }
        tend = {
          tv_sec = 1562935633,
          tv_nsec = 911868112
        }
        duration = 3
#14 0x0000000000613e99 in eval_sub (form=XIL(0xf6df73)) at eval.c:2273
        i = 2
        maxargs = 2
        args_left = XIL(0)
        numargs = 1
        original_fun = XIL(0x7fffefa9fb98)
        original_args = XIL(0xf6df83)
        count = 1
        fun = XIL(0xa756a5)
        val = XIL(0)
        funcar = make_fixnum(35184372085343)
        argvals =
          {make_fixnum(3), XIL(0), XIL(0), XIL(0), XIL(0), XIL(0), XIL(0), XIL(0)}
#15 0x0000000000610032 in Fprogn (body=XIL(0)) at eval.c:462
        form = XIL(0xf6df73)
        val = XIL(0)
#16 0x0000000000616102 in funcall_lambda
    (fun=XIL(0xf6da43), nargs=0, arg_vector=0xe09dd8) at eval.c:3065
        val = XIL(0xc0)
        syms_left = XIL(0)
        next = XIL(0x3400000013)
        lexenv = XIL(0)
        count = 1
        i = 0
        optional = false
        rest = false
#17 0x0000000000615542 in Ffuncall (nargs=1, args=0xe09dd0) at eval.c:2813
        fun = XIL(0xf6da43)
        original_fun = XIL(0xf6da43)
        funcar = XIL(0xc0)
        numargs = 0
        val = XIL(0xaf72e0)
        count = 0
#18 0x000000000069956f in invoke_thread_function () at thread.c:702
        count = 0
#19 0x0000000000611d61 in internal_condition_case
    (bfun=0x69953e <invoke_thread_function>, handlers=XIL(0x30), hfun=0x699596 <record_thread_error>)
    at eval.c:1351
        val = make_fixnum(1405386)
        c = 0x7fffd8000c50
#20 0x0000000000699697 in run_thread (state=0xe09db0) at thread.c:741
        stack_pos = {
          __max_align_ll = 0,
          __max_align_ld = 0
        }
        self = 0xe09db0
        iter = 0x0
        c = 0x7fffd8000b20
#21 0x00007ffff4b38fa3 in start_thread (arg=<optimized out>)
    at pthread_create.c:486
        ret = <optimized out>
        pd = <optimized out>
        now = <optimized out>
        unwind_buf = {
          cancel_jmp_buf = {{
              jmp_buf = {140737185822464, -1249422724209328276, 140737488341374, 140737488341375, 140737185822464, 0, 1249453444682727276, 1249398985402204012},
              mask_was_saved = 0
            }},
          priv = {
            pad = {0x0, 0x0, 0x0, 0x0},
            data = {
              prev = 0x0,
              cleanup = 0x0,
              canceltype = 0
            }
          }
        }
        not_first_call = <optimized out>
#22 0x00007ffff49724cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Lisp Backtrace:
"sleep-for" (0xedf7a530)
0xf6da40 Lisp type 3

As frames #5 and #6 show, post_acquire_global_lock () can call Fsignal
and so return abnormally (I didn't know that). That means
really_call_select () can, too, and therefore so can thread_select ().
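
To make the failure mode concrete, here is a minimal standalone sketch of
what an abnormal return does to an acquire/call/release sequence. It is
not Emacs code: setjmp/longjmp stands in for the Lisp signal machinery,
a plain flag stands in for the acquired g_main_context, and every name
in it is made up for the illustration.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names come from the Emacs sources.  */
static jmp_buf handler;      /* plays the role of the condition-case handler */
static bool context_held;    /* plays the role of the acquired g_main_context */

/* Stands in for a callee that may signal instead of returning.  */
static void
callee_that_may_signal (bool do_signal)
{
  if (do_signal)
    longjmp (handler, 1);    /* non-local exit, comparable to Fsignal */
}

/* Acquire, call, release: the release is skipped on a non-local exit.  */
static void
select_like (bool do_signal)
{
  assert (!context_held);
  context_held = true;
  callee_that_may_signal (do_signal);
  context_held = false;      /* never reached if the callee signals */
}

/* Return true if the context is still held after the call is over.  */
bool
context_leaked_after (bool do_signal)
{
  context_held = false;
  if (setjmp (handler) == 0)
    select_like (do_signal);
  return context_held;
}
```

Calling context_leaked_after (false) takes the normal path and releases
the flag; calling context_leaked_after (true) jumps past the release and
leaves it set, which is the situation the other threads would then spin on.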

> > +  ptrdiff_t count = SPECPDL_INDEX ();
>
> I don't think we should do that at this low level.

You're right, it does stick out. I think we're safe because we're
calling Fsignal with the global lock held, but it's not a pretty or
well-documented situation.
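
For anyone not familiar with the pattern behind the quoted
SPECPDL_INDEX () line, here is a toy sketch of the idea as I understand
it: remember the unwind-stack depth, push a release handler, and let
either the normal exit or the non-local exit run it. This mirrors frames
#0 to #3 above only in spirit. All toy_* names are hypothetical, the
stack is a fixed array, and unlike the real code it always unwinds to
depth 0 on a signal.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy unwind stack; every name here is hypothetical.  */
typedef void (*unwind_fn) (void *);
struct unwind_entry { unwind_fn func; void *arg; };

enum { TOY_STACK_MAX = 16 };
static struct unwind_entry toy_stack[TOY_STACK_MAX];
static size_t toy_index;     /* loosely analogous to SPECPDL_INDEX () */
static jmp_buf toy_handler;

static void
toy_record_unwind (unwind_fn func, void *arg)
{
  assert (toy_index < TOY_STACK_MAX);
  toy_stack[toy_index++] = (struct unwind_entry) { func, arg };
}

/* Pop and run handlers until the stack is back at depth COUNT.  */
static void
toy_unbind_to (size_t count)
{
  while (toy_index > count)
    {
      struct unwind_entry e = toy_stack[--toy_index];
      e.func (e.arg);
    }
}

/* Non-local exit: run all pending handlers, then jump to the handler.  */
static void
toy_signal (void)
{
  toy_unbind_to (0);
  longjmp (toy_handler, 1);
}

static void
toy_release (void *held)
{
  *(bool *) held = false;
}

static void
toy_select (bool *held, bool do_signal)
{
  size_t count = toy_index;
  *held = true;
  toy_record_unwind (toy_release, held);
  if (do_signal)
    toy_signal ();
  toy_unbind_to (count);     /* the normal exit runs the handler, too */
}

/* Return true if the flag is still set once toy_select is over.  */
bool
toy_context_held_after (bool do_signal)
{
  static bool held;
  held = false;
  toy_index = 0;
  if (setjmp (toy_handler) == 0)
    toy_select (&held, do_signal);
  return held;
}
```

With the handler registered, toy_context_held_after () returns false for
both the normal and the signalling path, so the flag is released either
way. Whether this belongs at such a low level is a separate question.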




