bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's

bug-gnu-emacs

From:	Gemini Lasswell
Subject:	bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
Date:	Fri, 19 Oct 2018 12:32:32 -0700
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/26.1.50 (gnu/linux)

Gemini Lasswell <gazally@runbox.com> writes:

> I set up a single-threaded situation where I could redefine a function
> while exec_byte_code was running it, and got a segfault.  I've gained
> some insights from debugging this version of the bug which I will put
> into a separate email.

Here's a gdb transcript going through the single-threaded version of
this bug.  In this transcript I use a file 'repro.el', which I've
attached to the end of this message; it is the same as the one in my
last message.

Start gdb with a breakpoint at Fredraw_display:

$ gdb --args ./emacs -Q
...
(gdb) b Fredraw_display
(gdb) r

In Emacs, find the file repro.el and load it with byte-compile-file,
then go back to *scratch* and run my-loop:

  C-x C-f repro.el RET
  C-u M-x byte-compile-file RET repro.el RET
  C-x b RET
  M-x my-loop RET

This gets me to the gdb prompt, at a point in execution where the next
function called will be my-loop-1, so I set a breakpoint in
funcall_lambda, where I can see the bytecode object for my-loop-1 (I
edited out the bytestring):

  Thread 1 "emacs" hit Breakpoint 3, Fredraw_display () at dispnew.c:3027
  3027  {
  (gdb) br funcall_lambda
  Breakpoint 4 at 0x5cdb00: file eval.c, line 3016.
  (gdb) c
  Continuing.

  Thread 1 "emacs" hit Breakpoint 4, funcall_lambda (fun=XIL(0x31c0235),
      nargs=nargs@entry=0, arg_vector=arg_vector@entry=0x7fffffff01c0)
      at eval.c:3016
  3016  {
  (gdb) clear
  Deleted breakpoint 4
  (gdb) p fun
  $1 = XIL(0x1630fc5)
  (gdb) pr
  #[0 "..." [my-var 0 "Now in recursive edit
  " recursive-edit format "Leaving recursive edit: %s
  " (a b c d e) message "foo: %s" last 1 "bar: %s" 2 "baz: %s" "bop: %s" mod 3] 6]

Then I skip ahead into exec_byte_code:

  (gdb) br exec_byte_code
  Breakpoint 5 at 0x611bb0: file bytecode.c, line 342.
  (gdb) c
  Continuing.

  Thread 1 "emacs" hit Breakpoint 5, exec_byte_code (bytestr=XIL(0x3571d24),
      vector=XIL(0x31c0195), maxdepth=make_number(4),
      args_template=args_template@entry=XIL(0), nargs=nargs@entry=0,
      args=args@entry=0x0) at bytecode.c:342
  342   {

Here's what's in the register $rbp, and the constants vector:

  (gdb) clear
  Deleted breakpoint 5
  (gdb) p $rbp
  $2 = (void *) 0xb0201
  (gdb) pr
  #<INVALID_LISP_OBJECT 0x000b0201>
  (gdb) p vector
  $3 = XIL(0x1630f35)
  (gdb) pr
  [my-var 0 "Now in recursive edit
  " recursive-edit format "Leaving recursive edit: %s
  " (a b c d e) message "foo: %s" last 1 "bar: %s" 2 "baz: %s" "bop: %s" mod 3]

Skip ahead, to get to where exec_byte_code has a value for vectorp:

  (gdb) n 12
  366     USE_SAFE_ALLOCA;
  (gdb) p vectorp
  $4 = (Lisp_Object *) 0x1630f38 <bss_sbrk_buffer+9164248>
  (gdb) p *vectorp
  $5 = XIL(0x2327d80)
  (gdb) pr
  my-var
  (gdb) break mark_vectorlike if ptr->contents == $4
  Breakpoint 6 at 0x5ad400: file alloc.c, line 6036.
  (gdb) c
  Continuing.

The idea is to break when garbage collection finds the constants vector.
(I first tried setting a conditional breakpoint in mark_object, which
made garbage collection either hang or take more time than I had
patience for.)

In Emacs type C-x b RET.  This causes a gc and a breakpoint hit:

  Thread 1 "emacs" hit Breakpoint 6, mark_vectorlike (ptr=0x31c0190) at alloc.c:6036
  6036    eassert (!VECTOR_MARKED_P (ptr));

  (gdb) bt 20
  #0  mark_vectorlike (ptr=0x1630f30 <bss_sbrk_buffer+9164240>) at alloc.c:6036
  #1  0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
  #2  0x00000000005ad45e in mark_vectorlike (
      ptr=0x1611fd0 <bss_sbrk_buffer+9037424>) at alloc.c:6046
  #3  0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
  #4  0x00000000005acdf4 in mark_object (arg=...) at alloc.c:6477
  #5  0x00000000005acae4 in mark_object (arg=...) at alloc.c:6434
  #6  0x00000000005ad45e in mark_vectorlike (
      ptr=0x15a8e00 <bss_sbrk_buffer+8606880>) at alloc.c:6046
  #7  0x00000000005ad45e in mark_vectorlike (
      ptr=0x15a9c30 <bss_sbrk_buffer+8610512>) at alloc.c:6046
  #8  0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
  #9  0x00000000005ad45e in mark_vectorlike (
      ptr=0x15a7c30 <bss_sbrk_buffer+8602320>) at alloc.c:6046
  #10 0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
  #11 0x00000000005ad45e in mark_vectorlike (
      ptr=0x15a6e80 <bss_sbrk_buffer+8598816>) at alloc.c:6046
  #12 0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
  #13 0x00000000005acdf4 in mark_object (arg=...) at alloc.c:6477
  #14 0x00000000005acaa5 in mark_object (arg=...) at alloc.c:6431
  #15 0x00000000005ad45e in mark_vectorlike (
      ptr=0x15fbed0 <bss_sbrk_buffer+8947056>) at alloc.c:6046
  #16 0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
  #17 0x00000000005ad45e in mark_vectorlike (
      ptr=0x15fbf50 <bss_sbrk_buffer+8947184>) at alloc.c:6046
  #18 0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
  #19 0x00000000005ad45e in mark_vectorlike (
      ptr=0x15fcc80 <bss_sbrk_buffer+8950560>) at alloc.c:6046
  (More stack frames follow...)

  Lisp Backtrace:
  "Automatic GC" (0x0)
  "eldoc-pre-command-refresh-echo-area" (0xfffefbb0)
  "recursive-edit" (0xfffeffd8)
  "my-loop-1" (0xffff0250)
  "my-loop" (0xffff0650)
  "funcall-interactively" (0xffff0648)
  "call-interactively" (0xffff07d0)
  "command-execute" (0xffff0ab8)
  "execute-extended-command" (0xffff0ea0)
  "funcall-interactively" (0xffff0e98)
  "call-interactively" (0xffff11d0)
  "command-execute" (0xffff1488)

There are 279 frames in the backtrace, and mark_stack and mark_memory
aren't there.  So I'm guessing the constants vector is getting found via
the function definition of 'my-loop-1'.  Keep going:

  (gdb) c
  Continuing.

Now in Emacs do this:
  M-x eval-buffer RET
  C-x b RET
  M-x my-gc RET

Execution does not stop at the breakpoint.  In Emacs type C-M-c.
Result:

  Thread 1 "emacs" received signal SIGSEGV, Segmentation fault.
  0x00000000005bca1b in styled_format (nargs=2, args=0x7ffffffeffd8,
      message=<optimized out>) at editfns.c:3129
  3129        unsigned char format_char = *format++;

What's happened to the constants vector and its contents?

  (gdb) p $3
  $6 = XIL(0x1630f35)
  (gdb) pr
  #<INVALID_LISP_OBJECT 0x01630f35>
  (gdb) p *$4
  $7 = XIL(0x2327d80)
  (gdb) pr
  my-var
  (gdb) p *($4+5)
  $8 = XIL(0x359a6f4)
  (gdb) pr
  #<INVALID_LISP_OBJECT 0x0359a6f4>
  (gdb) p *($4+4)
  $9 = XIL(0x6390)
  (gdb) pr
  format

It looks like the constants vector was freed.  Its contents haven't been
overwritten (yet), but the format string it points to has also been
freed, which leads to the crash in styled_format.
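
The sequence above can be replayed in a toy model.  This is only a
sketch, not Emacs's allocator: `toy_obj`, `toy_gc`, `toy_replay` and the
single `function_cell` root are hypothetical names made up for the
illustration.  It assumes, as the transcript suggests, that the symbol's
function cell is the only root the collector sees for the old constants
vector, while the running function keeps a raw pointer (like vectorp)
that the collector does not see.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not Emacs code: each object has a mark bit, a "freed"
   flag standing in for a real free, and at most one child.  */
struct toy_obj
{
  bool marked;
  bool freed;
  struct toy_obj *child;
};

/* Mark OBJ and everything reachable from it through child links.  */
static void
toy_mark (struct toy_obj *obj)
{
  while (obj != NULL && !obj->marked)
    {
      obj->marked = true;
      obj = obj->child;
    }
}

/* Mark from the single root ROOT, then flag every unmarked object
   among the first NHEAP entries of HEAP as freed.  */
static void
toy_gc (struct toy_obj *root, struct toy_obj **heap, size_t nheap)
{
  for (size_t i = 0; i < nheap; i++)
    heap[i]->marked = false;
  toy_mark (root);
  for (size_t i = 0; i < nheap; i++)
    if (!heap[i]->marked)
      heap[i]->freed = true;
}

/* Replay the scenario.  Return true if the old constants vector and
   its format string were freed while the running function still held
   a raw pointer to the vector.  */
static bool
toy_replay (void)
{
  struct toy_obj old_string = { false, false, NULL };
  struct toy_obj old_consts = { false, false, &old_string };
  struct toy_obj new_string = { false, false, NULL };
  struct toy_obj new_consts = { false, false, &new_string };
  struct toy_obj *heap[] = { &old_consts, &old_string,
                             &new_consts, &new_string };

  /* Assumption: the function cell is the only root for the vector.  */
  struct toy_obj *function_cell = &old_consts;
  /* Raw pointer held by the running function, invisible to toy_gc.  */
  struct toy_obj *vectorp = &old_consts;

  /* First GC: only the old objects exist, and both are reachable.  */
  toy_gc (function_cell, heap, 2);
  assert (!vectorp->freed && !vectorp->child->freed);

  /* Redefinition (eval-buffer) points the cell at new objects.  */
  function_cell = &new_consts;
  toy_gc (function_cell, heap, 4);

  return vectorp->freed && vectorp->child->freed;
}
```

Calling toy_replay () returns true: after the redefinition, the second
collection no longer reaches the old vector or its string, even though
vectorp still points at them.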

While I was developing this method of reproducing this bug, I went
through this exercise without lexical-binding set in repro.el.  In that
version, the register $rbp when exec_byte_code is called contains the
bytecode Lisp_Object (instead of the non-Lisp-object value it contains
in the transcript above), and the first thing exec_byte_code does is
save it on the stack (presumably because the System V AMD64 ABI calling
convention says that called functions which use $rbp should save and
restore it).

Here's the beginning of the disassembly of exec_byte_code from
  "objdump -S bytecode.o":

  0000000000000020 <exec_byte_code>:
     executing BYTESTR.  */

  Lisp_Object
  exec_byte_code (Lisp_Object bytestr, Lisp_Object vector, Lisp_Object maxdepth,
                Lisp_Object args_template, ptrdiff_t nargs, Lisp_Object *args)
  {
        20:     55                      push   %rbp
        21:     48 89 e5                mov    %rsp,%rbp
        24:     41 57                   push   %r15
        26:     41 56                   push   %r14
        28:     41 55                   push   %r13
        2a:     41 54                   push   %r12
        2c:     49 89 ce                mov    %rcx,%r14
        2f:     53                      push   %rbx

So in the non-lexical-binding case the bytecode Lisp_Object is written
to the stack by the first instruction in exec_byte_code, and then during
the execution of 'my-gc' the breakpoint in mark_vectorlike stops at a
point with a much shorter backtrace which includes mark_stack and
mark_memory, and mark_memory's pp is pointing to the location on the
stack where $rbp was written. The bytecode object and constants vector
are consequently not freed, and no segfault happens.
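
The effect of that saved register can be sketched with a toy
conservative scan.  This is an illustration under stated assumptions,
not the code in alloc.c: `toy_block`, `toy_mark_memory` and
`toy_survives` are hypothetical names, and the block address and size
are made up.  The only point is that a stack word whose value happens
to fall inside an object keeps that object alive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model, not Emacs code: a heap block known to the collector.  */
struct toy_block
{
  uintptr_t start;
  size_t size;
  bool marked;
};

/* Conservatively scan NWORDS words: any word whose value falls inside
   a known block is treated as a pointer to it and marks the block.  */
static void
toy_mark_memory (const uintptr_t *words, size_t nwords,
                 struct toy_block *blocks, size_t nblocks)
{
  assert (words != NULL && blocks != NULL);
  for (size_t i = 0; i < nwords; i++)
    for (size_t j = 0; j < nblocks; j++)
      if (words[i] >= blocks[j].start
          && words[i] - blocks[j].start < blocks[j].size)
        blocks[j].marked = true;
}

/* Return true if the bytecode block is marked when SAVED_RBP is the
   value that "push %rbp" stored in a stack slot.  */
static bool
toy_survives (uintptr_t saved_rbp)
{
  /* Illustrative address and size only.  */
  struct toy_block bytecode = { 0x1630fc0, 48, false };
  /* A pretend stack: an unrelated word, the saved $rbp, an integer.  */
  uintptr_t stack[] = { 0x7ffe0010, saved_rbp, 0x2a };

  toy_mark_memory (stack, sizeof stack / sizeof stack[0], &bytecode, 1);
  return bytecode.marked;
}
```

With a tagged pointer into the block, such as toy_survives (0x1630fc5),
the block is marked; with a non-object value such as the 0xb0201 seen
in $rbp in the transcript above, it is not.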

I don't follow everything going on in the disassembly of funcall_lambda,
but I did figure out (by comparison with a debug session in the
multithreaded situation) that the different values in $rbp when
funcall_lambda calls exec_byte_code depend on the different code paths
following the test of whether the first element of the bytecode object
vector (the "args template" as funcall_lambda's comment calls it) is an
integer, which in turn depends on whether my-loop-1 was compiled with
lexical-binding on.

Here is 'repro.el':

;;;  -*- lexical-binding: t -*-
(defvar my-var "ok")
(defun my-loop-1 ()
  (let ((val 0))
    (while t
      (insert "Now in recursive edit\n")
      (recursive-edit)
      (insert (format "Leaving recursive edit: %s\n" my-var))
      (let ((things '(a b c d e)))
        (cond ;
         ((= val 0) (message "foo: %s" (last things)))
         ((= val 1) (message "bar: %s" things))
         ((= val 2) (message "baz: %s" (car things)))
         (t (message "bop: %s" (nth 2 things))))
        (setq val (mod (1+ val) 3))))))

(defun my-loop ()
  (interactive)
  (redraw-display)
  (my-loop-1))

(defun my-gc-1 ()
  (garbage-collect))

(defun my-gc ()
  (interactive)
  (my-gc-1))

(provide 'repro)
