bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's
From: Gemini Lasswell
Subject: bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
Date: Fri, 19 Oct 2018 12:32:32 -0700
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1.50 (gnu/linux)
Gemini Lasswell <gazally@runbox.com> writes:
> I set up a single-threaded situation where I could redefine a function
> while exec_byte_code was running it, and got a segfault. I've gained
> some insights from debugging this version of the bug which I will put
> into a separate email.
Here's a gdb transcript that walks through the single-threaded version
of this bug. It uses the file 'repro.el', which is attached at the end
of this message and is the same as the one in my last message.
Start gdb with a breakpoint at Fredraw_display:
$ gdb --args ./emacs -Q
...
(gdb) b Fredraw_display
(gdb) r
In Emacs, find the file repro.el and load it with byte-compile-file,
then go back to *scratch* and run my-loop:
C-x C-f repro.el RET
C-u M-x byte-compile-file RET repro.el RET
C-x b RET
M-x my-loop RET
This gets me to the gdb prompt, at a point in execution where the next
function called will be my-loop-1, so I set a breakpoint in
funcall_lambda, where I can see the bytecode object for my-loop-1 (I
edited out the bytestring):
Thread 1 "emacs" hit Breakpoint 3, Fredraw_display () at dispnew.c:3027
3027 {
(gdb) br funcall_lambda
Breakpoint 4 at 0x5cdb00: file eval.c, line 3016.
(gdb) c
Continuing.
Thread 1 "emacs" hit Breakpoint 4, funcall_lambda (fun=XIL(0x31c0235),
nargs=nargs@entry=0, arg_vector=arg_vector@entry=0x7fffffff01c0)
at eval.c:3016
3016 {
(gdb) clear
Deleted breakpoint 4
(gdb) p fun
$1 = XIL(0x1630fc5)
(gdb) pr
#[0 "..." [my-var 0 "Now in recursive edit
" recursive-edit format "Leaving recursive edit: %s
" (a b c d e) message "foo: %s" last 1 "bar: %s" 2 "baz: %s" "bop: %s" mod 3]
6]
Then I skip ahead into exec_byte_code:
(gdb) br exec_byte_code
Breakpoint 5 at 0x611bb0: file bytecode.c, line 342.
(gdb) c
Continuing.
Thread 1 "emacs" hit Breakpoint 5, exec_byte_code (bytestr=XIL(0x3571d24),
vector=XIL(0x31c0195), maxdepth=make_number(4),
args_template=args_template@entry=XIL(0), nargs=nargs@entry=0,
args=args@entry=0x0) at bytecode.c:342
342 {
Here's what's in the register $rbp, and the constants vector:
(gdb) clear
Deleted breakpoint 5
(gdb) p $rbp
$2 = (void *) 0xb0201
(gdb) pr
#<INVALID_LISP_OBJECT 0x000b0201>
(gdb) p vector
$3 = XIL(0x1630f35)
(gdb) pr
[my-var 0 "Now in recursive edit
" recursive-edit format "Leaving recursive edit: %s
" (a b c d e) message "foo: %s" last 1 "bar: %s" 2 "baz: %s" "bop: %s" mod 3]
Skip ahead, to get to where exec_byte_code has a value for vectorp:
(gdb) n 12
366 USE_SAFE_ALLOCA;
(gdb) p vectorp
$4 = (Lisp_Object *) 0x1630f38 <bss_sbrk_buffer+9164248>
(gdb) p *vectorp
$5 = XIL(0x2327d80)
(gdb) pr
my-var
(gdb) break mark_vectorlike if ptr->contents == $4
Breakpoint 6 at 0x5ad400: file alloc.c, line 6036.
(gdb) c
Continuing.
The idea is to break when garbage collection finds the constants vector.
(I first tried setting a conditional breakpoint in mark_object, which
made garbage collection either hang or take more time than I had
patience for.)
In Emacs type C-x b RET. This causes a gc and a breakpoint hit:
Thread 1 "emacs" hit Breakpoint 6, mark_vectorlike (ptr=0x31c0190) at
alloc.c:6036
6036 eassert (!VECTOR_MARKED_P (ptr));
(gdb) bt 20
#0 mark_vectorlike (ptr=0x1630f30 <bss_sbrk_buffer+9164240>) at alloc.c:6036
#1 0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
#2 0x00000000005ad45e in mark_vectorlike (
ptr=0x1611fd0 <bss_sbrk_buffer+9037424>) at alloc.c:6046
#3 0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
#4 0x00000000005acdf4 in mark_object (arg=...) at alloc.c:6477
#5 0x00000000005acae4 in mark_object (arg=...) at alloc.c:6434
#6 0x00000000005ad45e in mark_vectorlike (
ptr=0x15a8e00 <bss_sbrk_buffer+8606880>) at alloc.c:6046
#7 0x00000000005ad45e in mark_vectorlike (
ptr=0x15a9c30 <bss_sbrk_buffer+8610512>) at alloc.c:6046
#8 0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
#9 0x00000000005ad45e in mark_vectorlike (
ptr=0x15a7c30 <bss_sbrk_buffer+8602320>) at alloc.c:6046
#10 0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
#11 0x00000000005ad45e in mark_vectorlike (
ptr=0x15a6e80 <bss_sbrk_buffer+8598816>) at alloc.c:6046
#12 0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
#13 0x00000000005acdf4 in mark_object (arg=...) at alloc.c:6477
#14 0x00000000005acaa5 in mark_object (arg=...) at alloc.c:6431
#15 0x00000000005ad45e in mark_vectorlike (
ptr=0x15fbed0 <bss_sbrk_buffer+8947056>) at alloc.c:6046
#16 0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
#17 0x00000000005ad45e in mark_vectorlike (
ptr=0x15fbf50 <bss_sbrk_buffer+8947184>) at alloc.c:6046
#18 0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
#19 0x00000000005ad45e in mark_vectorlike (
ptr=0x15fcc80 <bss_sbrk_buffer+8950560>) at alloc.c:6046
(More stack frames follow...)
Lisp Backtrace:
"Automatic GC" (0x0)
"eldoc-pre-command-refresh-echo-area" (0xfffefbb0)
"recursive-edit" (0xfffeffd8)
"my-loop-1" (0xffff0250)
"my-loop" (0xffff0650)
"funcall-interactively" (0xffff0648)
"call-interactively" (0xffff07d0)
"command-execute" (0xffff0ab8)
"execute-extended-command" (0xffff0ea0)
"funcall-interactively" (0xffff0e98)
"call-interactively" (0xffff11d0)
"command-execute" (0xffff1488)
There are 279 frames in the backtrace, and mark_stack and mark_memory
aren't there. So I'm guessing the constants vector is getting found via
the function definition of 'my-loop-1'. Keep going:
(gdb) c
Continuing.
Now in Emacs do this:
M-x eval-buffer RET
C-x b RET
M-x my-gc RET
Execution does not stop at the breakpoint, so this garbage collection
apparently never marks the old constants vector. In Emacs type C-M-c.
Result:
Thread 1 "emacs" received signal SIGSEGV, Segmentation fault.
0x00000000005bca1b in styled_format (nargs=2, args=0x7ffffffeffd8,
message=<optimized out>) at editfns.c:3129
3129 unsigned char format_char = *format++;
What's happened to the constants vector and its contents?
(gdb) p $3
$6 = XIL(0x1630f35)
(gdb) pr
#<INVALID_LISP_OBJECT 0x01630f35>
(gdb) p *$4
$7 = XIL(0x2327d80)
(gdb) pr
my-var
(gdb) p *($4+5)
$8 = XIL(0x359a6f4)
(gdb) pr
#<INVALID_LISP_OBJECT 0x0359a6f4>
(gdb) p *($4+4)
$9 = XIL(0x6390)
(gdb) pr
format
It looks like the constants vector was freed. Its contents haven't been
overwritten (yet), but the format string it refers to has also been
freed, which leads to the crash in styled_format.
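To make the suspected sequence concrete, here is a toy model in C. It is
not Emacs code: toy_vector, toy_alloc, toy_gc and function_cell are
hypothetical names, and the "heap" is a small static pool in which
freeing only clears a flag, so looking at a swept object is safe. The
assumption it illustrates is that the old constants vector is reachable
only through the symbol's function cell, while the running interpreter
keeps a raw pointer (like vectorp) that the collector never sees.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy heap object standing in for a constants vector.  Nothing is
   really freed; sweeping only clears the `live' flag.  */
struct toy_vector
{
  bool live;
  bool marked;
  const char *constant;   /* stands in for the vector's contents */
};

enum { POOL_SIZE = 4 };
struct toy_vector pool[POOL_SIZE];

/* The only GC root in this model: the symbol's function cell.  */
struct toy_vector *function_cell;

/* Hand out the first pool slot that is not live, or NULL if full.  */
struct toy_vector *
toy_alloc (const char *constant)
{
  for (size_t i = 0; i < POOL_SIZE; i++)
    if (!pool[i].live)
      {
        pool[i].live = true;
        pool[i].marked = false;
        pool[i].constant = constant;
        return &pool[i];
      }
  return NULL;
}

/* Mark from the single root, then sweep every unmarked object.  */
void
toy_gc (void)
{
  if (function_cell)
    function_cell->marked = true;
  for (size_t i = 0; i < POOL_SIZE; i++)
    {
      if (pool[i].live && !pool[i].marked)
        pool[i].live = false;   /* "freed" */
      pool[i].marked = false;   /* reset for the next collection */
    }
}

/* Return true if the vector the running "interpreter" still points at
   survives a redefinition followed by a collection.  */
bool
toy_vector_survives_redefinition (void)
{
  function_cell = toy_alloc ("Leaving recursive edit: %s\n");
  struct toy_vector *vectorp = function_cell;    /* untracked raw pointer */
  function_cell = toy_alloc ("new definition");  /* like eval-buffer */
  toy_gc ();                                     /* like my-gc */
  return vectorp->live;
}
```

In this model toy_vector_survives_redefinition returns false: once the
function cell points at the new definition, nothing the collector knows
about refers to the old vector, even though the interpreter is still
using it.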
While I was developing this method of reproducing this bug, I went
through this exercise without lexical-binding set in repro.el. In that
version, the register $rbp when exec_byte_code is called contains the
bytecode Lisp_Object (instead of the non-Lisp-object value it contains
in the transcript above), and the first thing exec_byte_code does is
save it on the stack (presumably because the System V AMD64 ABI calling
convention says that called functions which use $rbp should save and
restore it).
Here's the beginning of the disassembly of exec_byte_code from
"objdump -S bytecode.o":
0000000000000020 <exec_byte_code>:
executing BYTESTR. */
Lisp_Object
exec_byte_code (Lisp_Object bytestr, Lisp_Object vector, Lisp_Object maxdepth,
Lisp_Object args_template, ptrdiff_t nargs, Lisp_Object *args)
{
20: 55 push %rbp
21: 48 89 e5 mov %rsp,%rbp
24: 41 57 push %r15
26: 41 56 push %r14
28: 41 55 push %r13
2a: 41 54 push %r12
2c: 49 89 ce mov %rcx,%r14
2f: 53 push %rbx
So in the non-lexical-binding case the bytecode Lisp_Object is written
to the stack by the first instruction in exec_byte_code, and then during
the execution of 'my-gc' the breakpoint in mark_vectorlike stops at a
point with a much shorter backtrace which includes mark_stack and
mark_memory, and mark_memory's pp is pointing to the location on the
stack where $rbp was written. The bytecode object and constants vector
are consequently not freed, and no segfault happens.
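The effect of that saved register can be sketched with a toy
conservative scan in C. This is only an illustration under stated
assumptions, not the real mark_memory: toy_object, toy_heap,
toy_mark_memory and toy_marked_after_scan are hypothetical names, and
the "stack" is an ordinary array. The scan treats every word in a range
as a possible pointer, so a word that happens to hold an object's
address keeps that object alive, while a value such as 0xb0201 matches
nothing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { HEAP_OBJECTS = 3, STACK_WORDS = 8 };

/* Toy heap object.  The int member keeps every object address even, so
   an odd word such as 0xb0201 can never match one.  */
struct toy_object
{
  int payload;
  bool marked;
};

struct toy_object toy_heap[HEAP_OBJECTS];

/* Conservative marking: treat each word in [start, end) as a possible
   pointer and mark the heap object whose address it equals, if any.  */
void
toy_mark_memory (const uintptr_t *start, const uintptr_t *end)
{
  for (const uintptr_t *pp = start; pp < end; pp++)
    for (size_t i = 0; i < HEAP_OBJECTS; i++)
      if (*pp == (uintptr_t) &toy_heap[i])
        toy_heap[i].marked = true;
}

/* Simulate a callee that saved a register holding SAVED_REGISTER on the
   stack, run the conservative scan over that stack, and report whether
   OBJ ended up marked.  */
bool
toy_marked_after_scan (uintptr_t saved_register, const struct toy_object *obj)
{
  uintptr_t fake_stack[STACK_WORDS] = { 0 };
  fake_stack[3] = saved_register;   /* like "push %rbp" */

  for (size_t i = 0; i < HEAP_OBJECTS; i++)
    toy_heap[i].marked = false;

  toy_mark_memory (fake_stack, fake_stack + STACK_WORDS);
  return obj->marked;
}
```

Calling toy_marked_after_scan with the address of toy_heap[1] as the
saved register reports that object as marked; calling it with 0xb0201
(the non-Lisp-object value seen in $rbp above) leaves it unmarked.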
I don't follow everything going on in the disassembly of funcall_lambda,
but I did figure out (by comparison with a debug session in the
multithreaded situation) why $rbp holds different values when
funcall_lambda calls exec_byte_code. funcall_lambda tests whether the
first element of the bytecode object vector (the "args template" as
funcall_lambda's comment calls it) is an integer, and the two code paths
following that test leave different values in $rbp. Whether that element
is an integer in turn depends on whether my-loop-1 was compiled with
lexical-binding on.
Here is 'repro.el':
;;; -*- lexical-binding: t -*-

(defvar my-var "ok")

(defun my-loop-1 ()
  (let ((val 0))
    (while t
      (insert "Now in recursive edit\n")
      (recursive-edit)
      (insert (format "Leaving recursive edit: %s\n" my-var))
      (let ((things '(a b c d e)))
        (cond ;
         ((= val 0) (message "foo: %s" (last things)))
         ((= val 1) (message "bar: %s" things))
         ((= val 2) (message "baz: %s" (car things)))
         (t (message "bop: %s" (nth 2 things))))
        (setq val (mod (1+ val) 3))))))

(defun my-loop ()
  (interactive)
  (redraw-display)
  (my-loop-1))

(defun my-gc-1 ()
  (garbage-collect))

(defun my-gc ()
  (interactive)
  (my-gc-1))

(provide 'repro)