bug-gnu-emacs

bug#20862: 25.0.50; 32-bit Emacs configured --with-wide-int miscompiles


From: Eli Zaretskii
Subject: bug#20862: 25.0.50; 32-bit Emacs configured --with-wide-int miscompiles CL
Date: Thu, 25 Jun 2015 17:30:16 +0300

> Date: Wed, 24 Jun 2015 20:31:10 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: 20862@debbugs.gnu.org
> 
> Thanks for reporting that. It appears to be a bug in the garbage collector, 
> and is likely to be hard to reproduce. I couldn't reproduce it, but I did a 
> 'make bootstrap' on Fedora x86-64 (configured with --with-wide-int and 
> compiled with gcc -m32 so it's really x86), and got a core dump in a 
> completely different area that (of course!) went away when I compiled without 
> optimization.

Do you remember which Lisp file was being compiled when you got a core
dump?

I had something similar while trying to debug this: crashes while
compiling cedet/srecode/proj-obj.el and sometimes also ibuffer.el.
And yes, it's a very elusive crash: it only happens during a full
bootstrap, and even repeating the exact same bootstrap in a different
directory makes it disappear!  But I was luckier than you, in that I
did succeed in reproducing these crashes in an unoptimized build.  By
slowly and painfully tracking these crashes, I found out that they are
due to an uninterned symbol whose name is "THIS", created by this
form:

  (let ((print-gensym nil) (print-quoted t))
    (format "%S" (cons 'fn (cl--make-usage-args orig-args))))

That symbol gets GC'ed (I see "DEAD" in its function cell) while it's
still alive, and then Emacs crashes trying to print that symbol's name
in the call to 'format', because GC recycles that symbol's name and
replaces it with a NULL pointer.  So your analysis:

> Rather than try to debug it directly, I thought about what might have caused 
> the problem, and re-audited the garbage collector with the recent Qnil==0 
> changes in mind. This did uncover a bug, and the attached patch (which we 
> will need anyway) allowed me to do a "make bootstrap" successfully in the 
> same configuration.

and the changes in the patch related to symbols-as-offsets and their
alignment on the stack make perfect sense to me, because they explain
why stack marking didn't mark this symbol, and thus allowed it to be
GC'ed.

> I installed this into the master as commit 
> 93f4f67ba93b78e8b31e498e8ce7bce4c8298b76; please give it a try in your setup 
> when you have the time. 

I did, and the crashes are gone, thanks.  The cl-lib-tests also
succeed.

However, there still seems to be some subtle problem, because the
byte-compiled files don't all compare equal.  (I've seen the same
problem before your patches as well.)  I used the following command to
find the *.elc files that are different:

 diff -r -a -u -I"in Emacs version 25\.0\.50" ./lisp ../int32/lisp \
   --exclude="*.el" --exclude="*.el~" | grep -a "^diff "

where "../int32/" is the directory where I built Emacs without
"--with-wide-int".  This reveals differences in the following files:

 cedet/semantic/texi.elc   cedet/semantic/util.elc
 cedet/srecode/srt-wy.elc  emacs-lisp/cl-generic.elc

Some of the differences are insignificant (different label numbers
used, or different file offsets due to a longer Emacs version string),
but others seem to be significant.  For example, the byte code of
cl--generic-struct-tag in cl-generic.elc has a few different bytes.
Likewise with the byte code of semantic-texi-expand-tag in texi.elc,
of semantic-something-to-tag-table in util.elc, and of
srecode-template-wy--parse-table in srt-wy.elc.
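
To separate the insignificant differences from the significant ones, it
may help to look at each pair of files individually.  The following is
only a sketch, assuming GNU diff and a POSIX cmp; the function names
list_differing_elc and count_differing_bytes are made up for this
illustration and are not part of Emacs.  The first function repeats the
comparison above one file at a time, and the second counts how many byte
positions differ between two files.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of the Emacs tree.

# list_differing_elc DIR_A DIR_B
# Print the relative path of each *.elc file under DIR_A that also exists
# under DIR_B and differs from it, ignoring hunks in which every changed
# line matches the Emacs version banner (diff -I, as in the command above).
list_differing_elc() {
  dir_a=$1
  dir_b=$2
  (cd "$dir_a" && find . -name '*.elc' -type f | sort) |
  while IFS= read -r rel; do
    rel=${rel#./}
    # Skip files that have no counterpart in the other tree.
    [ -f "$dir_b/$rel" ] || continue
    if ! diff -a -I 'in Emacs version' \
         "$dir_a/$rel" "$dir_b/$rel" >/dev/null 2>&1; then
      printf '%s\n' "$rel"
    fi
  done
}

# count_differing_bytes FILE_A FILE_B
# Print the number of byte positions at which the two files differ.
# cmp -l prints one line per differing byte.
count_differing_bytes() {
  cmp -l "$1" "$2" 2>/dev/null | wc -l | tr -d ' '
}
```

With the directory layout described above, one would call
list_differing_elc ./lisp ../int32/lisp and then run
count_differing_bytes on each reported pair.  A file with only a handful
of differing bytes is a better candidate for inspection than one whose
offsets have all shifted.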

The list of *.elc files that differ appears to depend on optimization
level: the above list was obtained with -O0; compiling with -O1 leaves
only cl-generic.elc and srt-wy.elc different, and compiling with -O2
brings util.elc back, and also adds differences in
eshell/esh-proc.elc.  Or maybe the actual factor is the specific order
in which the files are compiled (I bootstrap with "make -j8"), which
determines which other Lisp files are available as *.el or *.elc,
because bootstrapping without parallel Make execution again leaves
only cl-generic.elc and srt-wy.elc?
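
One way to tell the stable differences from the ones that come and go is
to save the list of differing files from each build and intersect the
lists.  This is a rough sketch only; stable_diffs is a hypothetical
name, and each argument is assumed to be a text file with one relative
*.elc path (containing no whitespace) per line.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not part of the Emacs tree.

# stable_diffs LIST_FILE...
# Print the paths that appear in every LIST_FILE.  Each list is
# de-duplicated first, so a path that occurs in all lists is counted
# exactly as many times as there are lists.
stable_diffs() {
  for list in "$@"; do
    sort -u "$list"
  done | sort | uniq -c | awk -v n="$#" '$1 == n { print $2 }'
}
```

For example, stable_diffs O0.list O1.list O2.list, with one list per
optimization level, prints only the files that differ regardless of the
level.  Going by the lists reported above, that would be cl-generic.elc
and srt-wy.elc, the same two files that remain after a non-parallel
bootstrap.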

Do you see something similar on your system?  How should we go about
debugging this?




