bug-gnu-emacs

bug#38748: 28.0.50; crash on MacOS 10.15.2


From: Robert Pluim
Subject: bug#38748: 28.0.50; crash on MacOS 10.15.2
Date: Fri, 10 Jan 2020 09:58:52 +0100

>>>>> On Fri, 10 Jan 2020 10:27:45 +0200, Eli Zaretskii <address@hidden> said:

    >> From: Pip Cet <address@hidden>
    >> Date: Fri, 10 Jan 2020 07:32:07 +0000
    >> Cc: address@hidden, address@hidden, address@hidden, 
    >> address@hidden, address@hidden
    >> 
    >> > The backtrace shows a very recursive GC; it doesn't show any other
    >> > function being deeply recursive.  So I'm not sure I understand which
    >> > tail-recursive function you had in mind.  Can you elaborate?
    >> 
    >> I can. I think we're looking at two bugs: the first is the simple
    >> use-after-free of XFRAME (frame)->output_data.ns where `frame' is a
    >> dead frame. I've confirmed on GNU/Linux that mark_frame is called for
    >> a frame for which x_free_frame_resources has already been called, if
    >> there's a global variable still referencing the frame. I think the
    >> same thing happens on macOS.
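A minimal C sketch of the use-after-free described above, assuming a much simplified frame: `struct frame_sketch`, `free_frame_resources_sketch` and `mark_frame_sketch` are hypothetical stand-ins, not Emacs's real `struct frame`, `x_free_frame_resources` or `mark_frame`, and this is not the nsterm.m change discussed in the thread. It shows how a dead frame that stays reachable (for example through a global variable) can still be handed to the marker after its output data is freed, and one defensive pattern: clear the pointer when freeing and have the marker skip dead frames.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for window-system output data.  */
struct output_sketch { int font_id; };

/* Hypothetical stand-in for a frame; not Emacs's struct frame.  */
struct frame_sketch {
  bool live;                      /* cleared when the frame is deleted */
  struct output_sketch *output;   /* freed when the frame is deleted */
};

/* Free the per-frame output data.  Clearing the pointer means a later
   access sees NULL instead of dereferencing freed memory.  */
void free_frame_resources_sketch (struct frame_sketch *f)
{
  free (f->output);
  f->output = NULL;
  f->live = false;
}

/* The collector may still reach a dead frame (a global variable can
   reference it), so this marker only touches the output data of live
   frames.  Returns the font id it "marked", or -1 if it skipped.  */
int mark_frame_sketch (const struct frame_sketch *f)
{
  if (!f->live || f->output == NULL)
    return -1;
  return f->output->font_id;
}
```

Without the liveness check and the NULL assignment, the second call to the marker would read freed memory, which is the kind of access the thread suspects.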

    Eli> This one doesn't depend on the 'ok's initialization in
    Eli> face_inherited_attr in any way, does it?

No, it doesnʼt.

    >> 1. I think face_inherited_attr is being optimized to tail-call itself
    >> rather than calling itself in a new stack frame; thus, it loops
    >> indefinitely for a faulty face setup which would otherwise lead to an
    >> immediate crash.
    >> 1b. that optimization only works without the harmless initialization of
    >> "ok".
    >> 
    >> 2. Our initial face setup is faulty in the sense above.
    >> 
    >> 3. Something happens on a secondary thread which causes our face setup
    >> to become non-faulty, possibly during GC.
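A minimal C sketch of the hypothesis in point 1, assuming a simplified face record with a single inheritable attribute; `struct face_sketch` and both functions are hypothetical and are not the real face_inherited_attr. The recursive call sits in tail position, so an optimizing compiler may emit it as a jump rather than a call (GCC, for example, enables -foptimize-sibling-calls at -O2). On a cyclic inheritance chain such a function would then spin forever instead of overflowing the stack. Whether that happens for the real function is something only its disassembly can show; a bounded loop terminates either way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical face record: each face may inherit from another.  */
struct face_sketch {
  int attr;                         /* -1 means "unspecified" */
  const struct face_sketch *inherit;
};

/* Recursive lookup.  The recursive call is a tail call, so it may be
   compiled as a jump; on a cycle that turns a stack overflow into an
   infinite loop.  */
int inherited_attr_recursive (const struct face_sketch *f)
{
  if (f == NULL)
    return -1;
  if (f->attr != -1)
    return f->attr;
  return inherited_attr_recursive (f->inherit);
}

/* Iterative lookup with a depth limit, so a cyclic chain returns -1
   after MAX_DEPTH steps instead of looping forever.  */
int inherited_attr_bounded (const struct face_sketch *f, int max_depth)
{
  for (int depth = 0; f != NULL && depth < max_depth;
       depth++, f = f->inherit)
    {
      if (f->attr != -1)
        return f->attr;
    }
  return -1;
}
```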

    Eli> What do you mean by "secondary thread"?  And how can GC modify Lisp
    Eli> data structures?  That'd be a terrible bug.

    Eli> In any case, the full backtrace shows no call to face_inherited_attr
    Eli> anywhere in the call stack, so if there is indeed infinite
    Eli> recursion in that function, it must somehow have been exited long
    Eli> before GC ran.

    Eli> As for the tail-recursion part: do you see any sign of that in the
    Eli> disassembly posted by Robert?  I didn't, but maybe I missed
    Eli> something.  And such subtleties should only rear their ugly heads in
    Eli> optimized code, whereas we already know that an unoptimized build
    Eli> crashes in the same way.

Iʼm attaching the disassembly of face_inherited_attr with -O2, with
and without the change to 'ok'. I canʼt see any tail recursion, and
modulo the use of r14 rather than r13, the only change I can see is
right at the end, where the return value is set up (disclaimer: Iʼm
not fluent in x86 assembler).

    Eli> I still think the shortest way to finding the culprit here is to
    Eli> patiently and painfully go over the last_marked array, deciphering
    Eli> the Lisp object we marked, until we succeed in identifying the Lisp
    Eli> data structure which got corrupted.  Once we succeed in identifying
    Eli> that data structure, it should be relatively easy to find who and
    Eli> where corrupts it.  This may mean a lot of inconvenient drudgery,
    Eli> exacerbated by the fact that having a functional GDB on macOS is not
    Eli> easy, but I don't think we have a better way at this point.
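A minimal C sketch of the kind of ring buffer being discussed, assuming it records each object as it is marked and wraps around when full; the size, the `object_sketch` type and the function names are hypothetical and only loosely modeled on the last_marked array in Emacs's alloc.c. The detail that matters when deciphering such a buffer is that, once it has wrapped, the oldest entry sits at the current write index, so reading from there yields the marking order leading up to the crash.

```c
#include <assert.h>

/* Hypothetical size; a power of two so the index can be masked.  */
#define LAST_MARKED_SIZE 8

typedef long object_sketch;   /* hypothetical stand-in for Lisp_Object */

static object_sketch last_marked[LAST_MARKED_SIZE];
static unsigned last_marked_index;   /* next slot to overwrite */

/* Record OBJ, overwriting the oldest entry once the buffer is full.  */
void record_marked (object_sketch obj)
{
  last_marked[last_marked_index] = obj;
  last_marked_index = (last_marked_index + 1) & (LAST_MARKED_SIZE - 1);
}

/* Copy the buffer into OUT oldest-first, starting at the write index.
   Before the first wrap, the leading entries are unused zeros.  */
void last_marked_in_order (object_sketch out[LAST_MARKED_SIZE])
{
  for (unsigned i = 0; i < LAST_MARKED_SIZE; i++)
    out[i] = last_marked[(last_marked_index + i) & (LAST_MARKED_SIZE - 1)];
}
```

After recording 1 through 10 in this 8-slot buffer, the oldest surviving entry is 3 and the newest is 10.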

Itʼs possible that there is only one bug. The Emacs build Iʼve been
using with the change in nsterm.m suggested by Pip has been completely
stable. If it does crash again, I can trawl through last_marked.

Robert

Attachment: unmodified-optimized.txt
Description: Text document

Attachment: modified-optimized.txt
Description: Text document

