bug-gnu-emacs

bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally


From: João Távora
Subject: bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
Date: Thu, 10 Dec 2020 15:00:58 +0000
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)

Eli Zaretskii <eliz@gnu.org> writes:

> AFAICT, the only relevant call to sys_longjmp is in eval.c.  That is,
> if we think Emacs signals an error or otherwise throws to top-level.

I thought that, but now I'm confused.  I'm uncertain about possible,
different ways of "exiting non-locally" from a function, which I define
by (foo) running and (bar) never running in (progn (foo) (bar)).  When
that happens, (foo) has exited non-locally.
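
To make that definition concrete, here is a minimal sketch.  The names
my-foo, my-bar, my-bar-ran and my-tag are made up for illustration and
are not from sly.el.  Both a throw and a signaled error make my-foo exit
non-locally, so my-bar never runs:

```elisp
;; Hypothetical names, for illustration only.
(defvar my-bar-ran nil
  "Non-nil once `my-bar' has run.")

(defun my-foo (how)
  "Exit non-locally: by `throw' if HOW is `throw', otherwise by `error'."
  (if (eq how 'throw)
      (throw 'my-tag 'thrown)
    (error "Signaled from my-foo")))

(defun my-bar ()
  "Record that we got here."
  (setq my-bar-ran t))

;; Exit via throw/catch: `my-bar' is skipped.
(catch 'my-tag
  (progn (my-foo 'throw) (my-bar)))

;; Exit via signal: `my-bar' is skipped here too.
(condition-case nil
    (progn (my-foo 'signal) (my-bar))
  (error 'signaled))
```

After evaluating both forms, my-bar-ran should still be nil.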

As far as I know, Elisp has no CL-style TAGBODY or GO, right?  So indeed
I would expect throw/catch/signal at the C level to be the only things
that can be responsible for these situations.

>   break eval.c:NNNN
>   commands
>   > bt
>   > continue
>   > end
>
> (the ">" prompt is printed by GDB).  Then you will have a lot of
> backtraces, but only the last one will be relevant.  This simple
> method has a disadvantage that it slows down Emacs, and also produces
> a lot of possibly uninteresting stuff.

Thanks.  That's the "tracer" strategy I remember you telling me about.
It was useful in the past, not so much here.

>> 1. I have to find a way to set the unwind_to_catch() breakpoint
>>    conditional on some Elisp/near-elisp context, in this case something
>>    inside the Elisp function sly-net-send() or Fprocess_send_string.
>> 
>>    Do you think setting a silly global in Fprocess_send_string() and
>>    then checking that as the breakpoint condition would be a good idea?
>>    Where would I reset the flag?  Is there some C-version of
>>    "unwind-protect"?
>
> The C version of unwind-protect is record_unwind_protect.
>
> But I think it will be easier to use an existing variable that is
> usually not touched.  For example, you could piggy-back
> bidi-inhibit-bpa,

That's an excellent idea, and I've verified that it works.  But it
didn't help here.  Or rather, not in the way I had anticipated.  It did
help me determine that unwind_to_catch() doesn't seem to be the only
thing responsible for the non-local exit.

To be clear, I now have this that I put around the "suspicious" places:

   (cl-defmacro DEBUG-45117 ((message) &rest body)
     (declare (indent defun))
     (let ((var (cl-gensym)))
       `(let ((,var nil)
              (bidi-inhibit-bpa t)) ; for your conditional break trick
          (unwind-protect
              (prog1 (progn ,@body)
                (setq ,var t))
            (unless ,var
              (message ,message))))))
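
For illustration, here is a hand-written approximation of what one use
of the macro expands to when the body exits non-locally.  The readable
name `finished' stands in for the gensym, and my-tag and my-result are
made-up names.  It assumes an Emacs recent enough to define
bidi-inhibit-bpa:

```elisp
;; Roughly what
;;   (DEBUG-45117 ("bailed") (throw 'my-tag 42))
;; expands to, wrapped in a `catch' so the throw has somewhere to land.
(defvar my-result nil
  "Hypothetical variable holding the value of the form below.")

(setq my-result
      (catch 'my-tag
        (let ((finished nil)
              (bidi-inhibit-bpa t))   ; visible to the GDB condition
          (unwind-protect
              (prog1 (progn (throw 'my-tag 42))
                (setq finished t))
            ;; Runs on the way out.  `finished' is still nil because
            ;; the `setq' above was skipped by the throw.
            (unless finished
              (message "bailed"))))))
```

Here "bailed" is printed and my-result ends up as 42.  When the body
returns normally, `finished' is set before the cleanup form runs, so
nothing is printed.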

Here's how I use it in sly.el, in the code that's called from the idle
timer.

     (defun sly-net-send (sexp proc)
       "Send a SEXP to Lisp over the socket PROC.
     This is the lowest level of communication. The sexp will be READ and
     EVAL'd by Lisp."
       (DEBUG-45117 ("SOMETHING in SLY-NET-SEND bailed")
         (let* ((print-circle nil)
                (print-quoted nil)
                (payload (DEBUG-45117 ("ENCODE-CODING-STRING????")
                           (encode-coding-string
                            (concat (sly-prin1-to-string sexp) "\n")
                            'utf-8-unix)))
                (string (DEBUG-45117 ("LENGTH-ENCODING????")
                          (concat (sly-net-encode-length (length payload))
                                  payload))))
           (DEBUG-45117 ("PROCESS-SEND-STRING?????")
             (process-send-string proc string)))))

I then launch Emacs as I explained earlier:

   gdb -i=mi --args ~/Source/Emacs/emacs-27/src/emacs -Q   \
    -L ~/Source/Emacs/sly                                  \
    -l sly-autoloads                                       \
    -f sly                                                 \
    --eval "(setq eldoc-idle-delay 0.01)"                  \
    ~/Source/Emacs/sly/slynk/slynk.lisp                    

Then ensure that the breakpoints look more or less like this (a couple
more than the one you recommended there.)

    1       breakpoint     keep y   0x00005555557e2580 in terminate_due_to_signal at emacs.c:378
    2       breakpoint     keep y   0x000055555576f4f5 in x_error_quitter at xterm.c:10131
    3       breakpoint     keep y   0x00005555555aa32d in Fredraw_display at dispnew.c:3123
            breakpoint already hit 1 time
    6       breakpoint     keep y   0x0000555555966de5 in unwind_to_catch at eval.c:1178
            stop only if bidi_inhibit_bpa != 0
    7       breakpoint     keep y   0x000055555580b985 in quit_throw_to_read_char at keyboard.c:10970
            stop only if bidi_inhibit_bpa != 0
    10      breakpoint     keep y   0x0000555555963f1a in call_debugger at eval.c:283
            stop only if bidi_inhibit_bpa != 0

Then 'r' to run, then start the debugging process I explained:
basically, just scroll up and down in the slynk.lisp file.  After a
while, some of these start appearing in *Messages*.

     ENCODE-CODING-STRING????
     SOMETHING in SLY-NET-SEND bailed
     [sly] [issue#385] likely `process-send-string' exited non-locally from timer.

       ... more scrolling ... 

     SOMETHING in SLY-NET-SEND bailed
     [sly] [issue#385] likely `process-send-string' exited non-locally from timer. [2 times]


Note that ENCODE-CODING-STRING???? is missing from the second
observation!  In this last session I didn't capture
"PROCESS-SEND-STRING?????", but I'm pretty sure I have in the past.

It does seem, though, that contrary to my original expectation, this is
not exclusive to process-send-string: it also happens in ordinary Elisp
code run from quickly firing idle timers.

Anyway.

1. Shouldn't all of these have triggered the breakpoint??  I'm setting
   the Elisp/C variable in the macro.  I tested the technique
   separately.

2. Are we sure that no mechanism other than throw/catch/signal can
   trigger a non-local exit (one that unwind-protect can still somehow
   catch)?
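
For point 1, this is a minimal sketch of the kind of separate check I
mean, not the exact test I ran.  The my- names are made up, and it
assumes an Emacs that defines bidi-inhibit-bpa.  It only shows, from the
Lisp side, that the let-binding is still in effect at the moment of the
throw and is undone afterwards:

```elisp
;; Hypothetical names, for illustration only.
(defvar my-seen-at-throw 'unset
  "Value of `bidi-inhibit-bpa' observed just before the throw.")

(defun my-thrower ()
  "Record `bidi-inhibit-bpa', then exit non-locally to `my-tag'."
  (setq my-seen-at-throw bidi-inhibit-bpa)
  (throw 'my-tag nil))

(catch 'my-tag
  (let ((bidi-inhibit-bpa t))   ; same binding the macro makes
    (my-thrower)))

;; At this point `my-seen-at-throw' is t, and `bidi-inhibit-bpa'
;; has been restored to whatever it was before the `let'.
```

If that holds, a breakpoint on unwind_to_catch conditioned on
bidi_inhibit_bpa != 0 should, as far as I can tell, stop for a throw
like this one.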

Thanks for any insight you may have,
João






