From: Matthew Swift
Subject: confusion over undocumented syntax-table features, font-lock and syntax-tables
Date: Tue, 11 Feb 2003 00:08:20 -0500

This bug report will be sent to the Free Software Foundation,
not to your local site managers!
Please write in English, because the Emacs maintainers do not have
translators to read other languages for them.

Your bug report will be posted to the address@hidden mailing list,
and to the gnu.emacs.bug news group.

In GNU Emacs 21.2.1 (i386-debian-linux-gnu, X toolkit, Xaw3d scroll bars)
 of 2002-11-06 on beth, modified by Debian
configured using `configure  i386-debian-linux-gnu --prefix=/usr/local 
--sharedstatedir=/var/lib --libexecdir=/usr/local/lib --localstatedir=/var/lib 
--infodir=/usr/local/share/info --mandir=/usr/local/share/man --with-pop=yes 
--with-x=yes --with-x-toolkit=athena --without-gif'
Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: nil
  locale-coding-system: nil
  default-enable-multibyte-characters: t

Please describe exactly what actions triggered the bug
and the precise symptoms of the bug:

I was observing strange behavior in `sh-mode', defined in sh-script.el, where
(re-search-forward "\\s<\\s<") failed even though it was passing over a
buffer substring of two characters whose syntax classes, as reported by
`(char-syntax (char-after N))' for positions N and N+1, were both "<".

I have not figured out why that happens, and it may not be a bug, but in my
experiments, I have come across a barrel full of puzzles and questions.  I am
reporting as much as I have been able to distinguish.

The results of the following code completely baffle me.  Is
global-font-lock-mode changing the syntax classes?

-----cut here
    (setq test "
    hello () { echo world.; }
    ## boln is at buffer position 40
    ")
    (defun test ()
      (message "result is %S"
               (if (and
                    ;; `#' is a comment starter in the syntax table ...
                    (equal "<" (char-to-string (char-syntax ?#)))
                    ;; ... and positions 40 and 41 both hold a `#' ...
                    (equal (char-after 40) ?#)
                    (equal (char-after 41) ?#)
                    ;; ... whose reported syntax class is "<".
                    (equal "<" (char-to-string (char-syntax (char-after 40))))
                    (equal "<" (char-to-string (char-syntax (char-after 41)))))
                   (progn
                     (goto-char (point-min))
                     (re-search-forward "\\s<\\s<")))))
    (global-font-lock-mode 0)
    ;; calling (test) here succeeds
    (global-font-lock-mode 1)
    ;; `re-search-forward' fails the SECOND time, if not the first (no
    ;; pattern found)

---- end of test file
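
One mechanism that could produce this kind of mismatch is a `syntax-table'
text property: `char-syntax' consults only the syntax table, while regexp
matching may also honor such properties when `parse-sexp-lookup-properties'
is non-nil.  The following is a minimal sketch of that effect in isolation,
not a diagnosis of sh-mode.  The function name `my-syntax-probe', the sample
text, and the choice of punctuation syntax for the property are all
assumptions made for illustration.

```elisp
;; Hypothetical helper, not part of Emacs or sh-script.el.
(defun my-syntax-probe ()
  "Compare `char-syntax' with a regexp search under a syntax-table property.
Return (CLASS PLAIN WITH-PROPS): CLASS is the class `char-syntax' reports
for `#', PLAIN is the result of the search while text properties are
ignored, and WITH-PROPS is the result while they are honored."
  (with-temp-buffer
    (let ((table (make-syntax-table)))
      (modify-syntax-entry ?# "<" table)
      (modify-syntax-entry ?\n ">" table)
      (set-syntax-table table))
    ;; Positions: 1 = x, 2 = newline, 3 = first `#', 4 = second `#'.
    (insert "x\n## comment\n")
    ;; Assumption: give the first `#' punctuation syntax through a text
    ;; property, leaving the syntax table itself untouched.
    (put-text-property 3 4 'syntax-table (string-to-syntax "."))
    (let (plain with-props)
      (let ((parse-sexp-lookup-properties nil))
        (goto-char (point-min))
        (setq plain (re-search-forward "\\s<\\s<" nil t)))
      (let ((parse-sexp-lookup-properties t))
        (goto-char (point-min))
        (setq with-props (re-search-forward "\\s<\\s<" nil t)))
      (list (char-to-string (char-syntax (char-after 3)))
            plain
            with-props))))
```

If the property is honored, the third element is nil even though the first
two still report a comment-start class and a successful plain search, which
is the same shape as the symptom described above.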

The facility for matching chars in syntax descriptors is either not fully
documented or has some other problems.  Looking into it further would take more
time than I have at the moment.

sh-script.el says:

    (defvar sh-mode-syntax-table
      '((sh eval sh-mode-syntax-table ()
            ?\# "<"
            ?\n ">#"
            ?\" "\"\""
            ?\' "\"'"
            ?\` "\"`"
            ?! "_"
            ?% "_"
            ?: "_"
            ?. "_"
            ?^ "_"
            ?~ "_"
            ?< "."
            ?> ".")
        (csh eval identity sh)
        (rc eval identity sh))

      "Syntax-table used in Shell-Script mode.  See `sh-feature'.")

Consider the second entry in the table, which is the equivalent of

         (modify-syntax-entry ?\n ">#")

The documentation for syntax descriptors says (both in the Texinfo manual and
in the functions' docstrings) that the second character, the matching
character, is "used" only when the syntax class is "(" or ")" (open or close
parenthesis).

The declaration above assigns a matching character to a character with the
endcomment syntax class.  The documentation does not say that doing this is an
error.  But from here, all possibilities imply one or more problems.  (I
should also note that several major modes appear to assign matching characters
to chars in the string delimiter (") class, usually the character itself,
e.g., " with " and ' with '; this usage is likewise problematic.)
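
A minimal sketch of how one might inspect what such a descriptor actually
installs.  The helper name `my-descriptor-report' is hypothetical; it relies
only on `modify-syntax-entry', `aref' on the syntax table, `char-syntax' and
`matching-paren'.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-descriptor-report (char descriptor)
  "Install DESCRIPTOR for CHAR in a scratch syntax table and report on it.
Return (RAW CLASS MATCH): RAW is the raw entry stored in the table, CLASS
is the class designator as a string, and MATCH is whatever
`matching-paren' returns for CHAR."
  (let ((table (make-syntax-table)))
    (modify-syntax-entry char descriptor table)
    (with-syntax-table table
      (list (aref table char)
            (char-to-string (char-syntax char))
            (matching-paren char)))))

(my-descriptor-report ?\n ">#")  ; the raw entry keeps the `#'
(my-descriptor-report ?\n ">")   ; the raw entry has no match
(my-descriptor-report ?\( "()")  ; here `matching-paren' returns ?\)
```

In this sketch the raw entries for ">#" and ">" differ, but `char-syntax' and
`matching-paren' give the same answers for both, which is consistent with the
documentation's statement that the match is only "used" for the parenthesis
classes.  It does not show whether other primitives look at the stored match.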

If the declaration of ">#" is equivalent to ">", with respect to all Emacs
primitives and distributed Lisp code, then

   + sh-script.el should use simply ">" for clarity.

   It may be desirable to keep a facility for assigning matching chars to
   non-paren classes, so that programmers can do something with it.  If so,
   it should be mentioned briefly in the Texinfo documentation, if not in the
   docstrings.  If not, then

       + it should be documented that matching chars are ignored except
         for the "(" and ")" classes;

       + `modify-syntax-entry' should decline to install ignored matching chars
         by either signalling an error or by silently deleting the matching
         char;

       + `describe-syntax' should decline to report matching chars that do not
         have any significance, because reporting them is confusing
         (`describe-syntax' will report that ?\n matches ?#, and likewise if
         you assign matching chars to chars in other syntax classes for which
         matching seems irrelevant).

If the declaration of ">#" is not equivalent to ">", then either the behavior
is undefined or it is well-defined but not documented.  If it is undefined,
then sh-script.el should not be using it.  If it is undocumented, then it
should be documented.
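
As a rough illustration of the kind of check suggested above for
`modify-syntax-entry' and `describe-syntax', the following sketch lists
characters whose entry carries a matching char although their class is
neither open nor close parenthesis.  The function name
`my-non-paren-matches' is hypothetical, and the sketch assumes that raw
syntax entries have the form (CODE . MATCH) with the class in the low 16
bits of CODE.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-non-paren-matches (table chars)
  "Return an alist (CHAR . MATCH) for those CHARS in TABLE that have a
matching char although their class is not open (4) or close (5)
parenthesis."
  (let (result)
    (dolist (ch chars (nreverse result))
      (let* ((entry (aref table ch))
             (class (and entry (logand (car entry) 65535)))
             (match (cdr-safe entry)))
        (when (and match (not (memq class '(4 5))))
          (push (cons ch match) result))))))

;; Example with three of the entries quoted from sh-script.el above.
(let ((table (make-syntax-table)))
  (modify-syntax-entry ?\n ">#" table)
  (modify-syntax-entry ?\" "\"\"" table)
  (modify-syntax-entry ?# "<" table)
  (my-non-paren-matches table (list ?\n ?\" ?# ?\()))
```

With these entries the helper reports the newline (matched with `#') and the
double quote (matched with itself), and leaves out `#' and the open
parenthesis.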

Recent input:
M-x r e p o r t - e m a c s - b u g <return>

Recent messages:
1 <- require: gnus-group
1 -> require: gnus-start
1 <- require: gnus-start
1 -> require: gnus-util
1 <- require: gnus-util
Loading gnus-topic...done
Loading emacsbug...
1 -> require: sendmail
1 <- require: sendmail
Loading emacsbug...done
