bug-gnu-emacs

bug#6283: doc/lispref/searching.texi reference to octal code `0377' corr


From: MON KEY
Subject: bug#6283: doc/lispref/searching.texi reference to octal code `0377' correct?
Date: Mon, 31 May 2010 01:35:41 -0400

On Sat, May 29, 2010 at 2:45 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>
> It's not an Emacs convention to represent characters by their
> codepoints expressed in octal.  It's a widely accepted practice.  If
> we were to describe every convention in the world in the manual, 99%
> of the manual would be devoted to describing conventions.
>

That it is a widely accepted practice is what makes it a convention.
Within Emacs Lisp it is also widely accepted practice to denote numeric
representations with #<radixN> notation. This is a conflict of
conventions. The purpose of demarcating the use of one convention
instead of another is to clarify when one should be preferred over the
other. It is unconventional for the manual to use conflicting
conventions without saying which applies. This is my concern.
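
To make the conflict concrete, here is a minimal sketch of the notations
the Lisp reader accepts for the value the manual calls "octal 0377".
The name `my-radix-spellings' is hypothetical, introduced only for this
illustration; everything else is ordinary reader syntax.

```elisp
;; Hypothetical helper, named only for this illustration.
(defun my-radix-spellings ()
  "Return the integer 255 written in several reader notations."
  (list 255          ; decimal
        #o377        ; octal, #o radix syntax
        #8r377       ; octal, generic #<radix>r syntax
        #xff         ; hexadecimal
        #b11111111   ; binary
        ?\377))      ; character syntax with an octal escape

;; Every element reads as the same integer.
(equal (my-radix-spellings) (make-list 6 255))

;; A bare leading zero is NOT octal to the Lisp reader:
;; 0377 reads as decimal 377, unlike the C-style "0377" in the manual.
(= 0377 377)
```

Both forms at the end evaluate to t, which is why the C-style spelling
reads ambiguously next to Lisp code.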

> Again, this part of the manual is not about how Emacs represents
> characters or reads them.  It's about their codes.

This is how I understood this portion of the manual.
Maybe I'm misunderstanding something fundamental about this distinction.

If this is so, I would greatly appreciate it if you could help me to
see it more clearly.

>> 0377 doesn't have a character that I'm aware of.
>
> In Unicode, it's a codepoint of LATIN SMALL LETTER Y WITH DIAERESIS.

I don't understand this.
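
For reference, the claim itself can be checked from Lisp. A minimal
sketch, assuming an Emacs with the Unicode character database available
(Emacs 23 or later); `my-char-name' is a hypothetical name used only
here.

```elisp
;; Hypothetical helper, named only for this illustration.
(defun my-char-name (code)
  "Return the Unicode name of the character whose codepoint is CODE."
  (get-char-code-property code 'name))

;; Octal 377 is decimal 255, i.e. U+00FF.
(my-char-name #o377)
```

This only shows what the codepoint 255 names as a character; it says
nothing about how a raw byte with the same value is treated.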

>
> But the text says "...many non-ASCII characters have codes above octal
> 0377".  It doesn't talk about a specific character here, just about
> which codepoints are below it and which are above it.

Yes, but the regexp is "[\200-\377]".
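
A minimal sketch of what that regexp literal contains, as far as I can
tell. `my-literal-info' is a hypothetical name used only for this
illustration; the built-ins it calls are real. It relies on the reader
rule that a string constant whose only non-ASCII content is octal
escapes below 256 is read as a unibyte string.

```elisp
;; Hypothetical helper, named only for this illustration.
(defun my-literal-info (str)
  "Return (MULTIBYTE-P . CODES) for the string STR."
  (cons (multibyte-string-p str) (append str nil)))

;; The literal is unibyte; \200 and \377 are the bytes 128 and 255.
(my-literal-info "[\200-\377]")

;; As characters in a multibyte buffer, those raw bytes have codes far
;; above 255.  These are the same integers used in the test code below.
(list (unibyte-char-to-multibyte #o255)   ; 4194221
      (unibyte-char-to-multibyte #o377))  ; 4194303
```

So the same octal digits can denote either the codepoint 255 or a raw
byte whose character code is 4194303, depending on context.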

>
> I didn't say that we are going to remove these features any time soon.
> Just that the manual doesn't talk too much about this, to avoid
> confusing users with issues that are both very complicated and very
> obscure, and are rarely if at all needed on the Lisp level.
>

I certainly agree they are confusing and easily misunderstood.
I disagree however that these issues are all that obscure.
You seem to suggest that the notation "octal 0NNN" is commonplace, yet
I personally find this notation to be obscure.

tomato|potato <-> potato|tomato

>
> Of course.  But why do you expect to find the description of such
> abuse in the manual?
>

I _do_ find them, whereas I don't find any such reference w/re the
0377 convention. This is, I guess, my concern.

Following is my attempt to come to grips with the distinction between
the numeric codepoint, integer character representations, reader
conventions etc. w/re the manual and particularly their use in
conjunction w/ regexps.  I believe this example illustrates some
reasonable familiarity with aspects of char/code representation.

But maybe this bit of code can help to show if there is something that
I am not getting???

;;; ================================================================

(let (chars-found frob-found)
  (with-temp-buffer
    (save-excursion
      (insert 10 255 10 ?\377 10 "\255" 10 4194221 10 "\377" 10 4194303))
    (while (search-forward-regexp "[\200-\377]" nil t)
      (let* ((md (match-data t))
             (md-char (char-before (cadr md))))
        (push `(,md-char ,(car md) ,(cadr md)) chars-found))))
  (setq chars-found (nreverse chars-found))
  (dolist (cf chars-found
              (setq chars-found
                    `(,(setq frob-found (nreverse frob-found))
                      ,chars-found)))
    (push (car (read-from-string (format "#o%o" (car cf)))) frob-found))
  (setq frob-found nil)
  (dolist (ints (car chars-found)
                (setq chars-found
                      `(,(setq frob-found (nreverse frob-found))
                        ,@chars-found)))
    (push `(,ints . ,(char-to-string ints)) frob-found))
  (setq frob-found nil)
  (dolist (d (car chars-found)
             (setq chars-found
                   `(,(setq frob-found (nreverse frob-found)) ,@chars-found)))
    (let* ((mltb-int (car d))
           (unib-str (cdr d))
           (unib-str->mchar (string-to-char (symbol-name (read unib-str))))
           (mltb-int->uchar (multibyte-char-to-unibyte mltb-int)))
      (push `(:mltb-int ,mltb-int
                        :unib-str ,unib-str
                        :unib-str->mchar ,unib-str->mchar
                        :mltb-int->uchar ,mltb-int->uchar)
            frob-found)))
  (insert 10 (make-string 68 59) 10
          ";; With this regexp:" 10
          ";; \(search-forward-regexp \"[\\200-\\377]\" nil t\)" 10
          ";; Matched these chars:" 10
          255 10 ?\377 10 "\255" 10 4194221 10 "\377" 10 4194303 10
          (make-string 68 59) 10)
  (pp chars-found (current-buffer))
  (insert (make-string 68 59) "\n")
  (let ((cnt 0))
    (dolist (pl (car chars-found))
      (setq cnt (1+ cnt))
      (insert
       10 (make-string 68 59) 10
       (format
        (concat
         ";; :MATCH-DATA-#%d\n"
         "\n(char-to-string (unibyte-char-to-multibyte %d)) ;<-\"%c%d\"\n"
         "\n(insert (char-to-string (unibyte-char-to-multibyte %d)))"
         " ;<- multibyte-char\n"
         "\n(insert (identity %S)) ;<- raw-byte\n"
         "\n(insert (string-to-char (identity %S))) ;<- multibyte-char\n"
         "\n(insert-byte %d 1) ;<-raw-byte unibyte-char\n"
         "\n(insert (format \"(insert (identity #o%%o))\""
         " (unibyte-char-to-multibyte %d)))\n")
        cnt
        (plist-get pl :mltb-int->uchar)
        92
        (string-to-number (format "%o" (plist-get pl :mltb-int->uchar)))
        (plist-get pl :mltb-int->uchar)
        (plist-get pl :unib-str)
        (plist-get pl :unib-str)
        (plist-get pl :mltb-int->uchar)
        (plist-get pl :mltb-int->uchar))))))

;;; ================================================================

--
/s_P\




