Re: Several serious problems


From: Kenichi Handa
Subject: Re: Several serious problems
Date: Tue, 23 Jul 2002 22:35:46 +0900 (JST)

In article <address@hidden>, Richard Stallman <address@hidden> writes:

> I cannot save the file lisp/ChangeLog.  It specifies coding system
> iso-2022-7bit, but it contains something that cannot be encoded in that
> coding system.  

It seems that this problem has already been fixed.  I also
found one unnecessary mule-unicode-0100-24ff char, so I
deleted it.

> I don't know any way to find the text that causes the
> problem; essentially I am helpless.

At least, (find-charset-region 1 (point-max)) will give you
some information.  If the returned value contains a
suspicious charset, we can search for it (if it's not
eight-bit-xxx) by:
        (re-search-forward
         (format "[%c-%c]"
                 (make-char CHARSET 32 32)
                 (make-char CHARSET 127 127)))
To search for eight-bit-control:
        (re-search-forward "[\200-\237]")
To search for eight-bit-graphic:
        (re-search-forward (string-as-multibyte "[\240-\377]"))
It's not sophisticated.  :-(
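
The same kind of search can also be sketched without building a
regexp, by walking the buffer and asking `char-charset' about each
character.  This is only a minimal sketch: the helper name
my-goto-next-char-of-charset is hypothetical (it is not part of
Emacs), and it assumes that `char-charset' reports the same charset
symbol that find-charset-region returned for the character.

```elisp
;; Hypothetical helper, not part of Emacs.
;; Assumption: `char-charset' names the character's charset the same
;; way `find-charset-region' does.
(defun my-goto-next-char-of-charset (charset)
  "Move point to the next character at or after point that is in CHARSET.
Return its position, or nil (leaving point unchanged) if none is found."
  (let ((pos (point))
        (found nil))
    ;; Scan forward one character at a time up to the end of the
    ;; accessible portion of the buffer.
    (while (and (not found) (< pos (point-max)))
      (if (eq (char-charset (char-after pos)) charset)
          (setq found pos)
        (setq pos (1+ pos))))
    (when found
      (goto-char found))
    found))
```

For example, (my-goto-next-char-of-charset 'mule-unicode-0100-24ff)
would be the call to try for the charset mentioned above.  It is
slower than the regexp search, but it does not depend on how the
character range is written.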

> We MUST do something to make it easier for users to cope with such a
> situation.  We talked about this a few weeks ago but nothing was done.
> Perhaps we could add a command which simply scans forward for the next
> run of characters that can't be saved in the specified coding system.
> The message you get in that situation could tell you about this
> command.  This would be a powerful solution, since you could easily
> find all the problems, not just the first one.  Highlighting all of
> them would also be a useful thing to do.

Do you mean a command something like this?

(defun check-coding-system-region (from to coding-system &optional max-num)
  "Check if the text after point is encodable by the specified coding system.
When called from a program, takes three arguments:
FROM, TO, and CODING-SYSTEM.  FROM and TO are buffer positions.
Value is a list of positions of characters that are not encodable by
CODING-SYSTEM.
Optional 4th argument MAX-NUM, if non-nil, limits the length of
returned list.  By default, there's no limit."
  (interactive (list (point)
                     (point-max)
                     (read-non-nil-coding-system "Coding-system: ")
                     1))
  (check-coding-system coding-system)
  (or (and coding-system
           (integerp (coding-system-type coding-system)))
      (error "Invalid coding system to check: %s" coding-system))
  (let ((safe-chars (coding-system-get coding-system 'safe-chars))
        (positions)
        (n 0))
    (save-excursion
      (save-restriction
        (narrow-to-region from to)
        (goto-char (point-min))
        (or max-num
            (setq max-num (- (point-max) (point-min))))
        (if (eq safe-chars t)
            (let ((re (string-as-multibyte "[\200-\237\240-\377]")))
              (while (and (< n max-num) (re-search-forward re nil t))
                (setq positions (cons (1- (point)) positions)
                      n (1+ n))))
          (while (and (< n max-num) (re-search-forward "[^\000-\177]" nil t))
            (or (aref safe-chars (preceding-char))
                (setq positions (cons (1- (point)) positions)
                      n (1+ n)))))))
    (if (interactive-p)
        (if (not positions)
            (message "All characters are encodable by %s" coding-system)
          (goto-char (car positions))
          (error "This character can't be encoded by %s" coding-system))
      (setq positions (nreverse positions)))))

---
Ken'ichi HANDA
address@hidden
