help-gnu-emacs

Re: how to find encoding violations in Emacs buffer?


From: riccardo . murri
Subject: Re: how to find encoding violations in Emacs buffer?
Date: 13 Dec 2006 00:39:57 -0800
User-agent: G2/1.0

On Dec 13, 5:26 am, Eli Zaretskii <e... -at- gnu.org> wrote:
> > From: riccardo.mu... -at- gmail.com
> > Date: 12 Dec 2006 10:18:13 -0800
>
> > From time to time, a buffer gets some spurious character in it, and
> > Emacs refuses to save it in the correct encoding, so I am presented
> > with a choice of other encodings.
>
> > However, in most cases, I know that the file *should* be UTF-8
> > encoded.  So I would rather find out where the offending character
> > is and correct it, instead of choosing a different encoding.
>
> > Is there any function/package/elisp hack to find/highlight characters
> > in a buffer that Emacs could not encode as UTF-8?
>
> Emacs 22 already shows the problematic characters.  Please look closer
> at the text of the buffer where Emacs tells you why it needs your
> decision about the encoding.
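
As a rough analogy outside Emacs, here is a Python sketch, not an Emacs API. It assumes undecodable bytes are kept as lone surrogate characters by Python's standard "surrogateescape" error handler, loosely like Emacs keeping raw bytes in a buffer. The characters that cannot be written back as UTF-8 are then exactly those surrogates. The helper name unencodable_positions and the sample text are hypothetical.

```python
# Python analogy only: "surrogateescape" decoding keeps each undecodable
# byte as a lone surrogate (U+DC80..U+DCFF), which strict UTF-8 encoding rejects.

def unencodable_positions(text: str, encoding: str = "utf-8") -> list[int]:
    """Return indices of characters in `text` that `encoding` cannot encode.

    Hypothetical helper; checking one character at a time is fine for
    stateless encodings such as UTF-8 or Latin-1.
    """
    bad = []
    for index, char in enumerate(text):
        try:
            char.encode(encoding)  # strict error handling by default
        except UnicodeEncodeError:
            bad.append(index)
    return bad


# Invented sample: UTF-8 "città " followed by Latin-1 "più" (ù = byte 0xF9).
raw = "citt\u00e0 ".encode("utf-8") + "pi\u00f9".encode("latin-1")
text = raw.decode("utf-8", errors="surrogateescape")

for pos in unencodable_positions(text):
    # A surrogateescape'd character maps back to its original byte value.
    original_byte = ord(text[pos]) - 0xDC00
    print(f"character {pos}: raw byte 0x{original_byte:02X}")
# prints: character 8: raw byte 0xF9
```

Checking per character keeps the sketch simple; a stateful encoding such as ISO-2022 would need the whole string encoded at once.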

Yes, but it can be hard to spot a single problematic character in a
large buffer.  In the case at hand, I had one Latin-1 "ù" in a 20k
UTF-8 text; since that made the file invalid UTF-8 and its encoding
could not be autodetected, Emacs displayed all non-ASCII characters
as \xxx escape sequences...

Isn't there a way to implement a "goto-next-problematic-char" elisp
function?  UTF-8 has a rather simple algorithm for detecting encoding
violations, which can pinpoint the exact byte where a sequence
violates the UTF-8 rules, but I wondered whether Emacs had a more
general interface: if it knows where in the buffer the encoding
violations are, one would expect that information to be available at
the Elisp level.
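
The "rather simple algorithm" can be sketched in Python as a byte-level scanner. This version follows the table of well-formed byte sequences in RFC 3629, rejecting overlong forms, UTF-16 surrogates and code points above U+10FFFF. The names next_utf8_violation and all_utf8_violations are hypothetical. An Elisp equivalent would need access to the file's raw bytes rather than decoded buffer text, which this sketch does not attempt.

```python
def next_utf8_violation(data: bytes, start: int = 0):
    """Return the offset of the first ill-formed UTF-8 sequence at or after
    `start`, or None if the rest of `data` is valid (hypothetical helper)."""
    i, n = start, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:  # ASCII
            i += 1
            continue
        # Sequence length and allowed range of the second byte (RFC 3629).
        if 0xC2 <= b <= 0xDF:
            length, lo, hi = 2, 0x80, 0xBF
        elif b == 0xE0:
            length, lo, hi = 3, 0xA0, 0xBF  # no overlong 3-byte forms
        elif 0xE1 <= b <= 0xEC or 0xEE <= b <= 0xEF:
            length, lo, hi = 3, 0x80, 0xBF
        elif b == 0xED:
            length, lo, hi = 3, 0x80, 0x9F  # no surrogates U+D800..U+DFFF
        elif b == 0xF0:
            length, lo, hi = 4, 0x90, 0xBF  # no overlong 4-byte forms
        elif 0xF1 <= b <= 0xF3:
            length, lo, hi = 4, 0x80, 0xBF
        elif b == 0xF4:
            length, lo, hi = 4, 0x80, 0x8F  # nothing above U+10FFFF
        else:
            return i  # 0x80..0xC1 and 0xF5..0xFF never start a sequence
        if i + length > n:
            return i  # sequence truncated by end of data
        if not lo <= data[i + 1] <= hi:
            return i
        for k in range(2, length):
            if not 0x80 <= data[i + k] <= 0xBF:
                return i  # missing continuation byte
        i += length
    return None


def all_utf8_violations(data: bytes) -> list[int]:
    """Collect every violation offset, resuming one byte past each hit."""
    offsets = []
    pos = next_utf8_violation(data)
    while pos is not None:
        offsets.append(pos)
        pos = next_utf8_violation(data, pos + 1)
    return offsets


sample = "Perch\u00e9 ".encode("utf-8") + b"pi\xf9 ok"
print(all_utf8_violations(sample))
# prints: [10]
```

Resuming one byte past each hit keeps the loop simple, but it means a broken multi-byte sequence may also report its stray continuation bytes as separate violations (for example, b"\xe2\x82x" yields [0, 1]).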

Riccardo


