Re: how to find encoding violations in Emacs buffer?

From: address@hidden
Subject: Re: how to find encoding violations in Emacs buffer?
Date: 13 Dec 2006 04:34:09 -0800
On Dec 13, 11:45 am, Peter Dyballa <Peter_Dyba... -at- Web.DE> wrote:
> On 13.12.2006 at 09:39, -at- wrote:
> > Yes, but it may be hard to spot one single problematic character in a
> > large buffer.  In the case at hand, I had one Latin-1 "ù" in a 20k
> > UTF-8 text,
> This character is a UTF-8 entity:
> It cannot be the cause. In UTF-8 it's encoded as C3 B9.

Yes, but the file contained the single Latin-1 byte 0xF9 instead of the
two-byte UTF-8 sequence 0xC3 0xB9, which caused UTF-8 auto-detection to
fail.
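
To make the byte-level difference concrete, here is a small Python sketch (not Emacs code; the helper name `is_valid_utf8` is made up for this illustration). It shows that "ù" is one byte in Latin-1 and two bytes in UTF-8, and that a lone 0xF9 makes an otherwise clean text fail strict UTF-8 decoding.

```python
ch = "\u00f9"  # the character "ù" (U+00F9)

latin1_bytes = ch.encode("latin-1")  # single byte F9
utf8_bytes = ch.encode("utf-8")      # two bytes C3 B9


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A UTF-8 text stays valid with the proper two-byte sequence,
# but a stray Latin-1 byte breaks it.
good = b"o" + utf8_bytes + b" est-il ?"
bad = b"o" + latin1_bytes + b" est-il ?"
print(is_valid_utf8(good), is_valid_utf8(bad))
```

A detector that insists on strict validity would reject the second text as UTF-8, which is presumably what happened with the file in question.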

> > Isn't there a way to implement a "goto-next-problematic-char" elisp
> > function?  UTF-8 has a rather simple algorithm to detect encoding
> > violations, which can point at the precise point where a byte sequence
> > violates UTF-8 rules, but I wondered if Emacs had a more general
> > interface: if it knows where in the buffer the encoding violations
> > are located, one would assume that this information would be available
> > at elisp level.
> There is something like this already implemented in PostScript
> printing: when the buffer contains characters outside a specific ISO
> Latin encoding, up to a dozen of them are presented in a warning buffer.

Thank you for the pointer!  I'll have a look at that code.
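
For reference, the UTF-8 check discussed above can be sketched outside Emacs. This is only an illustration in Python, not an elisp "goto-next-problematic-char"; the function name `utf8_violations` is an assumption of this sketch. It relies on the `start` attribute of `UnicodeDecodeError`, which gives the offset of the first offending byte in the data being decoded.

```python
def utf8_violations(data: bytes) -> list[int]:
    """Return the byte offsets where strict UTF-8 decoding fails.

    Hypothetical helper: after each failure it skips one byte and
    resumes, so every reported offset is a byte that cannot start or
    continue a valid UTF-8 sequence at that position.
    """
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the remainder is clean
        except UnicodeDecodeError as err:
            # err.start is relative to the slice, so add pos back.
            offsets.append(pos + err.start)
            pos += err.start + 1
    return offsets


# A mostly UTF-8 text with two stray Latin-1 bytes (0xF9).
sample = b"\xc3\xb9 ok \xf9 x \xf9"
print(utf8_violations(sample))
```

Re-slicing after every failure is wasteful on a buffer with many violations, but for the case of one bad byte in a 20k text it is more than fast enough.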


