emacs-devel
Re: 23.0.60; [nxml] BOM and utf-8


From: Stephen J. Turnbull
Subject: Re: 23.0.60; [nxml] BOM and utf-8
Date: Mon, 19 May 2008 12:05:51 +0900

David Kastrup writes:

 > It would be sufficient to use an encoding variation which adds the bom
 > back on writing.
 > 
 > I am actually surprised that this is not done right now: I thought we
 > had a discussion about having the BOM-encodings early in the automatic
 > encoding detections.

IIRC this is an issue recently reported by Eli, discussed, and ISTR
already fixed by Handa-san.  Don't have time to dig it up though.
Something about -le vs -littleendian.

 > > Alternatively, sabotage the Microsoft users by silently eating the BOM
 > > on the way in, and writing the file in GNU substandard[1] format on
 > > the way out.
 > 
 > Emacs developers are not nonchalant about having Emacs write a byte
 > sequence differing from what it read in.

OK, I should always use smileys on this list, my bad.  The main point
was to get in the "substandard" joke, YHBT HAND.  And I was
recommending that for Miles's benefit, not as an Emacs default.

In any case, maintaining faithfulness of representation is simply not
possible, as you point out (safe-character-sets or whatever you call
your analog to latin-unity being another case).  It's also not at all
obvious that that is a very useful requirement when dealing with a
character-oriented standard like Unicode or XML, since you can expect
many applications to canonicalize the text "behind your back".
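
To make the canonicalization point concrete, here is a minimal sketch in Python (not Emacs code; the variable names are only illustrative). It assumes the application in question applies Unicode NFC normalization, which is just one of several canonicalizations a tool might perform:

```python
import unicodedata

# "e" followed by COMBINING ACUTE ACCENT (two code points).
decomposed = "e\u0301"

# An application that normalizes to NFC turns this into the single
# precomposed code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

# As text, the two are canonically equivalent ...
same_text = (unicodedata.normalize("NFC", decomposed)
             == unicodedata.normalize("NFC", composed))

# ... but their UTF-8 byte sequences differ, so a byte-for-byte
# comparison of "before" and "after" fails.
same_bytes = decomposed.encode("utf-8") == composed.encode("utf-8")
```

Here `same_text` is true while `same_bytes` is false, which is the sense in which text faithfulness and binary faithfulness come apart.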

Users should get used to it, and we should document how to force Emacs
to signal an error rather than change anything behind their backs, for
those who need binary faithfulness rather than text faithfulness.
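
A rough sketch of what "error rather than change anything" could mean, written in Python rather than Emacs Lisp and not describing anything Emacs actually does. The helper name `load_faithfully` is made up for illustration; `utf-8-sig` is Python's codec that strips a leading BOM on decode and writes one on encode, standing in here for a BOM-aware coding system:

```python
import codecs


def load_faithfully(raw: bytes, read_codec: str, write_codec: str) -> str:
    """Decode raw, but refuse if writing it back would alter the bytes."""
    text = raw.decode(read_codec)
    if text.encode(write_codec) != raw:
        raise ValueError(
            f"reading as {read_codec} and writing as {write_codec} "
            "would not reproduce the original bytes"
        )
    return text


with_bom = codecs.BOM_UTF8 + "<doc/>".encode("utf-8")

# BOM eaten on the way in and added back on the way out: accepted.
text = load_faithfully(with_bom, "utf-8-sig", "utf-8-sig")

# BOM eaten on the way in and not restored on the way out: rejected.
try:
    load_faithfully(with_bom, "utf-8-sig", "utf-8")
    silently_changed = True
except ValueError:
    silently_changed = False
```

The first call corresponds to the "encoding variation which adds the BOM back on writing"; the second is the silent BOM-eating case, which this check turns into an error instead.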



