chicken-users

Re: [Chicken-users] BOM in a Scheme source file


From: John Cowan
Subject: Re: [Chicken-users] BOM in a Scheme source file
Date: Sun, 9 Sep 2007 21:06:32 -0400
User-agent: Mutt/1.5.13 (2006-08-11)

Pierpaolo Bernardi scripsit:

> See here for example: http://unicode.org/faq/utf_bom.html#29
> 
> which says that you can put a bom in a utf8 file (of course, you can
> put whatever character you want in a file), but it is a character
> like every other character, it has no particular meaning wrt the encoding.

A BOM serves two purposes: in UTF-16 and UTF-32 it specifies the actual
byte order, but in all UTFs it helps to provide a signature specifying
the encoding.  As such, when a UTF-8 file begins with U+FEFF, the decoder
MAY use it to assume UTF-8 input, and SHOULD then ignore it.  So you
are right to say that a BOM in a UTF-8 file does not affect the format
of the encoding, but it can and does affect the overall decoding process.
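To make the signature role concrete, here is a minimal Python sketch. It is not anything Chicken does; the function name `decode_with_signature` and the fallback to UTF-8 when no BOM is present are assumptions for illustration. It compares the start of the input against the standard `codecs.BOM_*` byte sequences, picks the matching codec, and consumes the BOM instead of passing it on as text.

```python
import codecs

# Known BOM signatures, longest first: the UTF-32LE BOM (FF FE 00 00)
# begins with the UTF-16LE BOM (FF FE), so it must be tested before it.
SIGNATURES = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_with_signature(data: bytes, default: str = "utf-8") -> tuple[str, str]:
    """Return (encoding, text), using a leading BOM as the encoding signature.

    The BOM itself is consumed and is not part of the returned text.
    Input without a BOM is decoded with `default` (an assumed fallback).
    """
    for bom, encoding in SIGNATURES:
        if data.startswith(bom):
            return encoding, data[len(bom):].decode(encoding)
    return default, data.decode(default)
```

One limitation of sniffing by BOM alone: UTF-16LE text whose first character after the BOM is U+0000 looks exactly like a UTF-32LE BOM, so this sketch would pick UTF-32LE for it.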

For backward compatibility, it is possible for encoders to begin a file
with a non-BOM U+FEFF serving as a ZERO WIDTH NO-BREAK SPACE, but in
that case either the U+FEFF should be output twice (once as a BOM, once
as a character), or else U+2060 WORD JOINER, which has the same semantics
but is not a BOM, should be used instead.
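Python's standard `utf-8-sig` codec happens to follow this doubling rule: its encoder always writes a BOM first, and its decoder strips at most one leading BOM. A minimal sketch of the round trip; the sample Scheme text is made up for illustration.

```python
import codecs

# Hypothetical source text that really begins with a ZERO WIDTH NO-BREAK
# SPACE character, not with a signature.
text = "\ufeff(define x 1)"

# The "utf-8-sig" encoder always emits a BOM first, so the result is the
# signature followed by U+FEFF as a character: EF BB BF EF BB BF ...
encoded = text.encode("utf-8-sig")
assert encoded.startswith(codecs.BOM_UTF8 * 2)

# Its decoder strips at most one leading BOM, so the character survives.
assert encoded.decode("utf-8-sig") == text

# U+2060 WORD JOINER is never taken for a BOM, so no doubling is needed.
joined = "\u2060(define x 1)"
assert joined.encode("utf-8-sig").decode("utf-8-sig") == joined
```

Decoding the same bytes with the plain `utf-8` codec keeps both copies of U+FEFF, which is why the choice of decoder matters.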

> Then, maybe chicken could consider U+FFFE as whitespace,

Not as whitespace, but as nothing at all.

> to work around this bug in scite, and maybe other broken tools.

Not a bug, not broken.
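The difference between treating a leading U+FEFF as whitespace and treating it as nothing at all can be sketched in a few lines of Python, assuming the file has already been decoded to text. `strip_signature` is a hypothetical helper for illustration, not part of Chicken.

```python
def strip_signature(source: str) -> str:
    """Remove a single leading U+FEFF from already-decoded source text.

    The BOM is treated as nothing at all: it is dropped before reading,
    so it never reaches the reader, either as whitespace or as part of
    a token. Any U+FEFF after the first character is left alone as
    ordinary text.
    """
    if source.startswith("\ufeff"):
        return source[1:]
    return source


print(strip_signature("\ufeff(display \"hi\")"))
```

Only the first character is examined, so a second U+FEFF (the "output twice" case) is preserved as content.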

-- 
He made the Legislature meet at one-horse       John Cowan
tank-towns out in the alfalfa belt, so that     address@hidden
hardly nobody could get there and most of       http://www.ccil.org/~cowan
the leaders would stay home and let him go      --H.L. Mencken's
to work and do things as he pleased.              Declaration of Independence



