
From: Tom Lane
Subject: Re: [Nmh-workers] A € for your thoughts - should we fix UTF-8 subject output in scan for 1.5?
Date: Mon, 21 May 2012 14:54:50 -0400

Ken Hornstein <address@hidden> writes:
>> So these functions only work portably in single-byte encodings.
>> Particular implementations might choose to make them do something useful
>> for input values above 255, but you couldn't expect that to work
>> everywhere.  To work portably in UTF8 and other multi-byte encodings,
>> you have to go over to the wide-character functions in <wctype.h>.

> Yeah, but the issue isn't about values above 255, it's about values above
> 127.  Your locale is UTF-8, and you call isspace(0xa0).  Does that mean
> "the character 0xa0", which is U+00A0 (a space)?  Or does it mean
> one byte of a multibyte character, in which case ... who knows?

Well, I would say that the standard's authors wrote "character" with
malice aforethought, and that what they meant was that the value had to
represent a character, not one byte of a multibyte character.  So if
isspace(0xa0) means anything in UTF8 encoding, it would have to refer
to the Unicode code point U+00A0.  However, in practice I'm not sure
what good it does you to worry about whether or not that works, because
if you want to support anything beyond LATIN1 you need to be using
iswspace() anyway.

                        regards, tom lane
