From: Tom Lane
Subject: Re: [Nmh-workers] A € for your thoughts - should we fix UTF-8 subject output in scan for 1.5?
Date: Mon, 21 May 2012 11:06:52 -0400

Ken Hornstein <address@hidden> writes:
> My question back to you: do the is* functions take bytes, or
> characters?  If they take bytes, then I agree with you.  If they
> take characters ... well, I'm not sure what is right.

Quoting POSIX:2008, for isalnum and friends:

        The c argument is an int, the value of which the application
        shall ensure is a character representable as an unsigned char or
        equal to the value of the macro EOF. If the argument has any
        other value, the behavior is undefined.

So these functions only work portably in single-byte encodings.
Particular implementations might choose to do something useful
for input values above 255, but you can't expect that to work
everywhere.  To work portably in UTF-8 and other multi-byte encodings,
you have to switch to the wide-character functions in <wctype.h>.
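
As a rough illustration of the difference (a sketch only, not code from
nmh; the function names count_alnum_bytes and count_alnum_chars are
made up for this example), the first function below classifies bytes
with isalnum(), casting to unsigned char so the argument stays in the
range the standard requires.  The second decodes multibyte characters
with mbrtowc() and classifies them with iswalnum() from <wctype.h>.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Byte-wise count: only meaningful in single-byte encodings.
   The cast to unsigned char keeps the argument in the range the
   standard requires, even where plain char is signed. */
size_t count_alnum_bytes(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (isalnum((unsigned char)*s))
            n++;
    return n;
}

/* Character-wise count: decode each multibyte character according to
   the current LC_CTYPE locale, then classify the wide character.
   Returns (size_t)-1 on an invalid or truncated sequence. */
size_t count_alnum_chars(const char *s)
{
    mbstate_t st;
    size_t n = 0;
    size_t left;

    assert(s != NULL);
    memset(&st, 0, sizeof st);      /* start in the initial shift state */
    left = strlen(s);

    while (left > 0) {
        wchar_t wc;
        size_t len = mbrtowc(&wc, s, left, &st);

        if (len == (size_t)-1 || len == (size_t)-2)
            return (size_t)-1;      /* invalid or incomplete sequence */
        if (len == 0)
            break;                  /* decoded a null wide character */
        if (iswalnum((wint_t)wc))
            n++;
        s += len;
        left -= len;
    }
    return n;
}
```

The caller is assumed to have run setlocale(LC_CTYPE, "") beforehand;
without that the program stays in the "C" locale and mbrtowc() will
not decode UTF-8.  For pure ASCII input the two functions should
agree.  They can be expected to diverge once the text contains
multibyte characters.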

                        regards, tom lane
