Re: Bug reported regarding Unicode handling in email address
From: Tom Lane
Subject: Re: Bug reported regarding Unicode handling in email address
Date: Wed, 02 Jun 2021 09:23:32 -0400

Ken Hornstein <kenh@pobox.com> writes:
>> The <ctype.h> macros are just fundamentally broken in any locale that
>> has multibyte characters: you cannot squeeze a multibyte character
>> into an input that is supposed to be either an "unsigned char" or EOF.
>> Vendors can choose either to violate the spec (say, by interpreting
>> the "int" input as a Unicode codepoint) or to produce useless results.
> It's worth pointing out that the official prototype for the ctype macros
> all say they take "int" as an argument, and POSIX says they take as
> an argument a "character". So interpreting that argument as a Unicode
> codepoint (assuming you're currently in a Unicode locale) is, from my
> reading, within the spec.

You need to read a bit further down, where POSIX says:

    The c argument is an int, the value of which the application shall
    ensure is representable as an unsigned char or equal to the value of
    the macro EOF. If the argument has any other value, the behavior is
    undefined.

(C99 has identical verbiage.)

The reason to declare the argument as int is so that these can take EOF,
which I suppose is meant to allow them to be applied directly to the
result of getc() ... though why anyone would write code that way is
not clear to me. Anyway, interpreting the input as a Unicode code point,
for values above U+7F (or, if you stretch it unreasonably, U+FF) is
very clearly outside the spec.
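
To make the quoted requirement concrete, here is a minimal sketch of the two calling patterns that stay within the spec. The helper names count_alpha_bytes() and count_alpha_stream() are hypothetical, invented for this illustration; only isalpha() and getc() are standard. The first casts each plain char to unsigned char before the call, since plain char may be signed and a byte of a multibyte sequence could otherwise arrive as a negative int. The second passes the result of getc() straight through, which is the case the int parameter exists for.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: count the bytes of s that isalpha() accepts
 * in the current locale. */
size_t count_alpha_bytes(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        /* Cast first: a negative plain char other than EOF would be
         * outside the domain the spec allows. */
        if (isalpha((unsigned char)*s))
            n++;
    }
    return n;
}

/* Hypothetical helper: the same count over a stream. getc() already
 * returns an unsigned char value converted to int, or EOF, so its
 * result can be passed to isalpha() without a cast. */
size_t count_alpha_stream(FILE *fp)
{
    size_t n = 0;
    int c;

    assert(fp != NULL);
    while ((c = getc(fp)) != EOF) {
        if (isalpha(c))
            n++;
    }
    return n;
}
```

Neither function can classify a character that needs more than one byte; each byte is tested on its own, which is the limitation under discussion. In the default "C" locale, the two bytes of a UTF-8 encoded "é" are simply not counted.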

regards, tom lane