Re: Bug reported regarding Unicode handling in email address

From: Tom Lane
Subject: Re: Bug reported regarding Unicode handling in email address
Date: Wed, 02 Jun 2021 01:16:53 -0400

Ken Hornstein <kenh@pobox.com> writes:
> So, it seems like the behavior of iscntrl() and isspace() if the value
> is > 127 is undefined.  If you're in the UTF-8 locale MacOS X treats that
> as a Unicode codepoint.  But we are NOT treating it like that in this case;
> we're processing it on a character-by-character basis.

The <ctype.h> macros are just fundamentally broken in any locale that
has multibyte characters: you cannot squeeze a multibyte character
into an input that is supposed to be either an "unsigned char" or EOF.
Vendors can choose either to violate the spec (say, by interpreting
the "int" input as a Unicode codepoint) or to produce useless results.
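
To make that concrete, here is a minimal sketch (the function name is
hypothetical, and it assumes the text is UTF-8 encoded). In UTF-8 every
byte of a multibyte character has its high bit set, so a
character-by-character loop only ever sees fragments above 127, none of
which is the character itself:

```c
#include <stddef.h>

/* Hypothetical helper: count the bytes in a NUL-terminated string that
 * fall outside the 7-bit ASCII range.  For UTF-8 input, each multibyte
 * character contributes two or more such bytes, and no single one of
 * them can stand in for the whole character in a <ctype.h> call. */
size_t count_non_ascii_bytes(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        /* Convert to unsigned char first so that bytes >= 0x80 are not
         * seen as negative values where plain char is signed. */
        if ((unsigned char) *s > 127)
            n++;
    }
    return n;
}
```

For example, "caf\xC3\xA9" (the UTF-8 encoding of "café") has two such
bytes, 0xC3 and 0xA9, for a single accented letter.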

(As I recall, the MacOS UTF8 locales are badly broken in some other
ways, but this problem is not Apple's fault.)

> I am wondering if the simplest solution is to put in isascii() in front
> of those tests in that function.  We only really care about those tests
> returning "true" for ASCII characters.  Thoughts?

Yeah, that seems like a reasonable fix in this context.
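
Roughly, the guarded test could look like the sketch below. This is
only an illustration, not the actual code under discussion: the
function name is made up, and it uses an explicit range check in place
of isascii(), which is a POSIX extension rather than ISO C. The effect
is the same for this purpose: bytes above 127 never reach the
<ctype.h> classifiers.

```c
#include <ctype.h>

/* Hypothetical helper: returns 1 only for ASCII whitespace or control
 * characters, and 0 for everything else, including every byte of a
 * UTF-8 multibyte sequence. */
int is_ascii_space_or_cntrl(char c)
{
    /* Cast first so a signed char with the high bit set does not turn
     * into a negative int argument. */
    unsigned char uc = (unsigned char) c;

    /* The range check plays the role of isascii(): anything above 127
     * is rejected before isspace()/iscntrl() are consulted. */
    return uc <= 127 && (isspace(uc) || iscntrl(uc));
}
```

With this guard the result for a byte such as 0xC3 or 0xA0 no longer
depends on how the platform's locale chooses to classify values above
127.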

                        regards, tom lane
