

Re: Bug reported regarding Unicode handling in email address

From: Ken Hornstein
Subject: Re: Bug reported regarding Unicode handling in email address
Date: Wed, 02 Jun 2021 17:47:42 -0400

>It's early morning for me, and I'm still at least a liter of Diet Mountain Dew
>away from being sufficiently caffeinated to be positive, but that looks like
>"not totally correct, but a lot closer than what we have now".
>In particular, that will accept overlong and illegal utf-8 codepoints, and
>probably misbehaves in strange and unusual non-ascii/non-utf-8 things
>like iso2022-jp.
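To make the quoted concern concrete, a strict check would reject three kinds of input. The first is overlong forms, such as 0xC0 0xAF, which is a two-byte encoding of '/'. The second is UTF-16 surrogates, and the third is anything above U+10FFFF. The sketch below assumes the RFC 3629 rules. `utf8_is_valid` is a hypothetical helper, not an nmh function. ISO-2022-JP uses only 7-bit bytes, so a check like this accepts it as if it were plain ASCII.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if s[0..len) is well-formed UTF-8 (RFC 3629).
 * Rejects overlong forms, surrogates, code points above U+10FFFF,
 * stray continuation bytes, and truncated sequences. */
static bool utf8_is_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t n;          /* number of continuation bytes expected */
        unsigned long cp;  /* code point being decoded */
        unsigned long min; /* smallest code point legal for this length */

        if (c < 0x80) { i++; continue; }                       /* ASCII */
        else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; min = 0x10000; }
        else return false;          /* continuation byte or 0xF8..0xFF */

        if (len - i - 1 < n)
            return false;           /* sequence runs past end of input */
        for (size_t k = 1; k <= n; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80)
                return false;       /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min)
            return false;           /* overlong encoding */
        if (cp > 0x10FFFF)
            return false;           /* beyond Unicode range */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return false;           /* UTF-16 surrogate */
        i += n + 1;
    }
    return true;
}
```

Even a correct validator like this one says nothing about which charset the bytes were meant to be in.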

So, the DETAILS are complicated.

The address parser code is used for a lot of things.  The specific bug
report was about a draft message that contained Cyrillic characters.
We know what that character set was in THAT case, because it's a draft
message and we can derive the locale from the environment or the nmh
locale setting.  But if we are processing an email message then we don't
easily know the character set.  In theory it should be either us-ascii
or utf-8, but reality sometimes intrudes and it could be anything.
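The split between drafts and incoming messages might look roughly like the sketch below. The names `guess_charset` and `enum addr_source` are hypothetical. The `profile_charset` argument is also hypothetical; it stands in for an nmh locale setting. The sketch assumes the program has already called `setlocale(LC_ALL, "")`, so that the POSIX `nl_langinfo(CODESET)` reports the user's character set.

```c
#include <langinfo.h>
#include <stddef.h>

/* Hypothetical: where the bytes handed to the address parser came from. */
enum addr_source { SRC_DRAFT, SRC_MESSAGE };

/* Return the character set we believe the input uses, or NULL if unknown.
 * profile_charset is a hypothetical stand-in for an explicit nmh locale
 * setting (NULL or "" if none). Assumes setlocale(LC_ALL, "") was called
 * at startup so nl_langinfo() reflects the user's environment. */
static const char *guess_charset(enum addr_source src,
                                 const char *profile_charset)
{
    if (src == SRC_DRAFT) {
        /* The user wrote the draft, so their own settings describe it. */
        if (profile_charset != NULL && profile_charset[0] != '\0')
            return profile_charset;
        return nl_langinfo(CODESET);
    }
    /* Incoming message: nominally us-ascii or utf-8, but the header
     * bytes could be anything, so report that we do not know. */
    return NULL;
}
```

Returning NULL for messages, rather than guessing, leaves the parser free to pass those bytes through unchanged.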

I think that instead of using the ctype macros, we should really be
using a specific set of macros tailored for email addresses, or a flex
lexer designed to process them.  I kind of think that we should simply
pass the input along as we are given it rather than trying to validate
it (as UTF-8, for example).  iso2022-jp is SO complicated that I don't
think we should even try, and I get the sense that everyone is
migrating to UTF-8 for email anyway.
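The ctype macros have two problems here. For bytes above 0x7F, their answers depend on the current locale. Passing them a negative `char` is also undefined behavior. One possible shape for email-specific macros is a classifier for the RFC 5322 `atext` characters that passes every 8-bit byte through unvalidated. RFC 6532 permits UTF-8 in that position. The sketch below assumes that policy. `addr_is_atext` and `addr_atom_len` are hypothetical names, not existing nmh functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical replacement for isalnum()/isalpha() in the address lexer.
 * True for RFC 5322 atext characters and for any byte >= 0x80. */
static bool addr_is_atext(unsigned char c)
{
    if (c >= 0x80)
        return true; /* 8-bit byte: pass through without validating */
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
        (c >= '0' && c <= '9'))
        return true;
    switch (c) {
    case '!': case '#': case '$': case '%': case '&': case '\'':
    case '*': case '+': case '-': case '/': case '=': case '?':
    case '^': case '_': case '`': case '{': case '|': case '}': case '~':
        return true;
    default:
        return false; /* specials such as '@', '.', '<', and whitespace */
    }
}

/* Length of the leading run of atext bytes in a NUL-terminated string. */
static size_t addr_atom_len(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0' && addr_is_atext((unsigned char)s[n]))
        n++;
    return n;
}
```

With these definitions, the UTF-8 local part of "Иван@example.org" is read as a single 8-byte atom that ends at the '@', whatever locale the program runs in.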


