Re: Bug reported regarding Unicode handling in email address

From: Ralph Corderoy
Subject: Re: Bug reported regarding Unicode handling in email address
Date: Thu, 10 Jun 2021 11:31:10 +0100

Hi Ken,

> > > The address parser code is used for a lot of things.  The specific
> > > bug report was about a draft message that contained Cyrillic
> > > characters.  We know what that character set was in THAT case,
> > > because it's a draft message and we can derive the locale from the
> > > environment or the nmh locale setting.  But if we are processing
> > > an email message then we don't easily know the character set.  In
> > > theory it should either be us-ascii or utf-8, but reality
> > > sometimes intrudes and it could be anything.
> > 
> > If it's an email then won't it be ASCII?
> Boy, you're out of the loop!  Check out RFC 6532.

Oh, SMTPUTF8, yes I've seen that around.  :-)

But my point stands.  When parsing an email address, nmh should know
from the context in which the address appears what encoding its bytes use.

- mail/inbox/42 was written by us; it's our choice.
- mail/draft is in the encoding of the process's locale.
- /var/spool/$LOGNAME is in UTF-8.
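
As a rough illustration of the three cases above, here is a minimal
sketch in Python rather than nmh's C.  The names `Source` and
`address_charset` are made up for this example and are not nmh
functions, and the UTF-8 default for messages we wrote ourselves is
only an assumption standing in for "our choice".

```python
import locale
from enum import Enum


class Source(Enum):
    """Hypothetical labels for where the bytes being parsed came from."""
    FOLDER = "folder"  # e.g. mail/inbox/42, written by us
    DRAFT = "draft"    # e.g. mail/draft, typed by the user
    SPOOL = "spool"    # e.g. /var/spool/$LOGNAME


def address_charset(source, store_charset="utf-8"):
    """Return the charset to assume when parsing an address from `source`.

    `store_charset` models "it's our choice" for messages we wrote;
    the UTF-8 default is an assumption made for this sketch.
    """
    if source is Source.DRAFT:
        # A draft is in whatever encoding the process's locale says.
        return locale.getpreferredencoding(False)
    if source is Source.FOLDER:
        # We wrote the file, so we know what we chose.
        return store_charset
    if source is Source.SPOOL:
        # Taken as UTF-8, per the list above.
        return "utf-8"
    raise ValueError(f"unknown source: {source!r}")
```

The parser would then be handed the charset alongside the bytes
instead of having to guess it from the bytes alone.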

Cheers, Ralph.

