nmh-workers

Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility


From: Lyndon Nerenberg
Subject: Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
Date: Mon, 17 Oct 2016 19:23:29 -0700

> On Oct 17, 2016, at 6:39 PM, Ken Hornstein <address@hidden> wrote:
> 
> What it refuses to do now is create improperly-formatted email messages
> when it cannot identify the character set.  Before it would happily
> send these messages out; THAT has been broken for twenty years and was
> only recently fixed.
> 
> And if we're voting ... I would rather have only one additional way to
> specify a nmh-specific locale (well, I'd rather have ZERO additional
> ways, but I think more than one way is overkill).
> 
> (And it occurs to me that even setting the locale properly probably
> will not fix your specific problem, as you have described it; forwarding
> messages using MIME will).

The underlying problem is that locales were built before anyone really 
understood the problem.  For one, they assume symmetry on input and output; 
there is no LC_CTYPE_INPUT and LC_CTYPE_OUTPUT.

This is why Plan 9 punted on the entire issue and said UTF-8 everywhere.  Do
what you want outside, but it's your job to convert to UTF-8 on the way into
the tools and from UTF-8 on the way out.  And they provided a command line tool
to do just that.  If you look at the Plan 9 mail system, it's all UTF-8
internally.  When mail comes in over the wire, the appropriate MIME charset=
parameters are used to convert content to UTF-8 for display (upas/fs takes care
of this).  By definition, all input is UTF-8.
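
A minimal sketch of that inbound step, in Python rather than anything taken
from Plan 9 or nmh.  It assumes a single-part text message; the function name
body_as_utf8 is hypothetical, and the fallback to us-ascii when no charset=
parameter is present follows the MIME default.

```python
from email import message_from_bytes


def body_as_utf8(raw: bytes) -> bytes:
    """Return a single-part message body re-encoded as UTF-8.

    Hypothetical helper for illustration only; multipart messages
    are not handled and yield an empty result.
    """
    msg = message_from_bytes(raw)
    # charset= parameter of Content-Type; MIME defaults to US-ASCII.
    charset = msg.get_content_charset() or "us-ascii"
    # decode=True undoes base64 / quoted-printable transfer encoding.
    payload = msg.get_payload(decode=True)
    if payload is None:  # multipart container, out of scope here
        return b""
    # Undecodable bytes become U+FFFD instead of aborting the display.
    return payload.decode(charset, errors="replace").encode("utf-8")
```

An unknown charset name raises LookupError here; a real tool would need a
policy for that case.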

If we were to use $LANG/$LC_CTYPE to convert incoming data to UTF-8 in the same 
manner, and process (and store!) everything internally as UTF-8, all of this 
nonsense would go away.  Similarly, we could convert from UTF-8 -> 
$LANG/$LC_CTYPE on the way out.  And we could ship everything off-site with one 
of only two character sets: US-ASCII or UTF-8.
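
The boundary conversions described above can be sketched as follows.  This is
an illustration under stated assumptions, not nmh code: all three function
names are hypothetical, and the locale's charset is passed in explicitly (in
practice it could come from something like locale.nl_langinfo(locale.CODESET)
after setlocale).

```python
def to_internal(data: bytes, locale_charset: str) -> str:
    """Inbound: decode locale-encoded bytes into Unicode text."""
    return data.decode(locale_charset)


def to_locale(text: str, locale_charset: str) -> bytes:
    """Outbound to the terminal: encode for the user's locale.

    Characters the locale cannot represent are replaced with '?'.
    """
    return text.encode(locale_charset, errors="replace")


def outbound_charset(text: str) -> str:
    """Pick the charset label for mail leaving the site.

    Only two outcomes are possible: us-ascii when every character
    is 7-bit, utf-8 otherwise.
    """
    return "us-ascii" if text.isascii() else "utf-8"
```

For example, outbound_charset("hello") selects us-ascii, while any text
containing a non-ASCII character selects utf-8.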

Good grief, even Microsoft has figured this out :-P  Yes, someone has to write 
the code.  Let's ship 1.7 (if Ralph ever stops committing!), then do 1.8 (the 
SSL/TLS stuff).  And then let's branch for 2.0 and go for a top-to-bottom UTF-8 
runtime.  I've been pharting around with this for a couple of years now in my 
own private branch.  It's not trivial, but it's doable.  And maybe *mh should 
lead the way again, for the first time in a few decades.

--lyndon



