
From: Ken Hornstein
Subject: Re: [Nmh-workers] nmh architecture discussion: format engine character set
Date: Wed, 12 Aug 2015 21:55:53 -0400

>Take the reply command.  The first thing it needs to do is read the
>original email data to generate the draft template for editing.  The
>initial read operation is filtered thru the Encoder first.  The result
>is passed into the nmh engine to parse header fields and other jazz to
>create the draft message (all of this is done in the UTF8 world).  When
>writing the draft, the data is piped thru the encoder then written to
>disk before launching the editor (hopefully it is a no-op, but if in a
>non-UTF8 locale...).

So I was about to say that we don't know what to do in that case, but
I took a look at RFC 6857.  It turns out that it spells out exactly
how to 'downgrade' a message to only ASCII.  This requires encoding
domains in Punycode, using RFC 2047 and RFC 2231 where appropriate, and
using RFC 2047 for the addr-spec if the mailbox name contains UTF-8.
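To make the RFC 2047 part concrete, here is a minimal sketch (Python, assuming UTF-8 and the 'B' encoding only) of turning a non-ASCII header value into encoded-words.  The name `encode_words` is hypothetical.  It splits on character boundaries so no UTF-8 sequence is cut in half, and keeps each encoded-word within RFC 2047's 75-character limit.  It does not fold lines or decide where in a structured header encoded-words are permitted; that per-header knowledge is a separate problem.

```python
import base64

# "=?UTF-8?B?" plus "?=" is 12 characters of overhead; RFC 2047 caps an
# encoded-word at 75, leaving 63 base64 characters, i.e. at most 45 raw bytes.
MAX_RAW_BYTES = 45


def encode_words(text: str) -> str:
    """Encode text as RFC 2047 'B' encoded-words; ASCII text is returned as-is."""
    if text.isascii():
        return text
    words, chunk, size = [], "", 0
    for ch in text:
        n = len(ch.encode("utf-8"))
        # Start a new word rather than split a multibyte character.
        if size + n > MAX_RAW_BYTES:
            words.append(chunk)
            chunk, size = "", 0
        chunk += ch
        size += n
    if chunk:
        words.append(chunk)
    # Whitespace between adjacent encoded-words is ignored when decoding.
    return " ".join(
        "=?UTF-8?B?" + base64.b64encode(w.encode("utf-8")).decode("ascii") + "?="
        for w in words
    )
```

For example, `encode_words("Grüße")` gives `=?UTF-8?B?R3LDvMOfZQ==?=`, while a plain ASCII subject passes through untouched.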

This does not strike me as terrible, and the code is mostly written
(well, not the code to convert U-labels to A-labels, but pretty much every
Unicode string library we've looked at has a Punycode encoder/decoder).
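For the U-label to A-label step, Python's standard library happens to ship a raw RFC 3492 Punycode codec, so a sketch of converting a domain label by label is short.  `to_a_labels` is a hypothetical name, and this deliberately skips the IDNA mapping and validation rules (nameprep / IDNA 2008) that a real implementation would get from a proper IDNA library, such as GNU libidn2 in C.

```python
def to_a_labels(domain: str) -> str:
    """Convert each non-ASCII label of a domain to its 'xn--' A-label form."""
    labels = []
    for label in domain.split("."):
        if label.isascii():
            labels.append(label)  # already an ASCII (LDH) label
        else:
            # Raw RFC 3492 Punycode; no IDNA case mapping or validation here.
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)
```

For example, `to_a_labels("bücher.example")` yields `xn--bcher-kva.example`.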

So that suggests to me:

- Handle everything internally as UTF-8.
- For _display_, try to convert all of the characters to the native
  character set (yes, using the locale, dammit!).
- For things like _replies_, if we are not in a UTF-8 locale, then downgrade
  things like the addresses using RFC 6857 rules (well, the subject as well;
  I think the way it would work is that the format engine would do the
  encoding for you behind the scenes for all components).
- Reconvert such messages to the 'canonical' form when sending.  Well, I
  think just the addresses; leaving everything else as encoded words might
  not be harmful.  But I'd have to think about it.
- But this also makes it clear that the idea of having an 'external'
  decoder stage simply will not work; you need to know too much about each
  header, because they're all handled differently.
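A rough sketch of that last point, assuming the format engine hands each component over as a (name, UTF-8 value) pair and already knows whether the locale is UTF-8 (in C that would typically come from `nl_langinfo(CODESET)` after `setlocale()`).  It uses Python's `email` package as a stand-in: address headers get A-label domains and encoded display names, while unstructured headers such as Subject are encoded wholesale.  `downgrade_header`, `downgrade_address`, and the `ADDRESS_HEADERS` set are hypothetical, and non-ASCII local parts are deliberately left unhandled.

```python
from email.header import Header
from email.utils import formataddr, getaddresses

# Hypothetical, non-exhaustive set of components treated as address lists.
ADDRESS_HEADERS = {"from", "to", "cc", "bcc", "reply-to", "sender"}


def downgrade_address(display: str, addr: str) -> str:
    """Downgrade one mailbox: A-label domain, RFC 2047 display name."""
    local, sep, domain = addr.rpartition("@")
    if not sep or not local.isascii():
        # RFC 6857 covers non-ASCII local parts; this sketch does not.
        raise ValueError(f"address not handled by this sketch: {addr!r}")
    # Python's built-in "idna" codec applies IDNA 2003 ToASCII to each label.
    ascii_addr = f"{local}@{domain.encode('idna').decode('ascii')}"
    # formataddr() RFC 2047-encodes a non-ASCII display name.
    return formataddr((display, ascii_addr))


def downgrade_header(name: str, value: str, utf8_locale: bool) -> str:
    """Return a component value suitable for a draft in the current locale."""
    if utf8_locale or value.isascii():
        return value  # nothing to downgrade
    if name.lower() in ADDRESS_HEADERS:
        pairs = getaddresses([value])
        return ", ".join(downgrade_address(d, a) for d, a in pairs)
    # Unstructured component such as Subject: encode the whole value.
    return Header(value, "utf-8").encode()
```

Whether the result uses Q or B encoding is left to the `email` package; the point is only that the dispatch has to happen per header, which is exactly the knowledge an external decoder stage would lack.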

Thoughts?

--Ken


