From: Earl Hood
Subject: Re: [Nmh-workers] nmh architecture discussion: format engine character set
Date: Wed, 12 Aug 2015 10:08:31 -0500

On Tue, Aug 11, 2015 at 11:30 PM, Ken Hornstein wrote:

> I confess that I am surprised the "UTF-8 or die" crowd has been so unanimous
> so far.  No one dissents from this view?  Like I said, it simplifies a WHOLE
> bunch of code (at the cost of adding a new library dependency), so I would
> actually be fine with it.

Since I will likely not be doing any of the actual coding, I have no
real skin in the game, however...

I think it is questionable design-wise to take the "UTF-8 or die"
approach, especially when operations that are unavoidable anyway would
support a more general-purpose design.

It appears the basic processing model is a pipeline:

  Raw -> [Encoder] -> UTF8 -> [Processor] -> UTF8 -> [Encoder] -> Output

An encoder is needed to deal with whatever character encodings may be
present in the original, raw data.  This is unavoidable if nmh is going
to properly support the various mail standards and the bulk of mail that
still goes out today in non-UTF-8 encodings.

The [Encoder] will normalize all character data into UTF8.  Nmh, as the
[Processor], then does whatever it needs to do (like parsing addresses).
The immediate result of that is UTF8, which is then piped into the
[Encoder] to generate the final output based on locale settings.

The final [Encoder] may be a no-op if the output is to be in UTF8, but
if not (due to either the environment's locale setting or an explicit
configuration setting), the [Encoder] does its thing.
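
To make the pipeline above concrete, here is a minimal sketch in Python
(nmh itself is C, so this only illustrates the data flow, not a proposed
implementation).  The names transcode, process, and pipeline are
hypothetical, and the whitespace-collapsing [Processor] is just a
stand-in for real work such as address parsing.

```python
import codecs


def transcode(data: bytes, src: str, dst: str) -> bytes:
    """[Encoder]: convert bytes from charset `src` to charset `dst`."""
    # Same charset on both sides (after alias normalization): the no-op case.
    if codecs.lookup(src).name == codecs.lookup(dst).name:
        return data
    # Characters the target charset cannot represent become "?".
    return data.decode(src).encode(dst, errors="replace")


def process(utf8: bytes) -> bytes:
    """[Processor] stand-in (hypothetical): UTF8 in, UTF8 out."""
    text = utf8.decode("utf-8")
    # Placeholder for real work: collapse runs of whitespace.
    return " ".join(text.split()).encode("utf-8")


def pipeline(raw: bytes, raw_charset: str, out_charset: str) -> bytes:
    """Raw -> [Encoder] -> UTF8 -> [Processor] -> UTF8 -> [Encoder] -> Output"""
    utf8_in = transcode(raw, raw_charset, "utf-8")
    utf8_out = process(utf8_in)
    return transcode(utf8_out, "utf-8", out_charset)
```

The same transcode function serves both ends of the pipeline, which is
the reuse argument in a nutshell: only the charset arguments differ
between the input and output sides.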

Since the need for an Encoder is unavoidable on the raw-input side, nmh
might as well reuse it on the output side, allowing nmh to be friendly
to any locale the end-user is using.

For maximum flexibility, the [Encoder] could be pluggable, i.e.,
provide a config option that allows one to register an external program
to do the encoding, where the data is provided via stdin and nmh gets
the results from stdout.  Such flexibility would allow folks to
evaluate/use other encoders and likely handle character data not
supported in Unicode (I know, a rare case, but it is theoretically
possible--Klingon is still not officially part of Unicode ;).
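
As a rough illustration of the pluggable idea, the following Python
sketch feeds data to an external program on stdin and reads the result
from stdout.  The function name run_external_encoder and its timeout
parameter are hypothetical; how nmh would actually register such a
program in its configuration is left open here.

```python
import subprocess
from typing import Sequence


def run_external_encoder(argv: Sequence[str], data: bytes,
                         timeout: float = 10.0) -> bytes:
    """Pipe `data` to the external encoder `argv` and return its stdout."""
    result = subprocess.run(
        list(argv),
        input=data,              # written to the child's stdin
        stdout=subprocess.PIPE,  # encoded result comes back here
        stderr=subprocess.PIPE,
        timeout=timeout,
        check=False,
    )
    if result.returncode != 0:
        # Surface the encoder's own diagnostics rather than guessing.
        message = result.stderr.decode("utf-8", errors="replace")
        raise RuntimeError(
            "encoder exited with status %d: %s" % (result.returncode, message)
        )
    return result.stdout
```

Assuming iconv is installed, one possible use would be
run_external_encoder(["iconv", "-f", "ISO-8859-1", "-t", "UTF-8"], raw).
Any other program that honors the same stdin/stdout contract could be
swapped in without touching the caller.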

