From: Jon Steinhart
Subject: Re: [Nmh-workers] nmh architecture discussion: format engine character set
Date: Tue, 11 Aug 2015 09:08:14 -0700

I am in no way an expert on this.  But, I won't let that stop me.

It seems to me that the only solution is to use Unicode internally.
Disgusting as it seems to those of us who are old enough to hoard
bytes, we might want to consider using something other than UTF-8
for the internal representation.  Using UTF-16 wouldn't be horrible
but I recall that the Unicode folks made a botch of things so that
16 bits no longer suffice: code points now need 21 bits, which
really means using 32 internally.
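
To put rough numbers on the size question, here is a small Python sketch
(not nmh code; the helper name encoded_sizes and the sample character are
made up for illustration).  It compares how many bytes a single code point
takes in each encoding form.

```python
# Compare how many bytes one code point takes in each Unicode encoding form.
MAX_CODE_POINT = 0x10FFFF  # highest code point Unicode defines


def encoded_sizes(ch):
    """Return the byte length of ch in UTF-8, UTF-16 and UTF-32."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),  # -le: no byte order mark
        "utf-32": len(ch.encode("utf-32-le")),
    }


sample = "\U0001F600"  # an arbitrary code point above U+FFFF

print(MAX_CODE_POINT.bit_length())  # bits needed for the largest code point
print(encoded_sizes(sample))        # UTF-16 needs a surrogate pair here
print(encoded_sizes("A"))           # plain ASCII for comparison
```

Only the 32-bit form is the same width for every code point, which is
what makes it attractive as an internal representation despite the cost.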

The reason why I think that Unicode is appropriate is that it has been
designed to be a superset of all other character sets.  Since the
RFCs allow the mixing of character sets within a message, Unicode lets
them all be represented without having to encode "bank switching".  I realize that
doing this requires a library that does all of the Unicode character
handling properly, which is not a trivial task.
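
As an illustration of the mixing problem, here is a sketch in Python
rather than anything from nmh.  It uses the standard library's
email.header.decode_header to split a header into (bytes, charset)
pieces and then converts each piece to Unicode.  The header value and
the function name to_unicode are invented for the example, and falling
back to ASCII for unlabeled pieces is my assumption.

```python
from email.header import decode_header


def to_unicode(raw_header):
    """Decode an RFC 2047 header with mixed charsets into one string."""
    parts = []
    for data, charset in decode_header(raw_header):
        if isinstance(data, bytes):
            # Unlabeled pieces are assumed to be ASCII; bad bytes are
            # replaced rather than aborting the whole header.
            parts.append(data.decode(charset or "ascii", errors="replace"))
        else:
            # decode_header hands back str when nothing was encoded.
            parts.append(data)
    return "".join(parts)


# Hypothetical header mixing ISO-8859-1 and UTF-8 encoded words.
raw = "=?iso-8859-1?q?Andr=E9?= =?utf-8?b?5pel5pys?="
text = to_unicode(raw)
print(text)
print([hex(ord(c)) for c in text])
```

Once decoded, the string carries no record of which charset each piece
came from, so nothing downstream needs to track a "current" charset.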

On the output side, we just have to do the best we can if characters in
the input locale can't be represented in the output locale.  This is
independent of the internal representation.
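
A minimal sketch of "do the best we can" on output, again in Python and
again with invented names (render_for_charset is hypothetical).  It
assumes that substituting a replacement character is an acceptable
fallback; other policies such as transliteration are equally possible.

```python
def render_for_charset(text, charset):
    """Encode text for output, replacing characters the charset lacks."""
    # errors="replace" substitutes "?" for each unencodable character
    # instead of raising UnicodeEncodeError.
    return text.encode(charset, errors="replace")


text = "Andr\u00e9 \u65e5\u672c"  # Latin-1 and CJK characters in one string

print(render_for_charset(text, "utf-8"))       # everything representable
print(render_for_charset(text, "iso-8859-1"))  # CJK characters replaced
print(render_for_charset(text, "ascii"))       # accented letter replaced too
```

The same Unicode string feeds all three calls; only the last step knows
or cares about the output charset.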
