Re: [Nmh-workers] General question - unsupported charset conversion

From: Aleksander Matuszak
Subject: Re: [Nmh-workers] General question - unsupported charset conversion
Date: Fri, 28 Feb 2014 19:37:45 +0100

Ken Hornstein writes:

> I've been grappling with what to do when we have issues with character set
> conversion.

Unfortunately, I have a lot of experience, and a lot of trouble, with
character set conversion.

> Specifically, I have two issues:
> - What to do if the character set is unsupported.

> Should we return the original bytes?  

It is not the best idea. Some byte sequences are control sequences for
the terminal, and they sometimes leave the terminal in an unusable state.
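
To illustrate the risk, here is a minimal sketch in Python (not nmh code;
the function name make_terminal_safe is my own invention for this example)
of one possible way to show unconvertible bytes without letting control
sequences reach the terminal: keep printable ASCII, newline and tab, and
escape every other byte.

```python
def make_terminal_safe(raw: bytes) -> str:
    """Render raw bytes as printable ASCII, escaping everything else.

    Hypothetical helper for illustration only: printable ASCII, newline
    and tab pass through; any other byte is shown as a \\xNN escape.
    """
    out = []
    for b in raw:
        if b in (0x09, 0x0A) or 0x20 <= b < 0x7F:
            out.append(chr(b))
        else:
            # Control bytes (ESC, SO, ...) and 8-bit bytes are made visible
            # instead of being interpreted by the terminal.
            out.append("\\x%02x" % b)
    return "".join(out)


# ESC [ 2 J would normally clear the screen; here it is only displayed.
sample = b"hello \x1b[2J world"
print(make_terminal_safe(sample))
```

This keeps the terminal usable, at the price of making 8-bit text hard to
read, so it is only a fallback and not a replacement for real conversion.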

> An error? [..]  Some string which says, "We cannot convert
> klingon-8842 to us-ascii" or the equivalent?

In practice it means spam in an exotic language, and at that point I know
that I do not want to read such a message.

In the rare cases when I do want to read a message in a charset that my
configuration does not support, it is an advantage of the mh system that
such a message can be handled separately: save, decode, convert, whatever.

> - What to do when we cannot convert a particular character.  This is a
> little more clear; the general trend is to use a substitution
> character.

This is very frequent and causes a lot of trouble. The entire message is
in English, with one foreign family name in its original spelling. The
message is sent in utf-8 but (suppose) my terminal supports only ASCII.
Conversion would fail.
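
The situation can be sketched in Python as follows. The sample text and
the name in it are made up for this example, and Python's codec error
handlers only stand in for whatever conversion library is actually used.
A strict conversion of the whole text fails because of two characters,
while a substitution character keeps the rest readable.

```python
# Hypothetical message text: English with one Polish name.
# \u0142 is l-with-stroke, \u0105 is a-with-ogonek.
text = "Best regards, Pawe\u0142 D\u0105browski"

# Strict conversion: a single unconvertible character aborts everything.
try:
    text.encode("ascii")
    strict_failed = False
except UnicodeEncodeError as err:
    strict_failed = True
    print("strict conversion failed at position", err.start)

# Substitution: each unconvertible character becomes "?".
substituted = text.encode("ascii", errors="replace").decode("ascii")
print(substituted)
```

With "?" as the substitution character the message is readable again, but
the original letters are lost and cannot be recovered.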

I can prepare an example, but including it in this message could make the
message difficult to read.

In my personal opinion, a very good choice is conversion into HTML
entities, like &#261; or &#322; (for ą and ł). The text remains quite
readable, and the entities are still unique enough to convert back if
needed.
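
A minimal sketch of this idea in Python, assuming numeric character
references are acceptable (the function names to_entities and
from_entities and the sample name are mine, chosen for this example; this
is not a proposal for how nmh must implement it):

```python
import html


def to_entities(text: str) -> str:
    """Convert text to pure ASCII, writing each non-ASCII character
    as a numeric character reference such as &#322;."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def from_entities(ascii_text: str) -> str:
    """Convert character references back into the original characters."""
    return html.unescape(ascii_text)


name = "Pawe\u0142 D\u0105browski"   # hypothetical sample name
safe = to_entities(name)
print(safe)
restored = from_entities(safe)
```

One caveat: html.unescape also decodes anything in the text that already
looked like an entity (for example a literal "&amp;"), so the way back is
exact only if the original text contained no such sequences.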


