Re: [Nmh-workers] General question - unsupported charset conversion

From: Ken Hornstein
Subject: Re: [Nmh-workers] General question - unsupported charset conversion
Date: Fri, 28 Feb 2014 13:49:19 -0500

>Unfortunately, I have a lot of experience of, and trouble with,
>character set conversion.

Well, if you just bit the bullet and switched to UTF-8, you wouldn't have
all of these problems! :-)

>> Should we return the original bytes?
>That is not the best idea. Some byte sequences are terminal control
>sequences, and these sometimes leave the terminal in an unusable state.

Seems fine to me.

>> An error? [..]  Some string which says, "We cannot convert
>> klingon-8842 to us-ascii" or the equivalent?
>In practice it means spam in an exotic language, and at that point I
>know that I do not want to read such a message.

I can see that, but I'm not sure that's an appropriate choice for all
cases (like, for instance, MIME parameters).

>> - What to do when we cannot convert a particular character.  This is a
>> little more clear; the general trend is to use a substitution
>> character.
>This is very frequent and causes a lot of trouble. Take a message that
>is entirely in English apart from one foreign family name in its
>original spelling. The message is sent in UTF-8, but (suppose) my
>terminal supports only ASCII. The conversion would fail.

Errr ... really?  In the case I'm thinking of, the one foreign family
name would have the offending character output as a '?' (or whatever).
The conversion would go through fine.
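
A minimal sketch of that substitution behavior, written in Python
rather than nmh's C purely for illustration. The function name and the
sample text are hypothetical; the only library feature used is the
standard `errors="replace"` handler of `str.encode`, which emits '?'
for each character the target charset cannot represent.

```python
def to_ascii_with_substitution(text: str) -> str:
    """Convert text to ASCII, substituting '?' for unconvertible characters.

    Hypothetical helper for illustration only; not nmh code.
    """
    # errors="replace" makes the codec emit '?' instead of raising
    # UnicodeEncodeError, so the conversion as a whole succeeds.
    return text.encode("ascii", errors="replace").decode("ascii")


# An otherwise English line containing one foreign family name.
line = "Regards, J\u00f3zef W\u0105sowski"
print(to_ascii_with_substitution(line))  # prints: Regards, J?zef W?sowski
```

Only the two characters outside ASCII are replaced; the rest of the
line passes through unchanged.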

>In my personal opinion a very good choice is conversion into
>HTML entities, like &#261; or &#322;. It remains quite readable and
>is still unique enough to convert back if needed.

Um, ouch.  Unless there's a common library that already implements
that behavior, that's not on the table at all.
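
For comparison only, here is a sketch of what the entity proposal looks
like in a language whose standard library happens to provide it. This
is Python, not C, and it says nothing about what is available to nmh.
The function name and sample text are hypothetical; `xmlcharrefreplace`
and `html.unescape` are standard Python features.

```python
import html


def to_ascii_with_entities(text: str) -> str:
    """Convert text to ASCII, writing unconvertible characters as
    numeric character references such as &#261;.

    Hypothetical helper for illustration only; not nmh code.
    """
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


name = "W\u0105sowski, \u0141ukasz"
encoded = to_ascii_with_entities(name)
print(encoded)  # prints: W&#261;sowski, &#321;ukasz

# Converting back. This is only reliable if the original text did not
# already contain literal entity-like sequences such as "&amp;".
restored = html.unescape(encoded)
print(restored == name)  # prints: True
```

The caveat in the last comment matters: the encoding step does not
escape '&', so text that already contains something that looks like an
entity will not round-trip.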

