Re: [Nmh-workers] General question - unsupported charset conversion

From: Aleksander Matuszak
Subject: Re: [Nmh-workers] General question - unsupported charset conversion
Date: Sat, 01 Mar 2014 00:53:27 +0100

Ken Hornstein writes:

> >Unfortunately, I have a lot of experience and troubles with character
> >set conversion. 
> Well, if you just bit the bullet and switched to UTF-8, you wouldn't have
> all of these problems! :-)

It is not that simple. UTF-8 solves a couple of problems but creates some
new ones .... =:-) The advantages and disadvantages of UTF-8 are a very wide

> >In practice it means a spam in exotic language and at this point I know
> >that I do not want to read such a message. 
> I can see that, but I'm not sure that's an appropriate choice for all
> cases (like, for instance, MIME parameters).

That is right. On the other hand, you never prevent malformed MIME

> >This is very frequent and causes a lot of trouble. Entire message in
> >English and one foreign family name in original. Message sent in utf-8
> >but (suppose) my terminal supports only ASCII. Conversion would fail. 
> Errr ... really?  In the case I'm thinking, the one foreign family
> name would have the offending character output as a '?' (or whatever).
> The conversion would go through fine.

Well, it depends on the meaning of the word "fail". Formally, it is not
possible to convert an arbitrary UTF-8 character into one of the 256
characters of an ISO/CP/... 8-bit set. In that strict sense the
conversion would fail.

If absent symbols are ignored or substituted by something else, the
conversion goes through fine.

However, ignoring symbols or substituting them with '?' makes the
conversion non-reversible, and the result may be difficult to read. 
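
To make the two meanings of "fail" concrete, here is a minimal sketch
using Python's standard codec error handlers. It only illustrates the
behaviour discussed above and says nothing about how nmh does or should
convert; the helper to_ascii and the sample name are made up for the
example.

```python
def to_ascii(text: str, errors: str = "strict") -> str:
    """Convert text to ASCII with the given codec error handler.

    Hypothetical helper for illustration only, not part of nmh.
    """
    return text.encode("ascii", errors=errors).decode("ascii")


name = "Michał Wąs"  # made-up name with two non-ASCII letters

# "strict": the formal view, the conversion fails outright.
try:
    to_ascii(name)
except UnicodeEncodeError as exc:
    print("conversion failed:", exc.reason)

# "replace": the conversion goes through, each missing symbol becomes '?'.
print(to_ascii(name, "replace"))  # Micha? W?s

# "ignore": the conversion goes through, missing symbols are dropped.
print(to_ascii(name, "ignore"))   # Micha Ws
```

In both lossy cases the original letters cannot be recovered from the
output alone.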

This is not a problem when one or two symbols are missing or
substituted in a long text. We can guess what the me?ning of the word is.
With many non-convertible symbols, reading such a text is more like
solving a crossword puzzle. What could '??o??w??d' be?

> >In my personal opinion a very good choice is conversion into
> >html-entities, like ą or ł . It remains quite readable and
> >is still unique enough to convert it back in case of need.
> Um, ouch.  Unless there's a common library that already implements
> that behavior, that's not on the table at all.

This is a serious argument. However, the Recode library mentioned
earlier has something like that: 

I do not know whether it is useful or not.
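
For comparison, a minimal sketch of the entity idea in Python. It does
not use Recode; it relies on the standard "xmlcharrefreplace" error
handler, which emits numeric character references rather than named
entities, and on html.unescape for the way back. The helper names
to_entities and from_entities are made up for the example.

```python
import html


def to_entities(text: str) -> str:
    """Replace every non-ASCII character with a numeric reference.

    Hypothetical helper for illustration only.
    """
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def from_entities(text: str) -> str:
    """Turn character references back into the characters they name."""
    return html.unescape(text)


encoded = to_entities("Michał Wąs")
print(encoded)                 # Micha&#322; W&#261;s
print(from_entities(encoded))  # Michał Wąs
```

One caveat: html.unescape also decodes anything in the input that already
looks like an entity (for example a literal "&amp;"), so the round trip
is exact only for text that contained no such sequences to begin with.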

