Re: [Nmh-workers] hardcoded Charset removal

From: Oliver Kiddle
Subject: Re: [Nmh-workers] hardcoded Charset removal
Date: Wed, 19 Sep 2012 07:42:10 +0100 (BST)

David wrote:
> Thank you for analyzing this and providing a patch.  I
> have one question.  The patch removes the setting of
> t->tx_charset.  Should we retain that?  It's used in
> mhbuild, I believe.

It seems this was broken in the first place by 636b3bab. That change
merged similar code from mhshow and mhbuild. I've prepared a slightly
adjusted patch that also cleans up a few other things, such as the tx_charset.

What should mhbuild do if a draft contains an explicit
charset="us-ascii" definition but contains 8-bit characters? Currently
we end up with two definitions for charset. Should it just ignore the
charset or should it perhaps generate an error?
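If the answer is "generate an error", the check itself is small. Below is a minimal sketch in C, assuming the part body is available as a byte buffer; contains_8bit, charset_eq and check_declared_charset are hypothetical names for illustration, not existing nmh functions.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* True if any byte in buf has the high bit set, i.e. is not 7-bit ASCII. */
static bool contains_8bit(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        if (buf[i] & 0x80)
            return true;
    return false;
}

/* ASCII case-insensitive equality; charset names are case-insensitive. */
static bool charset_eq(const char *a, const char *b)
{
    for (; *a && *b; a++, b++)
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return false;
    return *a == *b;  /* equal only if both strings ended together */
}

/* Hypothetical policy: reject a part whose declared charset is us-ascii
 * but whose content has 8-bit bytes, instead of emitting a second charset
 * parameter. Returns 0 if consistent, -1 after printing a diagnostic. */
static int check_declared_charset(const char *declared,
                                  const unsigned char *buf, size_t len)
{
    if (declared != NULL && charset_eq(declared, "us-ascii")
        && contains_8bit(buf, len)) {
        fprintf(stderr,
                "declared charset=\"%s\" but content has 8-bit characters\n",
                declared);
        return -1;
    }
    return 0;
}
```

The alternative policy (silently ignoring the declared charset) would simply skip the error branch and let the detected charset win.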

It seems one effect of 636b3bab is that the code for checking the
profile for what to do with the character set is run even for mhbuild.
Does this really matter?
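The effect on the profile lookup depends on how the entry name is built. The following is a minimal sketch, assuming entries follow the `<program>-charset-<charset>` pattern used by `mhshow-charset-*` entries; charset_entry_name is a hypothetical helper, not nmh's actual code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build the name of the profile entry consulted for
 * a charset, e.g. prog "mhshow" and charset "iso-2022-jp" give
 * "mhshow-charset-iso-2022-jp". Returns 0 on success, or -1 if buf is
 * too small (the name is then truncated). */
static int charset_entry_name(char *buf, size_t size,
                              const char *prog, const char *charset)
{
    assert(prog != NULL && charset != NULL);
    int n = snprintf(buf, size, "%s-charset-%s", prog, charset);
    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

If prog is the invocation name, mhlist and mhbuild end up looking for `mhlist-charset-*` and `mhbuild-charset-*` entries that users are unlikely to have defined; passing a fixed "mhshow" would make every program consult the same entry.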

I'd also be tempted to hard-code mhshow in the name of the profile
entry. That way, mhlist -debug would actually show the termproc;
currently it doesn't, because it's looking for, e.g.

Yozo TODA wrote:
> o I don't want to invoke another terminal window; I prefer to convert the
>   charset encoding (e.g., apply "iconv -f iso-2022-jp-2 -t utf-8" to contents)

Unfortunately, it was designed for use with an xterm, so doing that ends
up requiring a script if you also do things like handling HTML. Some
programs, like w3m, let you specify the input and output character sets.
It is for this reason that I use w3m, even though html2text was better in
other respects. In other cases, you have to decide whether to run
iconv before or after the other command.

It would probably be better if we provided the charset as a % escape for
mhshow-show. And as we're linking against iconv anyway, perhaps we
should simply make the conversion possible directly. That might be
slightly less flexible: for example, I treat iso-8859-1 as windows-1252,
because iso-8859-1's printable characters are a subset of windows-1252
and some incoming e-mails are incorrectly tagged.
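A direct conversion step could look roughly like the following sketch using the POSIX iconv(3) API. effective_charset and to_utf8 are hypothetical names, and the iso-8859-1 to windows-1252 override mirrors the personal preference described above rather than any existing nmh setting. It assumes a platform whose iconv knows WINDOWS-1252 (glibc does).

```c
#include <assert.h>
#include <ctype.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical override: treat (often mislabelled) iso-8859-1 as
 * windows-1252; any other declared charset is used unchanged. */
static const char *effective_charset(const char *declared)
{
    static const char latin1[] = "iso-8859-1";
    size_t i;
    for (i = 0; declared[i] && latin1[i]; i++)
        if (tolower((unsigned char) declared[i]) != latin1[i])
            return declared;
    return (declared[i] == '\0' && latin1[i] == '\0') ? "WINDOWS-1252"
                                                      : declared;
}

/* Convert len bytes of in from charset `from` to UTF-8 in out (size
 * outsize, NUL-terminated). Returns bytes written, or -1 on error. */
static long to_utf8(const char *from, const char *in, size_t len,
                    char *out, size_t outsize)
{
    assert(outsize > 0);
    iconv_t cd = iconv_open("UTF-8", effective_charset(from));
    if (cd == (iconv_t) -1)
        return -1;                       /* unknown charset */
    char *inp = (char *) in;
    char *outp = out;
    size_t inleft = len, outleft = outsize - 1;  /* room for the NUL */
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t) -1)               /* flush any shift state */
        rc = iconv(cd, NULL, NULL, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t) -1)
        return -1;                       /* invalid input or no space */
    *outp = '\0';
    return (long) (outp - out);
}
```

With the override, a byte 0x80 in a part labelled iso-8859-1 comes out as the euro sign rather than an invisible C1 control character, which is the trade-off against a per-user external filter.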

I also wonder if it could be made easier to select alternative programs,
e.g., for PDFs, I sometimes want pdftotext and sometimes evince.
