[Nmh-workers] improving nmh's MIME support

From: pmaydell
Subject: [Nmh-workers] improving nmh's MIME support
Date: Mon, 02 Jun 2008 10:34:14 +0100

[This is a fairly long email which I've been sitting on until 1.3
was out of the door...]

I've felt for some time that nmh could do better at handling
MIME messages. Particular problems I've noticed are:

(1) scan decodes the headers but the body text is not de-multiparted
    or charset-converted
(2) scan's -width switch doesn't cope with wide or multibyte characters
(3) you can't tell mhshow "just convert from the email's character
    set into the one my terminal uses", which is usually what you want
    now we have UTF-8
(4) if you just want to convert character sets as above, you ought to
    be able to do it in plain old show without bothering with mhshow
(5) there's no easy way to compose mail in another character set
(6) replying to an email doesn't do anything with charset or multipart
    mime or quoted-printable; it just gives you the raw text

I think these could be fixed. Here are my proposals, ordered with the
easy standalone stuff at the front and the harder stuff later.

NB: in the proposals below there are occasionally choices about
whether to update the default non-MIME aware behaviour of some bits of
nmh to handle MIME, and provide a non-MIME (raw data) version for the
unusual case where you really want it. The other approach would be to
leave the existing behaviour as it is and provide new switches or
escapes to say "do the sensible thing".  I tend to prefer the former,
as I think it provides the new functionality without requiring every
user to update all their customised format files and so on. Whichever
we choose, we should probably be consistent.  [I note that the current
MIME support has tended to take the latter path; for instance you need
to say "%(decode{subject})" in format files, and %{subject} still
gives you the raw text. So perhaps we should keep going with that
approach. Or we could change the defaults in the existing bits...]
Part G below has some further musings on this subject.

A: support MIME in scan

Most of this is already present -- headers are decoded and the
decode() calls are in the default scan format file. The missing part
is decoding the body text. I think that the (scan-only) body component
escape should do the following decoding:
 * handle multipart MIME in the same way as mhshow (I hope we can
   reuse some of the code because there's quite a bit of it :-))
 * quoted printable and base64 are decoded
 * characters are converted to the character set of the current locale

(We can do this with a new %{decodedbody} component, or make %{body}
default to it and have %{rawbody}.)
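
To make the three decoding steps above concrete, here is a minimal
sketch in Python rather than nmh's C, purely as an illustration. The
function name scan_body_text and the latin-1 fallback are my own
assumptions, not anything nmh provides; a real implementation would
presumably share mhshow's multipart code and convert to the locale's
charset with iconv().

```python
import email


def scan_body_text(raw_message: bytes, fallback_charset: str = "latin-1") -> str:
    """Return the first text/plain part of a message as decoded text.

    Hypothetical helper: walks the MIME tree, undoes the
    content-transfer-encoding, then decodes the declared charset.
    """
    msg = email.message_from_bytes(raw_message)
    for part in msg.walk():
        # Multipart containers are skipped; only leaf text/plain parts match.
        if part.get_content_type() != "text/plain":
            continue
        # decode=True reverses quoted-printable or base64 and yields bytes.
        payload = part.get_payload(decode=True)
        charset = part.get_content_charset() or fallback_charset
        try:
            return payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset label: fall back (an assumption of this sketch).
            return payload.decode(fallback_charset, errors="replace")
    return ""
```

The result is a Unicode string, so the final conversion to the
terminal's character set is left to whatever writes the scan line.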

B: make scan's width switch support multibyte and wide characters

At the moment I think this is just counting bytes. It needs to count
columns instead. (The current implementation will sometimes output
half a multibyte character, which can mess up your terminal a bit.)
This shouldn't be too difficult, although I haven't looked at the
implementation.  The putstrf() and field-width specifiers seem to be
OK, although there are some slight oddities with wide characters.
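
As a rough illustration of counting columns rather than bytes, here
is a sketch in Python. The helper names char_columns and
truncate_to_columns are hypothetical; in nmh itself the equivalent
would presumably be built on mbrtowc() and wcwidth(). The width rule
used here (East Asian wide and fullwidth characters take two columns,
combining marks take none) is a simplification.

```python
import unicodedata


def char_columns(ch: str) -> int:
    """Approximate the number of terminal columns one character occupies."""
    if unicodedata.combining(ch):
        return 0  # combining marks are drawn over the previous character
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide and fullwidth characters take two columns
    return 1


def truncate_to_columns(text: str, width: int) -> str:
    """Cut text so it fits in `width` columns without splitting a character."""
    out = []
    used = 0
    for ch in text:
        cols = char_columns(ch)
        if used + cols > width:
            break
        out.append(ch)
        used += cols
    return "".join(out)
```

Because the loop works on whole characters, it can never emit half of
a multibyte sequence; a wide character that would straddle the limit
is dropped instead.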

C: charset handling in mhshow

mhshow should be able to handle non-native character sets using
iconv() rather than insisting on invoking a command to deal with them.
I think it should do this by default if it cannot find an
mhshow-charset line in your .mh_profile.
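
The conversion step itself is small. The sketch below shows the idea
in Python, standing in for what would be an iconv() call in C; the
function name convert_for_terminal, the UTF-8 default and the latin-1
fallback for unknown charset labels are all assumptions of mine.
Characters the terminal charset cannot represent are replaced rather
than causing a failure.

```python
import codecs


def convert_for_terminal(data: bytes, source_charset: str,
                         terminal_charset: str = "utf-8") -> bytes:
    """Convert body bytes from the message charset to the terminal charset."""
    try:
        codecs.lookup(source_charset)
    except LookupError:
        # Unknown label in the message: assume latin-1 so nothing is lost.
        source_charset = "latin-1"
    text = data.decode(source_charset, errors="replace")
    # Unrepresentable characters become '?' instead of aborting the display.
    return text.encode(terminal_charset, errors="replace")
```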

D: more seamless handling of basically plain text MIME mail in show

At the moment show bails out and hands over to mhshow for quite a lot
of MIME messages. If mhl format files supported decoding of the body
(same three things as in part A above) then we could tweak the mhl
format file used by show so it would be able to show most MIME messages
with a plain text part.

We'd need an option to mhl's "body" component to tell it to
MIME-decode.  Either we can have 'body:decode' or if we default to
decoding we can have 'body:raw'. We'd also need to tweak show to be
less aggressive about what it fails over to mhshow for. We might need
some way to say 'pass even these over to mhshow'.

[I'm not entirely sure about what would be best here -- perhaps it
would just be better if mhshow could be made to act more like show, ie
have all messages formatted and fed to a single pager invocation.  The
distinction between 'show' and 'mhshow' seems a bit ugly to me. But
we'll want the mhl changes anyway, see below.]

E: support character sets when composing messages

This is probably the trickiest bit; certainly it's the bit that has
the most "UI" to it.

What we want, roughly, is to be able to write an email in our editor
using the terminal's native character set, without having to worry
about whether we've used characters which aren't in plain old
ASCII. Then when we send the message we want nmh to encode the headers
in RFC 2047 format if necessary, and to quoted-printable or base64
encode the body if necessary, perhaps translate to a different
character set and add relevant MIME headers.

The trick here is picking a sensible character set for sending with,
since the answer often depends on what language you're sending in.
(For instance, for Japanese although UTF-8 will work, it's better to
send in iso-2022-jp for maximum compatibility.) My suggestion here is
one borrowed from mutt: a profile entry defining a list of character
sets to try; we take the first one which allows us to encode all the
characters in the email; so for instance you could set it to
"us-ascii;iso-2022-jp;utf-8". "us-ascii;iso-8859-1;utf-8" is the
default mutt uses.
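
The "first charset that can encode everything" rule is easy to sketch.
This is Python for illustration only; pick_send_charset is a
hypothetical name and the semicolon-separated string simply mirrors
the profile entry format suggested above.

```python
def pick_send_charset(text: str,
                      candidates: str = "us-ascii;iso-8859-1;utf-8") -> str:
    """Return the first candidate charset able to encode all of `text`."""
    for name in candidates.split(";"):
        name = name.strip()
        if not name:
            continue
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # some character is not representable; try the next one
        except LookupError:
            continue  # charset name not known on this system; skip it
        return name
    raise ValueError("no candidate charset can encode the message")
```

With the Japanese-oriented list, pure ASCII mail still goes out as
us-ascii, Japanese text is sent as iso-2022-jp, and anything else
falls through to utf-8.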

I guess this ought to go in mhbuild, since that is already doing some
similar things (and in any case character set encoding needs to not
step on mhbuild's toes if there are multiple parts to the message).
mhbuild already does some character-set related things, some of which
I think need to change (like using MM_CHARSET rather than whatever the
current locale's charset is); I haven't thought about the details yet.

Ideally if an encoded header from an incoming message is copied across
to a reply being composed we'd like to use the same character set;
what we definitely would rather not do is convert it to a pile of
????s. I don't have a good idea for how to achieve this, though.

F: support MIME in replying

F1: when producing the draft file to be edited

The replcomps file will need to be updated to include suitable calls
to decode(). Then the headers will appear in the editor correctly.
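
For reference, this is roughly what decoding an RFC 2047 header
amounts to, sketched with Python's standard email.header module
rather than nmh's own decode() escape. The wrapper name
decode_header_value is hypothetical.

```python
from email.header import decode_header, make_header


def decode_header_value(raw_value: str) -> str:
    """Turn an RFC 2047 encoded header value into readable text."""
    # decode_header splits the value into (bytes, charset) chunks;
    # make_header reassembles them into a single Unicode string.
    return str(make_header(decode_header(raw_value)))
```

A value with no encoded words passes through unchanged, so applying
this to every header in replcomps should be harmless for plain ASCII
mail.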

repl uses an mhl format file to process the body text. So if we've
changed mhl as described in D above we can use that to include a
suitably decoded and charset-converted body.

This will allow 'repl' on a MIME message to drop you into an editor
with sensibly decoded text ready to edit.

F2: on sending

The above requires that when we subsequently send a mail we correctly
re-encode all the headers and the body if necessary. This should be
dealt with by the code in E above.

G: smoothing off rough edges

There are probably some places where if you don't get the config right
unfortunate things happen. For instance, automimeproc isn't the
default, but the combination of E and F means that if you reply to an
email by somebody whose From: header had a 2047-encoded non-ASCII name
then unless mhbuild is run the resulting email will be broken. That's
unfortunate; perhaps a check before send that the headers at least
aren't broken might help.  This is basically a case where a "no MIME"
config is fine and an "all MIME" config is too but a mismatch causes
problems; unfortunately there are a number of bits of config that
need to be changed, and some of them are in files that may have been
user customised. Perhaps there is some mileage in having a global
setting which changes the default behaviour of various things; so if
the global MIME flag is off then %{body} gets you the raw body, but if
it is on then you get a decoded body; %{rawbody} and %{decodedbody}
would give you definitely one or the other.
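
The pre-send check mentioned above could be as simple as refusing a
draft whose header section contains raw 8-bit bytes. Below is a
sketch in Python; broken_header_lines is a hypothetical name, and the
rule that the headers end at the first blank line or line of dashes
is my assumption about the draft layout.

```python
def broken_header_lines(draft: bytes) -> list:
    """Return 1-based numbers of header lines containing non-ASCII bytes."""
    bad = []
    for number, line in enumerate(draft.splitlines(), start=1):
        stripped = line.strip()
        # Assumed end of headers: a blank line or a line made only of dashes.
        if not stripped or set(stripped) == {ord("-")}:
            break
        # Any byte above 0x7F means the header was not RFC 2047 encoded.
        if any(byte > 0x7F for byte in line):
            bad.append(number)
    return bad
```

If the list is non-empty, send could either run the encoding step
itself or stop and tell the user which lines need attention.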

H: documentation

There should probably also be some documentation about how to get a
sensible MIME setup, since some of the necessary configuration isn't
the default and in any case existing users may need to edit format
files and so on.


Opinions? I think A, B, C and D are fairly straightforward and
standalone issues, which can be done first. E is more complicated, and
F really depends on it. I suppose you could do the bits of F which
handle multipart and quoted-printable/base64 and leave the charset
stuff for later. If anybody has some ideas for avoiding possible
issues as per section G that would be particularly interesting.

-- PMM
