Re: [Nmh-workers] Understanding nmh (aka. What's the goal) [really non-ASCII message bodies ]
Wed, 08 Dec 2010 11:49:08 +0700
In this discussion people (other than perhaps Jon, though he hasn't
said this explicitly) have just been assuming that if the e-mail body
of a message contains data that is not ascii, then it must be some other
character set, because after all, all e-mail is text ...
In the days of MIME, that's simply not true, and while it is unlikely
that anyone is going to use prompter, or even some other editor, to
produce a jpeg file, there's nothing to prevent a script producing a
file with a jpeg body, and 822 headers, and handing that to nmh to process.
We might prefer such a script to generate all the right headers, but MH
really doesn't like it if we attempt to tell it mime info in the
components file (or the draft) - it insists on adding that itself, so
not doing all the content type processing before calling nmh processes
Now there is nothing at all illegal about this - even ignoring that
"illegal" is the wrong word to use in any case, "non-conforming" would
be the correct term. The standards don't apply to what the user feeds
nmh, that's locally defined "anything goes" territory. What matters is
what nmh hands to the MTA (and even more, what the local MTA passes on
to its peer). There, if we simply send an 822 (old style, non-MIME) message
with arbitrary binary content, we have a non-conforming message.
That's what (as I understand it) Jon's patch handles (I still use the
latest released version of nmh, which predates all this stuff ... there
hasn't been a new release (since 1.3) for a long time now...) and makes a
standards conforming message. It obviously has no way of knowing what the
data that it detects as non-ascii is (or not without extra information from
the sender), so "application/octet-stream" sounds to me as if it is the
perfect choice (along with either QP or B64 encoding to handle the body
format) to indicate "here is stuff, but I have no idea what it means, work
it out for yourself" - which for many users of this kind of procedure
would probably be adequate.
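
As an illustration only (this is not Jon's patch, and not nmh code), the behaviour described above can be sketched with Python's standard email package: a body that is not pure ASCII is labelled application/octet-stream and base64 encoded, since nothing more is known about it. The function name wrap_body is hypothetical.

```python
from email.message import EmailMessage


def wrap_body(body: bytes) -> EmailMessage:
    """Hypothetical helper: label a raw draft body for MIME transport."""
    msg = EmailMessage()
    try:
        text = body.decode("ascii")
    except UnicodeDecodeError:
        # Unknown non-ASCII data: say "opaque bytes" and let the
        # recipient work out what they mean.
        msg.set_content(body, maintype="application",
                        subtype="octet-stream", cte="base64")
    else:
        # Pure ASCII is the only case where we know it is plain text.
        msg.set_content(text, charset="us-ascii")
    return msg


if __name__ == "__main__":
    print(wrap_body(b"\xff\xd8\xff\xe0 not text"))
```

The sketch deliberately makes no attempt to guess a charset; it only separates "certainly ASCII" from "everything else".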
I'm certainly not arguing that we should keep this behaviour, and certainly
not as the default - I expect that real users of binary message bodies that
are not text are so rare that, even if there are any at all, updating them
would not be a huge problem (provided the change notes for the next nmh
release make it clear this has happened).
However, I don't think we should give up the ability to simply send an
e-mail where the body is image/jpeg or whatever - there's no requirement that
there be any text in the body of the message at all, even though most MUA's
simply assume that, and require a multipart to include anything that is not
text. MH should be better than that; being just as good as "most MUA's" is
a fairly grievous insult IMO. And while retaining the # language of mhbuild,
or something equivalent, is essential to enable truly general messages to
be created, expecting to use that for trivial tasks is, I agree, asking too
much - and requiring explicit mime processing at the whatnow stage should
only be necessary when the full mhbuild procedure is to be invoked.
(Do recall that when this was added, MIME messages were rare, and lots of
users didn't like them - most MUA's had no way to display them, not even
as "good" as nmh does now - and so wanting that processing was very
unusual. These days, almost every message should comply with at least
basic mime formats.)
My suggestion to handle general bodies is to allow a switch that sets the
MIME content-type of the message (defaulting to text/plain) - and then base
all the other decisions off that. If (as a result of the default, or by
being explicitly set) we get a text/* content-type, then we can attempt to
work out the charset involved, and add the proper indicator. On the
other hand, if someone really wants to send an application/octet-stream,
then let's allow them to do that, or if they want to send image/jpeg or
audio/whatever they should be able to do that too (a message that is
entirely audio/* could even be handled by "show" by playing it through the
local system's speakers, assuming that's possible - implementing voice-email).
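
A rough sketch of how the other decisions could hang off the content type follows. Everything in it is an assumption made for illustration: the function name plan_headers, the fallback_charset parameter (standing in for whatever the locale or MM_CHARSET would supply), and the very crude charset guess (ASCII, then UTF-8, then the fallback).

```python
def plan_headers(body: bytes, content_type: str = "text/plain",
                 fallback_charset: str = "iso-8859-1") -> dict:
    """Hypothetical helper: pick MIME headers from the requested type."""
    maintype = content_type.split("/", 1)[0].lower()
    if maintype != "text":
        # image/jpeg, audio/*, application/octet-stream, ...:
        # no charset question arises, just make the bytes transportable.
        return {"Content-Type": content_type,
                "Content-Transfer-Encoding": "base64"}

    # text/*: try to work out the charset, strictest candidates first.
    for charset in ("us-ascii", "utf-8"):
        try:
            body.decode(charset)
        except UnicodeDecodeError:
            continue
        break
    else:
        charset = fallback_charset  # assumed to come from the environment

    cte = "7bit" if charset == "us-ascii" else "quoted-printable"
    return {"Content-Type": f"{content_type}; charset={charset}",
            "Content-Transfer-Encoding": cte}


if __name__ == "__main__":
    print(plan_headers(b"plain old text\n"))
    print(plan_headers(b"\xff\xd8\xff\xe0", "image/jpeg"))
```

The point of the sketch is only the ordering: the type is decided first (by default or by the user), and charset guessing happens solely inside the text/* branch.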
I also don't believe that this processing should be keyed off some -attach
switch - as a way to simplify adding an attachment to a message (incidentally,
if given twice, can we have two attachments, or is there some other way to
do that?) it sounds OK, but for charset processing?
For text messages, the right thing should be done regardless of whether there's
any plan or intent to add attachments, and using a switch "-attach" in the
profile to mean "encode my text correctly" is bizarre...
I'm all for backwards compatibility, but only backwards compatible for
correct behaviour; keeping all the existing bugs should not be required
(though I think there are environments where even that is expected.)
Even for attachments, as I understand it, that's keyed off a pseudo-header
added to the components file (and so appears in the draft), right? Do we
really need a switch to enable that? I'm (again) all for backwards
compatibility, but is there any serious belief that people are really
adding "Attachment:" (whatever it is, "MH-Attach:" might be better) headers
to their messages and expecting that to be delivered? And yes, I know
non-standard headers are OK, but we have non-switch-enabled locally invented
headers used for this kind of purpose already (like fcc) - another,
especially if given an MH-specific name, should be harmless. It would be
simpler to just do the processing and not require a switch (switches that
we more or less tell people that "everyone should have this in their profile"
are just dumb...)
Now for a couple of requests for extra stuff I'd like to see if anyone
has the will power to make it happen (I doubt I'm going to get around to it
any time this millennium...)
First, I end up using any one of 4 charsets, and while basing the choice of
default input charset on MM_CHARSET is fine, I'd really like a way I can
tell nmh what charset I'm actually using.
The 4 I routinely use are us-ascii (of course - including this message),
iso-8859-1 (which has us-ascii as a subset, and so I don't really need
us-ascii, I could just use iso-8859-1 instead), tis-620 (which also has
us-ascii as a subset...) and utf-8 (which also has us-ascii as a subset).
When I send a message I'd appreciate a switch that I could use to tell
nmh what encoding I happen to have used this time - as fiddling the environment
is a messy way to make one off changes (it also needs to be set before the
A suggestion for this to avoid adding zillions of new switches - make the
charset a third "sub-type" of the "content-type" (or just "type") switch
suggested above, so I could say "comp -type text/plain/tis-620" or
"comp -type //utf-8" (relying upon text/plain being the default type).
This switch should get passed along to send I presume, so it also needs
to be available at the whatnow prompt - that's where it would be used
more frequently as in ...
What now? send -type text//iso-8859-1
as it might be only after I finish typing (preparing the message) that I know
that I have used something different than my default charset - the way I
get that most frequently is cut/paste from someplace else...
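
To make the proposed three-part syntax concrete, here is a small parsing sketch. The -type switch does not exist in nmh; the function name parse_type_switch and the rule that a missing subtype is only defaulted for text are assumptions made for this example.

```python
def parse_type_switch(value: str):
    """Hypothetical parser for the proposed 'type/subtype/charset' form.

    Returns (maintype, subtype, charset); charset is None when not given.
    Empty leading fields fall back to the text/plain default.
    """
    parts = value.split("/")
    if len(parts) > 3:
        raise ValueError(f"too many '/' separated fields in {value!r}")
    parts += [""] * (3 - len(parts))
    maintype, subtype, charset = (p.strip().lower() for p in parts)

    maintype = maintype or "text"
    if not subtype:
        if maintype != "text":
            # Assumption: only text has an obvious default subtype.
            raise ValueError(f"{maintype!r} needs an explicit subtype")
        subtype = "plain"
    return maintype, subtype, charset or None


if __name__ == "__main__":
    for arg in ("text/plain/tis-620", "//utf-8", "text//iso-8859-1"):
        print(arg, "->", parse_type_switch(arg))
```

With this reading, "//utf-8" and "text//iso-8859-1" both resolve to text/plain with the named charset, and "image/jpeg" carries no charset at all.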
And last, and definitely harder - all that is just the charset of the
component file (draft file) I create - that isn't necessarily what I
want to send, in fact, aside from pure us-ascii messages (and a header
field that is "Content-Type: text/plain; charset=us-ascii" should probably
just be omitted...) I'd actually prefer everything I send (that is:
everything that my MTA sends elsewhere) be utf-8, regardless of whether
I happen to type (or paste) 8859-1 or tis-620 (one of those would be what
my keyboard generates; getting utf-8 requires a conversion process). I'm
requesting that nmh do that conversion upon request (using iconv of course),
so that the outgoing charset can be different from the input charset.
A secondary advantage of this is that it would allow verification that the
claimed input charset actually makes sense given the data being processed,
if the iconv failed, the user would be asked to correct things and try
again, so if I set tis-620 and have data that is neither ascii nor Thai
characters, then rather than just mislabelling it (which most systems do,
I receive this junk all the time) it would be nice to get it fixed (by the
user - either by correcting the data, or altering the encoding, or content-
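
The conversion-plus-verification idea can be sketched as below. Python's built-in codecs stand in for iconv here, and the function name convert_body is hypothetical; the failure path is where the user would be asked to correct things and try again.

```python
def convert_body(body: bytes, claimed: str, outgoing: str = "utf-8") -> bytes:
    """Hypothetical helper: re-encode a draft body for sending.

    Decoding with the claimed input charset doubles as a sanity check:
    if the bytes are not valid in that charset, refuse to send rather
    than mislabel the message.
    """
    try:
        text = body.decode(claimed)
    except UnicodeDecodeError as err:
        raise ValueError(
            f"body is not valid {claimed}: bad byte at offset {err.start}"
        ) from err
    return text.encode(outgoing)


if __name__ == "__main__":
    # 0xA1 is THAI CHARACTER KO KAI in tis-620.
    print(convert_body(b"\xa1", "tis-620"))
    try:
        convert_body(b"caf\xe9", "utf-8")  # really iso-8859-1 data
    except ValueError as err:
        print("rejected:", err)
```

One caveat on the verification: iso-8859-1 assigns a character to every byte value, so a wrong iso-8859-1 claim can never be caught this way; the check only helps for charsets with invalid byte sequences, such as utf-8.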