Re: [Nmh-workers] base64 ... just looking for advice
From: Ken Hornstein
Subject: Re: [Nmh-workers] base64 ... just looking for advice
Date: Wed, 27 Jan 2016 10:52:51 -0500
Man, nmh is quiet for a while, so I think I can go on vacation ... and
look what happens! Also, I get stuck at Disney World in a historic
snowstorm so I was delayed even longer.
I see you already got the answer to your question, and David Levine
already covered some of the finer points of mhfixmsg. But, let me point
out some larger meta-issues.
>So the thing I love absolutely most about MH/NMH is that my email is
>just files on my disk that I can grep through.
I can understand why you say that, and while it IS true that a) nmh stores
each message in a separate file, and b) it is possible to use grep to search
those files, the conclusion you're making, c) grep will return useful search
results, is NOT necessarily true.
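To make the gap between (b) and (c) concrete, here is a minimal Python sketch. The message is made up for illustration, and the plain substring test only stands in for what grep does with the stored file. It uses the standard library email package rather than any nmh tool.

```python
import base64
import email

# Hypothetical message, invented for this illustration only.
body = "Meet me at the café at noon.\n"
encoded = base64.encodebytes(body.encode("utf-8")).decode("ascii")
raw_message = (
    "From: someone@example.com\n"
    "Subject: lunch\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: base64\n"
    "\n" + encoded
)

# What grep effectively sees: the file as stored, body still base64-encoded.
found_raw = "noon" in raw_message

# What a MIME-aware reader sees once the transfer encoding is undone
# and the bytes are interpreted in the declared character set.
msg = email.message_from_string(raw_message)
decoded = msg.get_payload(decode=True).decode(msg.get_content_charset())
found_decoded = "noon" in decoded

print(found_raw, found_decoded)
```

The word is in the message, but only the second search finds it, because the first one is looking at the encoded form.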
The problem is that you're thinking that an 'email file' consists of
ASCII, or just plain text in a single character set. That is not true,
and hasn't really been true for a few decades. As Wolfgang Denk points
out, when talking about the output of mhfixmsg:
>Agreed - but this leaves us with a problem; as we now have a single
>file with different parts in different character sets.
The thing is, _that's a completely valid email_. It's not a problem!
It's only a problem if you persist in thinking that 'email' consists
of plain text in a single character set. I know, it was that way for
a while and probably most of the email you get is still like that.
But here we have 30-year-old expectations colliding with modern reality.
>However, an awful lot of my email is coming to me base-64-encoded,
>for no particularly obvious reason... probably an accented character
>or (in one case) line-drawing characters.
When I ran into this, the answer was that basically some MUAs always
base64-encoded everything (the one that ran on Blackberry was one
example I encountered). Unfortunately, pretty much every MUA can handle
this just fine, so we have to deal with it.
So mhfixmsg is kind of a Band-Aid on a larger problem. It's not that I
have objections to Band-Aids (see replyfilter), but it's important to
understand the limitations here.
Here are the basic constraints:
1) We assume 'email files' are RFC 5322-format messages (well, okay, with
the exception that they've been converted to Unix line ending format.
A minor issue that we can mostly ignore).
2) RFC 5322-format messages contain bytes that can be encoded different
ways, and can be in different character sets (or not even be text
at all). So in the larger case they're not valid to be processed with
regular Unix text processing tools (because RFC 5322 != text).
3) We can convert the RFC 5322 messages to something that's more friendly
to use with regular text processing tools (that's what mhfixmsg does).
But we can't convert them to 'text' completely, because some parts of
RFC 5322 cannot be represented in an unencoded form (well, I suppose
we could turn them into message/global messages, but that assumes
that you want everything in UTF-8 and we don't actually handle those
messages very well).
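As a rough sketch of what constraint 2 means for a tool, the Python below walks a multipart message and decodes each text part using that part's own transfer encoding and character set. The helper name text_parts and the sample message are hypothetical, and this uses the Python standard library, not nmh code. A real tool would also have to cope with unknown or mislabeled charsets, which this does not.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def text_parts(raw_bytes):
    """Return the decoded text of every text/* part (hypothetical helper)."""
    msg = email.message_from_bytes(raw_bytes)
    texts = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # multipart containers and non-text parts are skipped
        # decode=True undoes base64 or quoted-printable for this part.
        payload = part.get_payload(decode=True)
        # Each part declares its own charset; assume us-ascii if absent.
        charset = part.get_content_charset() or "us-ascii"
        texts.append(payload.decode(charset, errors="replace"))
    return texts

# Sample message invented for illustration: two parts, two character sets.
outer = MIMEMultipart("mixed")
outer.attach(MIMEText("Grüße aus Köln\n", "plain", "iso-8859-1"))
outer.attach(MIMEText("価格は500円です\n", "plain", "utf-8"))
raw = outer.as_bytes()

parts = text_parts(raw)
for text in parts:
    print(text, end="")
```

Searching the returned strings works regardless of how each part was stored, which is the sort of thing a smarter search tool would need to do internally.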
So the short answer is yes, use mhfixmsg, but I think the only long-term
solution is to make nmh tools smarter.
--Ken