nmh-workers
Re: [Nmh-workers] A non-complaint


From: Ken Hornstein
Subject: Re: [Nmh-workers] A non-complaint
Date: Wed, 06 Aug 2014 10:56:21 -0400

>1. The file contains a text version and non-text version of the
>content. But mhshow ignores the text version and uses the non-text
>version. I don't understand mhshow's man page well enough to tell
>mhshow to use the text version (and not invoke chrome), under these
>circumstances. If there is no way to do this, shouldn't there be a
>way? Indeed, shouldn't it be the default?

Weeeellll .... we're actually following the recommendations of the RFCs
here.  The short answer is yes: you can make it display the text/plain
part with the -part or -type options to mhshow (I'm guessing it will be
something like "-part 2", but check the output of mhlist; you could also
use "-type text/plain").

Now, should we be displaying the text part?  Technically, they're BOTH
text parts; one is text/html, the other is text/plain.  They're listed
as a multipart/alternative; that means that they're supposed to be
(roughly) the same, but in "better" formats.  So mhshow uses the best
format that it knows how to display; in your case, it knows how to
display text/html content, so that's what it picks.  We've talked about
putting in a default preference order for multipart/alternative content,
but that hasn't happened yet.
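
To make the idea of a preference order concrete, here is a minimal sketch
using Python's standard email library. It is not how mhshow works; the
sample message and the pick_part name are made up for illustration.

```python
from email import message_from_string
from email.policy import default

# Hypothetical two-part message: same content as text/plain and text/html.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="b"

--b
Content-Type: text/plain; charset=UTF-8

Hello in plain text.
--b
Content-Type: text/html; charset=UTF-8

<p>Hello in HTML.</p>
--b--
"""

def pick_part(raw, preference=("plain", "html")):
    """Return the alternative that ranks first in the preference tuple."""
    msg = message_from_string(raw, policy=default)
    return msg.get_body(preferencelist=preference)

part = pick_part(RAW)
print(part.get_content_type())  # text/plain
```

Reversing the tuple to ("html", "plain") selects the text/html part
instead, which is roughly the choice mhshow made for this message.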

>2. There are several spurious occurrences of the character, 'Â' (look
>ma, I'm storing in UTF-8) in the chrome display. For example, just after
>"for another place to live!", on line 3. I don't know if this is a
>problem with me, with nmh, or with chrome.

Ummmm .... I guess it's technically an nmh problem, but there's a bunch
of finger-pointing going on here.  I'll try to explain the issue as
best I can.

You're seeing the spurious Â's because the web content contains (in
quoted-printable format) the characters: =C2=A0.  What is that?  Well,
the character set for the text/html part is UTF-8, so that ends up
being U+00A0, NO-BREAK SPACE.  Some MUAs and word processors put those
at the end of sentences:

        http://en.wikipedia.org/wiki/Non-breaking_space
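
The decoding step can be checked with a few lines of Python. This is
only an illustration using the standard quopri and unicodedata modules;
nmh does its own decoding.

```python
import quopri
import unicodedata

# Undo the quoted-printable encoding: "=C2=A0" becomes two raw bytes.
raw = quopri.decodestring(b"=C2=A0")
print(raw)  # b'\xc2\xa0'

# Interpreted as UTF-8, those two bytes are a single character.
char = raw.decode("utf-8")
print(f"U+{ord(char):04X}", unicodedata.name(char))  # U+00A0 NO-BREAK SPACE
```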

But, you ask: why is this not working, if we're in a happy UTF-8 world
and everything is encoded properly?  Well, here's where things aren't
as glued together as I'd like.  What's happening is nmh is decoding the
quoted-printable correctly and giving it to Chrome.  That all works.  But
that particular message does NOT have a complete HTML header that
specifies the character set; instead, the Content-Type for the HTML part
says:

Content-Type: text/html; charset=UTF-8

But Chrome doesn't see that because it's not part of the HTML content.
The web standards say that if an HTML document doesn't specify its
character set, the browser is supposed to assume the character set
"windows-1252" (I believe that's right).  In windows-1252, 0xC2 ends
up as Â, and 0xA0 ends up as a non-breaking space (which you don't see).

If you look at the rules we put in for using w3m or links in
mhn.defaults you'll see some gyrations when it comes to the charset
parameter.  What that does is take the "charset" MIME parameter and
tell the command-line web browsers to use it as the default character
set if the HTML doesn't include a character set.

So, whose fault is it?  The email follows all of the relevant standards;
nmh deals with it properly in the out-of-the-box configuration.  Chrome
is doing the right thing when it comes to HTML content that lacks an
explicit charset declaration.  The right solution is to have your
invocation of Chrome set a default character set based on the MIME
charset parameter; nmh has the ability to get that stuff out (see
mhn.defaults), I just don't know how to tell Chrome that.
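
One possible workaround, offered purely as an assumption and not as
anything nmh or Chrome provides: a small filter could write the charset
into the HTML itself before the browser opens it. The add_charset_hint
helper below is hypothetical, and how it would be wired into a display
rule is left open.

```python
import re

META_RE = re.compile(rb"<meta[^>]+charset", re.IGNORECASE)
HEAD_RE = re.compile(rb"<head(\s[^>]*)?>", re.IGNORECASE)

def add_charset_hint(html: bytes, charset: str) -> bytes:
    """Insert a <meta charset> tag unless the HTML already declares one."""
    if META_RE.search(html):
        return html  # the document already names its charset
    tag = b'<meta charset="' + charset.encode("ascii") + b'">'
    head = HEAD_RE.search(html)
    if head:
        # Put the tag right after the opening <head> tag.
        return html[:head.end()] + tag + html[head.end():]
    # No <head> at all (a bare fragment): put the tag at the very start.
    return tag + b"\n" + html

print(add_charset_hint(b"<p>live!\xc2\xa0</p>", "UTF-8"))
# b'<meta charset="UTF-8">\n<p>live!\xc2\xa0</p>'
```

The charset argument would come from the MIME charset parameter. A
document with a doctype but no <head> would get the tag ahead of the
doctype, which this sketch does not try to handle.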

--Ken


