[Nmh-workers] mhfixmsg issue with ill-formed mail.


From: Valdis Kletnieks
Subject: [Nmh-workers] mhfixmsg issue with ill-formed mail.
Date: Fri, 18 Nov 2016 23:24:13 -0500

Just when you think that in a third of a century of doing e-mail
you've seen every possible way to screw things up, new ways get invented.

So I have this in my .procmailrc:

TMPFILE=`mktemp -p /home/valdis/tmp fixmsg.XXXXXXXXXX`

# Canonify to 8-bit UTF-8
:0 wf
*!^Content-type:.*multipart/signed
| tee $TMPFILE | mhfixmsg -noverbose -file - -outfile -

(The tee and the Content-Type check are there because I didn't understand why
PGP signatures were sometimes going bad.)

Found one in my inbox today from the ACLU that ended:

(...)
Content-Transfer-Encoding: 7bit
Content-Type: multipart/alternative;
        boundary="----=_NextPart_749_A80A_5A5AF88C.647A7A28"
MIME-Version: 1.0
Message-ID: <address@hidden>
X-ReportingKey: MJ4CBHM1EIHT38HVKK6B0_JJ3CFC-J7948DM54E1V::address@hidden::1_478931
Subject: Sessions
Date: Fri, 18 Nov 2016 18:01:22 -0500
To: address@hidden
Reply-To: address@hidden
From: "Anthony D. Romero, ACLU Action" <address@hidden>
X-Gm-Spam: 0
X-Gm-Phishy: 0
X-Gm-Spam: 0
X-Gm-Phishy: 0

------=_NextPart_749_A80A_5A5AF88C.647A7A28--

(one blank line after the separator).

Why? Because the *input* (from that tee, so as it came into procmail) had:

Subject: Sessions
Date: Fri, 18 Nov 2016 18:01:22 -0500
To: address@hidden
Reply-To: address@hidden
From: "Anthony D. Romero, ACLU Action" <address@hidden>
X-Gm-Spam: 0
X-Gm-Phishy: 0
X-Gm-Spam: 0
X-Gm-Phishy: 0


------=_NextPart_749_A80A_5A5AF88C.647A7A28
Content-Type: text/plain;
        charset="UTF-8"
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

        Hi Valdis -

and here's the next line, fed into od -cx:

0000000   P   r   e   s   i   d   e   n   t   -   e   l   e   c   t    
           7250    7365    6469    6e65    2d74    6c65    6365    2074
0000020   D   o   n   a   l   d       T   r   u   m   p       j   u   s
           6f44    616e    646c    5420    7572    706d    6a20    7375
0000040   t       a   n   n   o   u   n   c   e   d       h   e   b  \0
           2074    6e61    6f6e    6e75    6563    2064    6568    0062
0000060 031   s       n   o   m   i   n   a   t   i   n   g       S   e
           7319    6e20    6d6f    6e69    7461    6e69    2067    6553


and mhfixmsg just went nuts when it hit that \0. The exact failure mode depends
on how far into the body part the \0 is: sometimes the message body goes
bye-bye, other times a chunk of text disappears and the next line is adjoined
to the front half of the previous line.
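
A note on reading the od -cx dump above: the hex row groups bytes into 16-bit
words, and on a little-endian machine the second byte of each pair is printed
first. So the word 0062 is the byte pair 'b', \0. The following is a minimal
sketch that reproduces that word view from raw bytes; the function name
od_words is made up for illustration, and little-endian byte order is an
assumption that happens to match the dump.

```python
import struct


def od_words(data: bytes) -> list[str]:
    """Format bytes as little-endian 16-bit hex words, like the -x row of od."""
    if len(data) % 2:
        data += b"\0"  # pad a trailing odd byte with a zero byte
    return [f"{word:04x}" for (word,) in struct.iter_unpack("<H", data)]


# The six bytes around the NUL, as shown in the character row of the dump.
print(od_words(b"heb\x00\x19s"))  # ['6568', '0062', '7319']
```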

And sure enough, all the messages that are getting mangled have:

----=Content_Boundary
Content-Type: text/plain; charset="utf-8"
Content-transfer-encoding: 7bit

or a perversion thereof, and then a \0 in the text.
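
For triage, it can help to scan a saved body part for octets that a 7bit
Content-Transfer-Encoding does not permit. RFC 2045 defines 7bit data as having
no octets above 127 and no NULs. The sketch below checks only those two rules
(it ignores the line-length limit), and the function name is hypothetical.

```python
def seven_bit_violations(body: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte value) for each octet not allowed in 7bit data."""
    return [(i, b) for i, b in enumerate(body) if b == 0 or b > 127]


# The bytes around the NUL in the dump: only the NUL itself is flagged.
print(seven_bit_violations(b"heb\x00\x19s"))  # [(3, 0)]
```

Running something like this over the $TMPFILE copies saved by the tee would show
which messages carry a \0 and at what offset.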

I admit being totally unclear as to where the \0's are coming from,
or what mhfixmsg should do when it sees one, or why any software or person
thinks that 7bit CTE is a sane way to send around utf-8 data.
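
One reading that fits the od dump, offered as a hypothesis and not a diagnosis:
the bytes 'b', \0, \031 (hex 62 00 19) are exactly what the UTF-8 encoding of
U+2019, the right single quotation mark (hex e2 80 99), turns into when the high
bit of every octet is cleared. If some relay forced the body down to 7 bits,
"he's" with a typographic apostrophe would come out as shown. The sketch below
only demonstrates the arithmetic; which piece of software, if any, did the
stripping is not known from the message.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Clear bit 7 of every octet, as a 7-bit-only transport might."""
    return bytes(b & 0x7F for b in data)


original = "he\u2019s".encode("utf-8")  # b'he\xe2\x80\x99s'
print(strip_high_bit(original))         # b'heb\x00\x19s'
```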

This is probably going to be a *total* joy to debug.

