nmh-workers

mhfixmsg character set conversion


From: Steven Winikoff
Subject: mhfixmsg character set conversion
Date: Thu, 03 Feb 2022 22:42:21 -0500

I routinely use mhfixmsg to clean up incoming messages, using this command
in a shell script invoked through procmail:

   mhfixmsg -decodetext 8bit -decodetypes text -textcharset UTF-8 \
            -reformat -fixcte -fixboundary -noreplacetextplain \
            -fixtype application/octet-stream -noverbose -file - \
            -outfile $destination < $source

This usually does what I expect, but the other day I received a message
with these characteristics:

   - mhlist reports the following structure:

       msg part  type/subtype              size description
        72       multipart/alternative      45K
           1     text/html                  42K
           2     text/plain                1501

   - the top level of the incoming message has this header (before
     mhfixmsg):

        Content-Type: multipart/alternative; boundary=01266[...]

   - the alternative parts have these headers:

        Content-Transfer-Encoding: quoted-printable
        Content-Type: text/plain; charset=iso-8859-1

     and

        Content-Transfer-Encoding: quoted-printable
        Content-Type: text/html; charset=iso-8859-1

   - after mhfixmsg, the top-level header is unchanged, as expected; the
     alternative part headers are changed to

        Content-Transfer-Encoding: 8bit
        Content-Type: text/plain; charset="UTF-8"

     and

        Content-Transfer-Encoding: 8bit
        Content-Type: text/html; charset=iso-8859-1

...but after conversion from iso-8859-1 to UTF-8, the output file is
mangled.

For reference, here's a section of the quoted-printable encoding from the
original message:

   Veuillez ne pas r=E9pondre au pr=E9sent courriel. Il a =E9t=E9 g=E9n=E9r=E9=
    automatiquement, nous ne pourrons pas y donner suite.

This should decode to the following (represented in UTF-8):

   Veuillez ne pas répondre au présent courriel. Il a été généré
   automatiquement, nous ne pourrons pas y donner suite.

   (all in one line, but split here for readability).
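For comparison, the expected result can be reproduced without nmh. The
following is only a sketch. It assumes `python3` (for its standard `quopri`
module, run as `python3 -m quopri -d`) and `iconv` are installed, and
`qp_latin1_to_utf8` is a hypothetical helper name, not an nmh command. It
first undoes the quoted-printable transfer encoding, then converts the
resulting ISO-8859-1 bytes to UTF-8, which is the conversion I expect
mhfixmsg to perform on the text/plain part:

```shell
#!/bin/sh
# Sketch only: qp_latin1_to_utf8 is a hypothetical helper, not part of nmh.
# Assumes python3 (standard quopri module) and iconv are installed.
qp_latin1_to_utf8() {
    # Step 1: undo quoted-printable; "=E9" becomes the single byte 0xE9,
    # and a trailing "=" (soft line break) joins the line with the next.
    # Step 2: treat those bytes as ISO-8859-1 and re-encode them as UTF-8.
    python3 -m quopri -d | iconv -f ISO-8859-1 -t UTF-8
}

# The two quoted-printable lines from the original message.
printf '%s\n' \
    'Veuillez ne pas r=E9pondre au pr=E9sent courriel. Il a =E9t=E9 g=E9n=E9r=E9=' \
    ' automatiquement, nous ne pourrons pas y donner suite.' |
    qp_latin1_to_utf8
```

The output should match the expected UTF-8 text quoted above, with each "é"
encoded as the two bytes C3 A9, so it can be compared byte by byte (e.g. with
`od -c`) against the corresponding text in mhfixmsg's output.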

...but mhfixmsg turns that into

   Veuillez ne pas répondre au présent courriel. Il a été généré
   automatiquement, nous ne pourrons pas y donner suite.

   (also all in one line, but split here for readability).

Not that I care very much about this particular boilerplate sentence :-/,
but the message contained a lot of other text that I do care about, all of
which was mangled in the same way.

My questions are then:

1) Is this a bug in mhfixmsg, or am I just using it incorrectly?

2) If the former, is there further information I can supply to help track
   this down, or further tests I can conduct on the message in question?

3) ...or if the latter, what am I doing wrong, and what should I be doing
   instead?

  Thanks,

     - Steven
-- 
___________________________________________________________________________
Steven Winikoff      |
Montreal, QC, Canada | Aleph-null bottles of beer on the wall,
smw@smwonline.ca     | Aleph-null bottles of beer...
http://smwonline.ca  |


