nmh-workers

Re: mhfixmsg character set conversion


From: Steven Winikoff
Subject: Re: mhfixmsg character set conversion
Date: Wed, 09 Feb 2022 21:08:37 -0500

>> >I would look at output from mime_helper and see if it's UTF-8.
>>
>> Please forgive me for having to ask this, but how is mime_helper even
>> involved?  Isn't that used only when I read the message?  It isn't in
>> the procmail chain that saves the original copy, and it's the original
>> copy that we've been looking at.
>
>I don't know how mime_helper might fit in.  The lynx invocation is still
>my pick for the root cause but you said you're not clear on how it is
>involved.

I understand how it's involved for reading a message; the part I don't
understand is how it's involved in the sequence of steps that occurs when
a new message is received.

Specifically, to the best of my knowledge:

   1) sendmail hands the message off to procmail

   2) this procmail recipe is activated:

         :0 HBfw
         * ^Content-Type:.*text/
         | /home/smw/bin/email_decoder

I'll append a copy of email_decoder, but the gist of it is:

   - explicitly unset LC_ALL and set LANG to en_CA.UTF-8

   - save the incoming standard input in $source (a file in /tmp)

   - run ~smw/bin/decode_headers using $source as stdin (this explicitly
     decodes headers which are RFC 2047-encoded, and passes the body
     through unchanged)

   - feed stdout from decode_headers into the same mhfixmsg command
     I've already quoted a few times; I'll quote it again here to
     keep everything in one place:

        mhfixmsg -decodetext 8bit -decodetypes text -textcharset UTF-8 \
                 -reformat -fixcte -fixboundary -noreplacetextplain    \
                 -fixtype application/octet-stream -noverbose -file -  \
                 -outfile "${tf}.fixed"

     ...where ${tf}.fixed is another, newly created file in /tmp

   - use cmp to compare $source and ${tf}.fixed; if they differ, save
     $source as a new message in +reformatted

The file which started this discussion is the one from +reformatted, and
I still can't see how lynx would have been involved in its creation.


>I would do this if you haven't already:
>1. download nmh HEAD, build, and install somewhere

I got this far, but I've been unable to proceed since the build failed as
described previously.  (To be fair, I also haven't had time to try to get
farther as yet.)


>2. move your $(mhpath +)/mhn.defaults
>3. move your profile and create one with just a Path: entry
>4. run the "mhfixmsg -file original_copy -out -" from 1. and see if the
>   output looks good or bad
>
>If it's good, then start adding things back in one at a time in reverse
>order (starting with mhfixmsg switches) until it's bad.

This sounds like an excellent plan, and I intend to follow through with it
on Friday; unfortunately I'll be busy with other things until then.

...although I may need help getting past the build problem.
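
For step 4, one way to judge "good or bad" without depending on how a
terminal happens to render the output is to test whether the bytes are
valid UTF-8.  What follows is a minimal sketch, not something taken from
this thread: is_valid_utf8 and report are hypothetical helper names, and
the check assumes an iconv that exits non-zero on an invalid byte
sequence (GNU iconv does).

```sh
#!/bin/sh
# is_valid_utf8 FILE -- hypothetical helper: succeed only if FILE decodes
# cleanly as UTF-8.  Assumes iconv exits non-zero on an invalid sequence.
is_valid_utf8() {
   iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# report FILE -- print a one-line verdict for FILE
report() {
   if is_valid_utf8 "$1"
   then
      echo "$1: valid UTF-8"
   else
      echo "$1: NOT valid UTF-8"
   fi
}
```

For example, after "mhfixmsg -file original_copy -outfile /tmp/step4.out"
(the output path is only an illustration), "report /tmp/step4.out" prints
the verdict.  Passing this check is necessary but not sufficient: text
that was mis-converted from another character set can still be valid
UTF-8, so it is still worth looking at the output by eye.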

     - Steven


8<-----------------------------   cut here   ---------------------------->8
#!/bin/sh
#
#  email_decoder -- rewrite quoted-printable and base64 text in a message
#
#  Steven Winikoff
#  2008/09/11
#  2010/01/22 -- use mhshow to decode
#  2014/05/19 -- always exit with status 0 (see note below)
#  2018/01/22 -- rewrite using mhfixmsg to do the heavy lifting
#  2019/10/17 -- ...and use ~smw/bin/decode_headers to decode RFC 2047
#                headers (for use with procmail, grep and mairix)
#
#  Given an email message on standard input with at least one portion
#  containing text encoded in base64 or quoted-printable format, the
#  object of the game is to send the same message back to stdout with
#  the text part(s) decoded.
#
#  A copy of the original message will also be saved in +reformatted
#  (AKA ~smw/Mail/reformatted/) unless the -t (test mode) option is
#  specified.
#
#  This is intended to be invoked in a procmail filter recipe.
#
#  Note that this is the reason why we always exit with status 0, even
#  when something goes wrong; this prevents procmail from cluttering its
#  log with messages similar to these:
#
#       procmail: Program failure (3) of "/home/smw/bin/email_decoder"
#       procmail: Rescue of unfiltered data succeeded
#
#  usage:  email_decoder [-t]
#
#--------------------------------------------------------------------------
#  setup:

PATH="/local/paths:/bin:/usr/bin:$PATH"
export PATH

unset LC_ALL; LANG="en_CA.UTF-8"; export LANG

tf="/tmp/decoder.`date +%Y%m%d.%H%M%S.$$`"
trap 'rm -rf ${tf}* >/dev/null 2>&1' 1 2 3 15

save_folder="+reformatted"

test_mode=0


#--------------------------------------------------------------------------
#  are we operating in test mode?

if [ ! -z "${1}" ]
then
   # officially test mode is indicated by the -t option, but in
   # practice we'll accept any argument at all to mean test mode

   test_mode=1
fi


#--------------------------------------------------------------------------
#  save a copy of the original message:
#
#     if any changes are made (and if not operating in test mode), a copy
#     of the original will be left in +reformatted -- but we won't know
#     whether that's necessary until later

source="${tf}.original"
cat > ${source}


#--------------------------------------------------------------------------
#  run the message through decode_headers and mhfixmsg in that order:
#
#     notes:
#
#        - this relies on mhfixmsg (which has been patched to allow output
#          lines wider than 998 characters!) to decode base64 and
#          quoted-printable text parts
#
#        - the -fixtype option to mhfixmsg (introduced in nmh-1.7) allows
#          uninformative MIME types to be replaced by something more
#          useful; it can be repeated as many times as necessary, with
#          a different type specified each time
#
#        - mhfixmsg changes the structure of some messages; for example:
#
#             before:
#
#                msg part  type/subtype              size description
#                279       multipart/mixed           540K
#                    1     multipart/related          32K
#                    1.1   text/html                  20K
#                    1.2   image/jpeg                2540 image001.jpg
#                    2     application/pdf           187K 162160.PDF
#                    3     application/pdf           187K 161858.PDF
#
#             after:
#
#                msg part  type/subtype              size description
#                280       multipart/mixed           542K
#                    1     multipart/related          34K
#                    1.1   multipart/alternative      30K
#                    1.1.1 text/html                  21K
#                    1.1.2 text/plain                8829
#                    1.2   image/jpeg                2540 image001.jpg
#                    2     application/pdf           187K 162160.PDF
#                    3     application/pdf           187K 161858.PDF

decode_headers < ${source} |                                      \
   mhfixmsg -decodetext 8bit -decodetypes text -textcharset UTF-8 \
            -reformat -fixcte -fixboundary -noreplacetextplain    \
            -fixtype application/octet-stream -noverbose -file -  \
            -outfile "${tf}.fixed"


#--------------------------------------------------------------------------
#  if we didn't actually change anything, just blat the original message to
#  stdout; otherwise save the original file (if not in test mode) and send
#  the modified version to stdout:

if cmp -s ${source} "${tf}.fixed"
then
   cat ${source}
else
   original="`mhpath ${save_folder} new`"
   [ ${test_mode} -lt 1 ] && cat ${source} > "${original}"
   formail -fA "X-Reformatted-From: ${original}" < ${tf}.fixed
fi


#--------------------------------------------------------------------------
#  done!  clean up and exit:

rm -rf ${tf}* >/dev/null 2>&1
exit 0
8<-----------------------------   cut here   ---------------------------->8
-- 
___________________________________________________________________________
Steven Winikoff      | Sometimes you will never know the value
Montreal, QC, Canada | of a moment until it becomes a memory.
smw@smwonline.ca     |
http://smwonline.ca  |                             - Dr. Seuss


