nmh-workers
Re: mhfixmsg character set conversion


From: Steven Winikoff
Subject: Re: mhfixmsg character set conversion
Date: Fri, 04 Feb 2022 15:58:12 -0500

>I expect that your environment is close enough to:
>
>[details snipped]

Pretty much.  Here's what I have:

$ iconv --version
iconv (GNU libc) 2.33

$ locale
LANG=en_CA.UTF-8
LC_CTYPE="en_CA.UTF-8"
LC_NUMERIC=en_CA.UTF-8
LC_TIME=en_CA.UTF-8
LC_COLLATE=C
LC_MONETARY=en_CA.UTF-8
LC_MESSAGES="en_CA.UTF-8"
LC_PAPER=en_CA.UTF-8
LC_NAME=en_CA.UTF-8
LC_ADDRESS=en_CA.UTF-8
LC_TELEPHONE=en_CA.UTF-8
LC_MEASUREMENT=en_CA.UTF-8
LC_IDENTIFICATION=en_CA.UTF-8
LC_ALL=

...so the only differences are LC_COLLATE=C, which I set because I prefer
the way it sorts, and LC_ALL, which must be getting set as a side effect
of something, because I'm not setting it explicitly.
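
As far as I know, locale(1) prints an "LC_ALL=" line whether or not the
variable exists in the environment, so an empty value in that listing does
not by itself show that anything set it.  A minimal sketch for telling the
cases apart in a POSIX shell (the function name lc_all_state is made up
for illustration):

```shell
# Report whether LC_ALL is unset, set to the empty string, or set to a value.
# ${LC_ALL+x} expands to "x" only when the variable is set, even if empty.
lc_all_state() {
    if [ -z "${LC_ALL+x}" ]; then
        echo "unset"
    elif [ -z "$LC_ALL" ]; then
        echo "set but empty"
    else
        echo "set to $LC_ALL"
    fi
}

lc_all_state
```

If this reports "unset", the LC_ALL= line in the locale output is just how
the listing is formatted, not a sign of a stray setting.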


>With this small example:
>
>[snip]


>I see correct conversion of the quoted-printable E9 to UTF-8 C3A9:

So do I, which suggests that there's something in the content of the
specific message I'm working with.
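
For reference, the conversion itself can be reproduced without mhfixmsg.
This is a minimal sketch, assuming an iconv that knows ISO-8859-1 and
UTF-8 plus the standard od and tr utilities.  The function name is made
up for illustration:

```shell
# Feed the single ISO-8859-1 byte 0xE9 (octal 351, e-acute) through iconv
# and print the resulting UTF-8 bytes as hex digits with no separators.
latin1_e9_as_utf8_hex() {
    printf '\351' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n'
}

latin1_e9_as_utf8_hex   # prints c3a9
echo
```

If that prints c3a9, iconv and the locale are behaving as expected, which
points back at the message content rather than the environment.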


>Does adding -verbose to your mhfixmsg invocation provide any clues?
>mhfixmsg: /tmp/mhfixmsgUgtVK1 part 2, decode text/plain; charset=iso-8859-1
>mhfixmsg: /tmp/mhfixmsgUgtVK1 part 1, decode text/html; charset=iso-8859-1
>mhfixmsg: /tmp/mhfixmsgUgtVK1 part 2, convert iso-8859-1 to UTF-8

This is the output I receive:

$ mhfixmsg -decodetext 8bit -decodetypes text -textcharset UTF-8 -reformat \
           -fixcte -fixboundary -noreplacetextplain \
           -fixtype application/octet-stream -verbose -file - \
           -outfile $destination < $source
mhfixmsg: /home/smw/Mail/mhfixmsgnss3pI part 2, decode text/plain; charset=iso-8859-1
mhfixmsg: /home/smw/Mail/mhfixmsgnss3pI part 1, decode text/html; charset=iso-8859-1
mhfixmsg: /home/smw/Mail/mhfixmsgnss3pI part 2, convert UTF-8 to UTF-8

...which is interesting for more than one reason, including that there's
apparently no conversion of iso-8859-1 to UTF-8, and that in fact it's
part 1 rather than part 2 that gets converted improperly; part 2 still
has

   Content-Type: text/html; charset=iso-8859-1

     - Steven
-- 
___________________________________________________________________________
Steven Winikoff      | "Algebra? [...] But that's far too
Montreal, QC, Canada |  difficult for seven-year-olds!"
smw@smwonline.ca     | "Yes, but I didn't tell them that
http://smwonline.ca  |  and so far they haven't found out"
                     |
                     |      - Terry Pratchett (Thief of Time)


