Re: mhfixmsg character set conversion
From: Steven Winikoff
Subject: Re: mhfixmsg character set conversion
Date: Fri, 04 Feb 2022 15:58:12 -0500
>I expect that your environment is close enough to:
>
>[details snipped]
Pretty much. Here's what I have:
$ iconv --version
iconv (GNU libc) 2.33
$ locale
LANG=en_CA.UTF-8
LC_CTYPE="en_CA.UTF-8"
LC_NUMERIC=en_CA.UTF-8
LC_TIME=en_CA.UTF-8
LC_COLLATE=C
LC_MONETARY=en_CA.UTF-8
LC_MESSAGES="en_CA.UTF-8"
LC_PAPER=en_CA.UTF-8
LC_NAME=en_CA.UTF-8
LC_ADDRESS=en_CA.UTF-8
LC_TELEPHONE=en_CA.UTF-8
LC_MEASUREMENT=en_CA.UTF-8
LC_IDENTIFICATION=en_CA.UTF-8
LC_ALL=
...so the only differences are LC_COLLATE=C, which I set because I prefer
the way it sorts, and LC_ALL, which must be getting set as a side effect of
something, because I'm not setting it explicitly.
>With this small example:
>
>[snip]
>I see correct conversion of the quoted-printable E9 to UTF-8 C3A9:
So do I, which suggests that there's something in the content of the
specific message I'm working with.
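For reference, the conversion being checked here can be reproduced outside of mhfixmsg. This is only a minimal sketch using Python's standard library, not anything nmh does internally; the sample string "caf=E9" is made up for illustration:

```python
import quopri

# Quoted-printable "=E9" decodes to the single byte 0xE9,
# which is "é" in ISO-8859-1.
raw = quopri.decodestring(b"caf=E9")

# Interpret the bytes as ISO-8859-1, then re-encode as UTF-8.
text = raw.decode("iso-8859-1")
utf8 = text.encode("utf-8")

# Show the bytes before and after conversion as hex.
print(raw.hex(), "->", utf8.hex())
```

The E9 byte becomes the two-byte sequence C3 A9, which is the result a correct iso-8859-1 to UTF-8 conversion should produce.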
>Does adding -verbose to your mhfixmsg invocation provide any clues?
>mhfixmsg: /tmp/mhfixmsgUgtVK1 part 2, decode text/plain; charset=iso-8859-1
>mhfixmsg: /tmp/mhfixmsgUgtVK1 part 1, decode text/html; charset=iso-8859-1
>mhfixmsg: /tmp/mhfixmsgUgtVK1 part 2, convert iso-8859-1 to UTF-8
This is the output I receive:
$ mhfixmsg -decodetext 8bit -decodetypes text -textcharset UTF-8 -reformat \
-fixcte -fixboundary -noreplacetextplain \
-fixtype application/octet-stream -verbose -file - \
-outfile $destination < $source
mhfixmsg: /home/smw/Mail/mhfixmsgnss3pI part 2, decode text/plain; charset=iso-8859-1
mhfixmsg: /home/smw/Mail/mhfixmsgnss3pI part 1, decode text/html; charset=iso-8859-1
mhfixmsg: /home/smw/Mail/mhfixmsgnss3pI part 2, convert UTF-8 to UTF-8
...which is interesting for more than one reason: there's apparently no
conversion of iso-8859-1 to UTF-8, and in fact it's part 1 rather than
part 2 that gets converted improperly; part 2 still has
Content-Type: text/html; charset=iso-8859-1
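One way to narrow this down might be to compare each text part's declared charset with what its bytes actually contain. The following is a rough sketch using Python's standard email module; report_text_parts and the sample message are hypothetical names and data made up for illustration, and none of this is part of nmh:

```python
from email import message_from_bytes

def report_text_parts(raw_message: bytes):
    """Return (content type, declared charset, body is valid UTF-8) per text part."""
    results = []
    msg = message_from_bytes(raw_message)
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        # decode=True undoes the Content-Transfer-Encoding (e.g. quoted-printable).
        body = part.get_payload(decode=True) or b""
        try:
            body.decode("utf-8")
            valid = True
        except UnicodeDecodeError:
            valid = False
        results.append((part.get_content_type(), part.get_content_charset(), valid))
    return results

# Made-up sample: an iso-8859-1 part containing a quoted-printable E9.
sample = (
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"caf=E9\r\n"
)
print(report_text_parts(sample))
```

A part labelled iso-8859-1 whose body contains non-ASCII bytes and is nonetheless valid UTF-8 would suggest the label and the content disagree. A pure-ASCII body is valid in both charsets, so it proves nothing either way.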
- Steven
--
___________________________________________________________________________
Steven Winikoff | "Algebra? [...] But that's far too
Montreal, QC, Canada | difficult for seven-year-olds!"
smw@smwonline.ca | "Yes, but I didn't tell them that
http://smwonline.ca | and so far they haven't found out"
|
| - Terry Pratchett (Thief of Time)