Re: mhfixmsg character set conversion
From: Steven Winikoff
Subject: Re: mhfixmsg character set conversion
Date: Fri, 04 Feb 2022 20:33:09 -0500
>As Robert and Ken pointed out, one explanation could be that the
>content is converted twice, the second time incorrectly.
I saw those replies, but I wasn't sure how to interpret them (as in, the
evidence is compelling, but I have no idea why that would be happening or
what to do about it).
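To make the "converted twice" hypothesis concrete, here is a minimal sketch of what a double conversion does to the bytes. It assumes a plain iso-8859-1 to UTF-8 mapping; the function name latin1_to_utf8 is hypothetical, and nothing here is a claim about how mhfixmsg itself performs the conversion.

```c
#include <stddef.h>

/* Convert ISO-8859-1 bytes to UTF-8.  Each Latin-1 byte maps to the Unicode
 * code point of the same value, so every byte >= 0x80 becomes two bytes.
 * Returns the number of bytes written; dst must have room for 2 * n bytes.
 * (Hypothetical helper for illustration only.) */
size_t latin1_to_utf8(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        if (src[i] < 0x80) {
            /* ASCII range: copied through unchanged */
            dst[out++] = src[i];
        } else {
            /* 0x80..0xFF: two-byte UTF-8 sequence 110xxxxx 10xxxxxx */
            dst[out++] = (unsigned char)(0xC0 | (src[i] >> 6));
            dst[out++] = (unsigned char)(0x80 | (src[i] & 0x3F));
        }
    }
    return out;
}
```

Feeding the single byte 0xE9 (é in iso-8859-1) through this once gives C3 A9, which is correct UTF-8. Feeding that result through again, as if it were still iso-8859-1, gives C3 83 C2 A9, which displays as "Ã©". That is the typical signature of content converted twice.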
>I don't see at this point how mhfixmsg could do that but this needs more
>investigation. We can continue this way, or if you want to send me a
>sanitized excerpt of the message, I'd be glad to work with it.
I can't think of a reasonable way to sanitize it, but I'm willing to send
it to you privately. Should I use your <levinedl@acm.org> address for this
purpose?
>> $ mhfixmsg -decodetext 8bit -decodetypes text -textcharset UTF-8 -reformat \
>> -fixcte -fixboundary -noreplacetextplain \
>> -fixtype application/octet-stream -verbose -file - \
>> -outfile $destination < $source
>> mhfixmsg: /home/smw/Mail/mhfixmsgnss3pI part 2, decode text/plain;
>> charset=iso-8859-1
>> mhfixmsg: /home/smw/Mail/mhfixmsgnss3pI part 1, decode text/html;
>> charset=iso-8859-1
>> mhfixmsg: /home/smw/Mail/mhfixmsgnss3pI part 2, convert UTF-8 to UTF-8
>>
>> ...which is interesting for more than one reason, including that there's
>> apparently no conversion of iso-8859-1 to UTF-8,
>
>That's strange, unless $source had already been run through mhfixmsg.
It hadn't. In normal use my procmail-invoked shell script does run the
message through a program I wrote myself, which decodes RFC 2047-encoded
headers -- but that only affects the headers, and passes the body through
unmodified; the relevant excerpt for that is:
[ loop that processes header lines elided]
        /** an empty input line means the end of the message headers: **/

        if (strlen(input_line) < 1) break;
    }


    /** read and write message body: **/

    while (getline(&input_line, &len, infile) >= 0)
    {
        fputs(input_line, outfile);
    }


    /** ...and we're done: **/

    return(0);

}
The only change this produces in the problematic message is as follows:
47,57c47,57
< X-SG-EID: =?us-ascii?Q?CePduXinO1TKWf=2FmbcRcIcb5o7KEfW6Q=2FLxIZrPrRA0dtxQ5evb2UIV0M0r6v6?=
< =?us-ascii?Q?DfqG=2FoldGlAr6l6p1riD1OEyVdX0=2F57dKo740dz?=
< =?us-ascii?Q?NZIhwlTw5J3KSyIU4H7pjfyfMBv0e9LGxKHVezS?=
< =?us-ascii?Q?FeSLaVJyOzyyK3LeB3eGx+QysKjtjkJzuVDXsW4?=
< =?us-ascii?Q?ZiePczPvW34XaHeheXAl2m0RGMRgZENpvRzzX2M?=
< =?us-ascii?Q?G6=2FuEHfZ5+X57rF1w=3D?=
< X-SG-ID: =?us-ascii?Q?N2C25iY2uzGMFz6rgvQsb8raWjw0ZPf1VmjsCkspi=2FKHgAsE=2FCUk5eZaRe5Ltr?=
< =?us-ascii?Q?cbw5EBe1xYnaBlEvYrWq76guWX6eVcLnBjZLZsv?=
< =?us-ascii?Q?fUgud7M9swcG4+O7RGb81dd6HibI6WdUCRYi2bx?=
< =?us-ascii?Q?T8y2GlCc1B+71TSgKjD9dEU2IqN30RZ1qRbAGlx?=
< =?us-ascii?Q?5EAyl462xuJc+?=
---
> X-SG-EID: CePduXinO1TKWf/mbcRcIcb5o7KEfW6Q/LxIZrPrRA0dtxQ5evb2UIV0M0r6v6
> DfqG/oldGlAr6l6p1riD1OEyVdX0/57dKo740dz
> NZIhwlTw5J3KSyIU4H7pjfyfMBv0e9LGxKHVezS
> FeSLaVJyOzyyK3LeB3eGx+QysKjtjkJzuVDXsW4
> ZiePczPvW34XaHeheXAl2m0RGMRgZENpvRzzX2M
> G6/uEHfZ5+X57rF1w=
> X-SG-ID: N2C25iY2uzGMFz6rgvQsb8raWjw0ZPf1VmjsCkspi/KHgAsE/CUk5eZaRe5Ltr
> cbw5EBe1xYnaBlEvYrWq76guWX6eVcLnBjZLZsv
> fUgud7M9swcG4+O7RGb81dd6HibI6WdUCRYi2bx
> T8y2GlCc1B+71TSgKjD9dEU2IqN30RZ1qRbAGlx
> 5EAyl462xuJc+
...but in my testing last night and just now, I see the same behavior
when I run mhfixmsg directly on the unmodified original file (my script
always saves an unmodified copy when it makes changes, in case something
goes wrong).
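For readers unfamiliar with RFC 2047, the header change in the diff above amounts to Q-decoding each encoded-word. The following is a minimal sketch of that step, not the actual program described above: the function name decode_q_word is hypothetical, it handles only the Q encoding, and it ignores the charset label.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = toupper(c);
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode one "=?charset?Q?text?=" encoded-word into dst (NUL-terminated).
 * Returns the decoded length, or -1 if the word is not a well-formed
 * Q-encoded word or dst is too small.  (Hypothetical helper.) */
int decode_q_word(const char *word, char *dst, size_t dstsize)
{
    size_t len = strlen(word);

    if (dstsize == 0 || len < 8) return -1;
    if (strncmp(word, "=?", 2) != 0 || strcmp(word + len - 2, "?=") != 0)
        return -1;

    /* the '?' that ends the charset must be followed by "Q?" */
    const char *q = strchr(word + 2, '?');
    if (q == NULL || (q[1] != 'Q' && q[1] != 'q') || q[2] != '?')
        return -1;

    const char *p = q + 3;              /* start of the encoded text */
    const char *end = word + len - 2;   /* start of the closing "?=" */
    if (p > end) return -1;

    size_t out = 0;
    while (p < end) {
        int c = (unsigned char)*p++;

        if (c == '_') {
            c = ' ';                    /* underscore encodes a space */
        } else if (c == '=') {          /* "=XX" encodes one byte in hex */
            if (end - p < 2) return -1;
            int hi = hexval((unsigned char)p[0]);
            int lo = hexval((unsigned char)p[1]);
            if (hi < 0 || lo < 0) return -1;
            c = hi * 16 + lo;
            p += 2;
        }
        if (out + 1 >= dstsize) return -1;
        dst[out++] = (char)c;
    }
    dst[out] = '\0';
    return (int)out;
}
```

Applied to the last X-SG-EID fragment in the diff, "=?us-ascii?Q?G6=2FuEHfZ5+X57rF1w=3D?=", this yields "G6/uEHfZ5+X57rF1w=", matching the decoded line shown there. Only header fields are touched by this kind of decoding, so it has no bearing on the charset of the message body.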
>Conversion to the same charset is a no-op, I'll look into removing the
>verbose output in that case.
That's probably a helpful thing to do, but the question I was wondering
about wasn't why the UTF-8-to-UTF-8 conversion was reported, but rather why
the iso-8859-1-to-UTF-8 conversion wasn't reported.
>> and that in fact it's part 1 rather than part 2 that gets converted
>> improperly
>
>The part numbers are reversed because that's the order used for display.
>Part 2 is the text/plain part, that's the one that got converted.
Thank you. That clears up part of my confusion.
- Steven
--
___________________________________________________________________________
Steven Winikoff | "The thing is, I mean, there's times when
Montreal, QC, Canada | you look at the universe and you think,
smw@smwonline.ca | 'What about me?' and you can just hear
http://smwonline.ca | the universe replying, 'Well, what about
| you?'"
| - Terry Pratchett (Thief of Time)