nmh-workers

Re: mhfixmsg character set conversion


From: David Levine
Subject: Re: mhfixmsg character set conversion
Date: Sat, 12 Feb 2022 15:16:52 +0000

Steven wrote:

>    1) Replacing par does indeed fix one of the three failed tests.

Progress!

> ...so clearly I need to replace elinks in my html_to_text script, and doing
> that will solve the problem that prompted this discussion, leaving the
> following questions:
>
>    1) What's the best replacement for elinks?

mhn.defaults.sh looks for text/html helpers in this order:
    1. w3m
    2. lynx
    3. elinks

I don't know if one is necessarily "better" than another.

If you have suggestions on how to improve the arguments that mhn.defaults.sh
uses for elinks, please let us know.

>    2) Should I replace my 1.7.1 installation by the version I just built?
>       Basically I'm asking what benefits the current snapshot has over
>       1.7.1,

See docs/pending-release-notes.

>       and how far away the next numbered release might be.

Unknown.  Ken appears to be busy.  One of us here could push it out.  It's
been almost 4 years so I think that would be a good idea.  Perhaps after
things here settle down a bit.

>    3) How can I guarantee that messages will be saved with quoted-printable
>       or base64 parts decoded, without patching mhfixmsg to deal with
>       messages in which the decoded text would be more than 998 characters
>       long?

I don't know your reason for patching mhfixmsg.  IIRC, you were using
-decodetext 8bit; binary instead of 8bit might help.  The mhfixmsg man
page might provide some insight.

>       That raises some further questions:
>
>          - Why wasn't the text/html part converted to utf-8?

mhfixmsg only converts the character set of text/plain.  That was a
design decision.  Other subtypes can be extracted with mhstore and run
through iconv.  If there's a use for converting them in place in
mhfixmsg, it wouldn't be difficult but I'm not sure how useful it
would be.
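
As a rough sketch of that extract-then-convert route, assuming the text/html part has already been written to a file with mhstore: the file name part.html and the ISO-8859-1 source charset below are placeholders for illustration, not taken from the message under discussion.

```shell
# Sketch only: part.html stands in for a text/html part already extracted
# with mhstore; the file name and the ISO-8859-1 charset are assumptions.
workdir=$(mktemp -d)
printf '<p>caf\351</p>\n' > "$workdir/part.html"    # \351 is e-acute in ISO-8859-1

# Convert the extracted part to UTF-8.
iconv -f ISO-8859-1 -t UTF-8 "$workdir/part.html" > "$workdir/part.utf8.html"

# Hex dump of the result, to confirm the byte-level change (e9 becomes c3 a9).
converted_hex=$(od -An -tx1 "$workdir/part.utf8.html" | tr -d ' \n')
printf '%s\n' "$converted_hex"
```

iconv only rewrites the bytes; any charset named inside the HTML itself is left as it was.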

>          - Regardless of the answer to the previous question, after a
>            message has been refiled (and assuming I'm not planning to
>            resend it to anyone), is there a practical difference between
>            binary and 8bit encoding?

"Note that -decodetext binary can produce messages that are not compliant
with RFC 5322, §2.1.1."
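
To see whether a particular message is affected, one can look for lines over the 998-character limit that section of the RFC sets. A minimal sketch; the sample file is generated here purely for illustration:

```shell
# Sketch only: build a sample "message" with one 1000-character body line.
msg=$(mktemp)
{
    printf 'Subject: test\n\n'
    printf 'short line\n'
    awk 'BEGIN { while (i++ < 1000) printf "x"; print "" }'
} > "$msg"

# Report every line longer than 998 bytes as "line-number: length".
# LC_ALL=C makes awk's length() count bytes rather than locale characters.
long_lines=$(LC_ALL=C awk 'length($0) > 998 { print NR ": " length($0) }' "$msg")
printf '%s\n' "$long_lines"
```

An empty result means no line in the file exceeds the limit. The sample uses LF line endings; on a file with CRLF endings the trailing CR would be counted too.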

>          - Why are the headers of the decoded message identical to those
>            of the input, despite the use of -decodeheaderfieldbodies?
>
>            (...and yes, the unmodified version of the message does contain
>             some encoded headers that my decode_headers program found and
>             decoded; mhfixmsg appears not to have done so).

Is it a proper MIME message (does mhfixmsg return with a non-zero exit
status)?  If so, can you send it to me off-line?

The test suite has a case, boiled down a bit here:

$ cat test1
To: recipient@example.com
From: sender@example.com
Date: Wed, 28 Sep 2016 11:24:28 -0400
Subject: =?utf-8?B?dGhpcyBTdWJqZWN0IHdhcyBVVEYtOCBlbmNvZGVk?=
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary=001a114dd3e8fe9c56053d92f414
Content-Transfer-Encoding: 8bit

--001a114dd3e8fe9c56053d92f414
Content-Type: text/plain; charset=UTF-8

This is a test.

--001a114dd3e8fe9c56053d92f414--
$ mhfixmsg -file test1 -out - -decodeheader utf-8 | diff - test1
4c4
< Subject: this Subject was UTF-8 encoded
---
> Subject: =?utf-8?B?dGhpcyBTdWJqZWN0IHdhcyBVVEYtOCBlbmNvZGVk?=
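
The base64 payload of that Subject can also be checked by hand, independently of mhfixmsg. A small sketch, assuming a base64(1) that accepts -d (some older systems spell it differently):

```shell
# Sketch only: the payload is the base64 text after "?B?" in the Subject
# of test1. Assumes base64(1) supports the -d (decode) option.
payload='dGhpcyBTdWJqZWN0IHdhcyBVVEYtOCBlbmNvZGVk'
decoded=$(printf '%s' "$payload" | base64 -d)
printf '%s\n' "$decoded"
```

This prints the same text that appears in the decoded Subject line of the diff.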

David


