Re: mhfixmsg character set conversion

nmh-workers


From:	Steven Winikoff
Subject:	Re: mhfixmsg character set conversion
Date:	Tue, 08 Feb 2022 02:46:45 -0500

>I'm unable to replicate your problem here with the original message,
>and using your mhfixmsg invocation, mhfixmsg-format-text/html, and
>locale.  The only piece I think I'm missing is your mime_helper.
>I would give that a try if you send it to me.

I've attached the script, but (without having looked at it in a while) I
suspect it depends too heavily on other parts of my personal setup to be
usable for anyone else.  It turns out not to be relevant, but perhaps it
might be interesting to someone anyway.


>With nmh-1.7 mhfixmsg:
>mhfixmsg: /home/levine/src/nmh/msg part 2, decode text/plain; 
>charset=iso-8859-1
>mhfixmsg: /home/levine/src/nmh/msg part 1, will not decode because it
>is binary (line length > 998)
>mhfixmsg: /home/levine/src/nmh/msg part 2, convert UTF-8 to UTF-8

...and therein lies the answer.

I owe you an apology about this, and I'm sincerely sorry for wasting your
time on this question.

The key is the message about the line length being too long.  Seeing that
reminded me that I'd modified the stock 1.7.1 mhfixmsg with this patch:

   --- uip/mhfixmsg.c.original     2018-03-06 14:05:56.000000000 -0500
   +++ uip/mhfixmsg.c      2019-08-17 19:51:25.723267048 -0400
   @@ -2144,13 +2144,13 @@
                int last_char_was_cr = 0;

                for (i = 0, cp = buffer; i < inbytes; ++i, ++cp) {
   -                if (*cp == '\0'  ||  ++line_len > 998  ||
   +                if (*cp == '\0'  ||  ++line_len > 99998  ||
                        (*cp != '\n'  &&  last_char_was_cr)) {
                        encoding = CE_BINARY;
                        if (*cp == '\0') {
                            *reason = "null character";
   -                    } else if (line_len > 998) {
   -                        *reason = "line length > 998";
   +                    } else if (line_len > 99998) {
   +                        *reason = "line length > 99998";
                        } else if (*cp != '\n'  &&  last_char_was_cr) {
                            *reason = "CR not followed by LF";
                        } else {

I remember asking about the 998-character limit on this list, in a thread
from January 2018.  You explained why the limit exists, and suggested
another way to achieve what I was trying to do, which I tried but without
success -- I wasn't able to get what I wanted without this change, but I no
longer remember the details.
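
For reference, the three conditions in that hunk match the rules RFC 2045
gives for 7bit/8bit data: no NUL octets, no line longer than 998 octets
(excluding the CRLF), and CR and LF appearing only as CRLF pairs.  The
following is a minimal standalone sketch of that test, not the nmh code
itself: the names classify_text and text_reason and the max_line parameter
are made up for illustration, and the real function may well treat edge
cases (such as a CR at the very end of the buffer) differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes; these names do not come from nmh. */
enum text_class {
    TEXT_OK,             /* acceptable as 7bit/8bit text */
    TEXT_HAS_NUL,        /* contains a NUL octet */
    TEXT_LINE_TOO_LONG,  /* some line exceeds max_line octets */
    TEXT_BARE_CR         /* a CR is followed by something other than LF */
};

/* Scan buf[0..len) and report the first condition that would force the
   content to be treated as binary.  max_line is the longest line allowed,
   not counting CR or LF (998 in stock mhfixmsg, per the patch above). */
enum text_class
classify_text(const char *buf, size_t len, size_t max_line)
{
    size_t line_len = 0;
    int last_char_was_cr = 0;
    size_t i;

    assert(buf != NULL || len == 0);

    for (i = 0; i < len; ++i) {
        char c = buf[i];

        if (c == '\0')
            return TEXT_HAS_NUL;
        if (c != '\n' && c != '\r' && ++line_len > max_line)
            return TEXT_LINE_TOO_LONG;
        if (c != '\n' && last_char_was_cr)
            return TEXT_BARE_CR;

        last_char_was_cr = (c == '\r');
        if (c == '\n')
            line_len = 0;   /* a new line starts after every LF */
    }
    return TEXT_OK;
}

/* Map a result code to a human-readable reason. */
const char *
text_reason(enum text_class tc)
{
    switch (tc) {
    case TEXT_HAS_NUL:       return "null character";
    case TEXT_LINE_TOO_LONG: return "line too long";
    case TEXT_BARE_CR:       return "CR not followed by LF";
    default:                 return "none";
    }
}
```

Calling classify_text(buf, len, 998) corresponds to the stock limit, and
passing 99998 instead corresponds to the patched one.  Making the limit a
parameter is only a convenience for experimenting with both.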

Obviously I need to revisit this question, because I just compiled a copy
of mhfixmsg from 1.7.1 without this patch, and it now behaves as you'd
expect:  it complains about the line length, and then generates correct
output with these headers:

   Content-Type: multipart/alternative;
        boundary=0126698af956ff6e1d4da4d88ae8ef4ebfb0beb8c16cc29b787641a31378
   Content-Transfer-Encoding: 8bit
   
   --0126698af956ff6e1d4da4d88ae8ef4ebfb0beb8c16cc29b787641a31378
   Content-Transfer-Encoding: 8bit
   Content-Type: text/plain; charset="UTF-8"
   Mime-Version: 1.0
   
   --0126698af956ff6e1d4da4d88ae8ef4ebfb0beb8c16cc29b787641a31378
   Content-Transfer-Encoding: quoted-printable
   Content-Type: text/html; charset=iso-8859-1
   Mime-Version: 1.0

With my patch, I get these headers:

   Content-Type: multipart/alternative;
      boundary=0126698af956ff6e1d4da4d88ae8ef4ebfb0beb8c16cc29b787641a31378
   Content-Transfer-Encoding: 8bit

   --0126698af956ff6e1d4da4d88ae8ef4ebfb0beb8c16cc29b787641a31378
   Content-Transfer-Encoding: 8bit
   Content-Type: text/plain; charset="UTF-8"
   Mime-Version: 1.0

   --0126698af956ff6e1d4da4d88ae8ef4ebfb0beb8c16cc29b787641a31378
   Content-Transfer-Encoding: 8bit
   Content-Type: text/html; charset=iso-8859-1
   Mime-Version: 1.0

There's still something going on that I don't understand, however.  I've
been evaluating the output from mhfixmsg by viewing it in vim, and there's
no question that the unpatched output looks fine while the patched output
is as I've been describing since the beginning of this thread.

...but when I look at the files with command-line tools such as more or
head, *both* versions look correct.  When I open both files in xed, the
unpatched file is fine, but the patched file generates this message:

   There was a problem opening the file /tmp/nmh_testing/xxx.

   The file you opened has some invalid characters. If you continue editing
   this file you could corrupt this document.

   You can also choose another character encoding and try again.

...with a menu offering "Automatically Detected", "Current Locale (UTF-8)"
and "Western (ISO-8859-15)" as possible character encodings.
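
The xed message suggests that the patched file, taken as a whole, is not
valid UTF-8.  That would at least be consistent with an 8bit text/html part
labelled iso-8859-1 sitting next to a UTF-8 text/plain part, though that is
a guess and not something confirmed in this thread.  One way to test it
would be to run both output files through a strict UTF-8 check.  The sketch
below is a hypothetical helper (is_valid_utf8 is not part of nmh) that
rejects stray or missing continuation bytes, overlong encodings, surrogates
and code points above U+10FFFF.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if buf[0..len) is well-formed UTF-8, 0 otherwise.
   Hypothetical helper for diagnosing the files; not nmh code. */
int
is_valid_utf8(const char *buf, size_t len)
{
    const unsigned char *p = (const unsigned char *) buf;
    size_t i = 0;

    assert(buf != NULL || len == 0);

    while (i < len) {
        unsigned char c = p[i];
        size_t need;        /* continuation bytes expected after c */
        size_t k;
        unsigned long cp;   /* code point being assembled */
        unsigned long min;  /* smallest code point legal at this length */

        if (c < 0x80) {                    /* plain ASCII */
            ++i;
            continue;
        } else if ((c & 0xE0) == 0xC0) {   /* 2-byte sequence */
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {   /* 3-byte sequence */
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {   /* 4-byte sequence */
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;   /* stray continuation byte or invalid lead byte */
        }

        if (len - i - 1 < need)
            return 0;   /* sequence truncated at end of buffer */

        for (k = 1; k <= need; ++k) {
            unsigned char t = p[i + k];
            if ((t & 0xC0) != 0x80)
                return 0;   /* not a continuation byte */
            cp = (cp << 6) | (t & 0x3F);
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;   /* overlong form, out of range, or surrogate */

        i += need + 1;
    }
    return 1;
}
```

Reading each file into memory and passing it to is_valid_utf8 would show
whether the two outputs differ in this respect.  An ISO-8859-1 accented
letter such as the single byte 0xE9 fails the check unless it happens to be
followed by valid continuation bytes, which is rare in ordinary text.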

In summary, I now know what's happening and (mostly) what to do about it,
but I still don't know why.

     - Steven
-- 
___________________________________________________________________________
Steven Winikoff      |
Montreal, QC, Canada | "I'd love to go out with you, but I'm
smw@smwonline.ca     |  attending the opening of my garage door."
http://smwonline.ca  |
                     |                           - fortune(6)

Attachment: mime_helper
