From: Paul Fox
Subject: Re: [nmh-workers] nmh 1.7.1: both bcc and dcc broken for mts sendmail/pipe
Date: Fri, 15 Feb 2019 09:19:50 -0500

ken wrote:
> >The «"» around `Blind-Carbon-Copy' should be \(lq and \(rq, or the
> >equivalent strings for consistency with the style used at start of the
> >paragraph.
>
> So, in a mostly unrelated note ... I couldn't help noticing that Ralph
> used guillemets (« ») in one of his messages on this thread (way to push
> non-US-ASCII characters, Ralph!), and after a series of replies to his note
> things devolved into classic mojibake. And since hopefully most everyone
> on this thread is an nmh user, I wanted to understand why, because really
> that shouldn't have happened.

Mea Culpa. I haven't fully worked through the bug or the fix, but
rest assured, the problem isn't with nmh.

My replies and forwarded message drafts are constructed by a script
that predates replyfilter. It does things like add attribution ("ken
wrote:"), my .sig, and the bulk of the body with the " > " indents.
It includes the original headers if forwarding, but not when replying,
and also adjusts the current headers based on what folder I'm in, for
things like Reply-to: and Fcc:.

I haven't done full debugging yet, but looking quickly I see that the
body content is created by:

    mhshow -form mhl.null -type text/plain -file "$original_text" |
        utf_clean |
        remove_part_markers_and_quote

where $original_text is the path to the message being replied to.

The function remove_part_markers_and_quote() runs sed to get rid of
the "part markers" that mhshow emits:

remove_part_markers_and_quote()
{
    # delete part markers entirely if they're the whole line,
    # otherwise just remove that part of the line.
    # and because we're already running sed, add the leading ' > '
    sed -e '/address@hidden(\[ part .* \]\)@\*\]$/d' \
        -e 's/address@hidden(\[ part .* \]\)@\*\]//' \
        -e 's/^/ > /'
}

But utf_clean() is the culprit, I believe -- it's there to remove a
few really annoying binary characters that my fonts don't display
correctly. But it does so with a fairly large and indiscriminate
hammer, completely ignoring the current encoding.

utf_clean()
{
    # turn the left-to-right mark <U+200E> into a space, and
    # eliminate utf hard non-printing space: <U+200B> or \u200B
    # also eliminate A0, which is non-breaking space in iso-8859.
    # note: \xa0 and \xc2 match single bytes, so these two patterns
    # also break UTF-8 characters such as « (C2 AB) and à (C3 A0).
    sed -e 's/\xe2\x80\x8e/ /g' \
        -e 's/\xe2\x80\x8b//g' \
        -e 's/\xa0/ /g' \
        -e 's/\xc2/ /g'
}
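
For comparison, here is a minimal sketch of a stricter cleanup, assuming the text reaching it is already UTF-8. The name utf_clean_safe is hypothetical. Each pattern is a complete UTF-8 sequence (E2 80 8E for U+200E, E2 80 8B for U+200B, C2 A0 for U+00A0), so the lone 0xC2 and 0xA0 bytes inside characters such as « (C2 AB) or à (C3 A0) are left alone. The bytes are built with printf octal escapes because POSIX does not specify sed's \xHH escapes, and sed runs under LC_ALL=C so it compares raw bytes.

```shell
#!/bin/sh
# utf_clean_safe: hypothetical, UTF-8-aware variant of utf_clean.
# Every pattern is a whole UTF-8 byte sequence built with printf octal
# escapes; LC_ALL=C makes sed match raw bytes, not locale characters.
utf_clean_safe()
{
    # U+200E LEFT-TO-RIGHT MARK (E2 80 8E) -> space
    # U+200B ZERO WIDTH SPACE   (E2 80 8B) -> deleted
    # U+00A0 NO-BREAK SPACE     (C2 A0)    -> space
    LC_ALL=C sed -e "s/$(printf '\342\200\216')/ /g" \
                 -e "s/$(printf '\342\200\213')//g" \
                 -e "s/$(printf '\302\240')/ /g"
}

# Example: the no-break space becomes a plain space, while the
# guillemets in «"» (C2 AB 22 C2 BB) pass through unchanged.
printf 'a\302\240b \302\253"\302\273\n' | utf_clean_safe
```

Unlike the original, this version leaves ISO-8859-1 text alone, so a raw A0 byte survives; converting the text to UTF-8 first (with iconv, for example) would be one way to cover that case without going back to single-byte patterns.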
I'll work on this, and also take a look at replyfilter to see if
I can't get it to do more of the heavy lifting.
paul
>
> I went back to the raw archives (ftp://lists.gnu.org/nmh-workers/2019-02)
> because the mailing list software will sometimes translate stuff into
> base64 encoding when it sees non-ASCII characters. And, well, I hate to
> assign blame, but I think it's a bit unavoidable ... please, don't anyone
> take this as a personal attack, I am just trying to understand how we
> could do better.
>
> Ralph's original note containing the guillemets (Message-Id
> <address@hidden>) was text/plain, a
> character set of utf-8, and encoded using quoted-printable. The
> characters were encoded properly using quoted-printable, specifically
> they were listed as =C2=AB and =C2=BB.
>
> Valdis was the first reply to that (Message-ID
> <address@hidden>), and HIS email was text/plain,
> character set iso-8859-1, and encoded using quoted-printable. He quoted
> Ralph's message, and the guillemets were encoded as =AB and =BB. Which seems
> correct to me.
>
> Paul Fox replied to Valdis's note (Message-Id
> <address@hidden>), and THAT note
> was text/plain, character set UTF-8, encoded using quoted-printable ...
> but it seems like this was the start of where things went off the rails.
> The original line in Valdis's email was (in raw form):
>
> > The =AB=22=BB around ...
>
> But in Paul's note it ended up as (extra > added in the reply)
>
> > > The =AB" =BB around
>
> This is NOT correct. First, there is an extra space in front of
> the encoded bytes. Secondly, they're not valid UTF-8; they're the
> ISO-8859-1 bytes. So I am guessing whatever Paul used to quote the reply
> didn't translate the ISO-8859-1 characters properly into UTF-8.
>
> However, whatever Mark Bergman uses for email actually made an intelligent
> decision. When he replied to Paul's note, those invalid UTF-8 characters
> got converted to the Unicode Replacement Character (U+FFFD), which was
> sent out as =EF=BF=BD (utf-8, quoted-printable).
>
> Further muddying the waters ... when Ralph replied to Mark's email,
> those Unicode Replacement Characters somehow got converted back to
> the correct guillemets (=C2=AB and =C2=BB). Which means Ralph has
> perhaps the most intelligent reply quoting program ever and he should
> immediately share it as it would revolutionize AI, or he went back and
> manually corrected it when he replied to Mark's note. I'm 50/50 on
> which one of those scenarios is more likely.
>
> If anyone involved with this email thread wants to pipe up with some
> more explanation on what exactly they used to compose their email
> replies, I would love to hear it. No judgements; I just want to know
> how nmh could help everyone do better. Like, do we need to include
> better tools for composing reply messages? Well, duh, the answer to
> that is "yes", and I think replyfilter does ok here but obviously we
> need to do better. But if we're SENDING something that is not valid
> UTF-8, should we be smarter and flag it? People were upset when we
> refused to send out 8-bit characters when your locale was US-ASCII (I
> mean, REALLY? I couldn't believe it), so I don't know what makes sense.
> Sending out invalid UTF-8 just seems wrong to me.
>
> --Ken
>
> --
> nmh-workers
> https://lists.nongnu.org/mailman/listinfo/nmh-workers
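
Ken's byte-level walk-through can be checked from a shell. This is a minimal sketch, assuming a POSIX shell and an iconv(1) that exits non-zero when its input is not valid in the source encoding. The helpers is_valid_utf8 and report are hypothetical. They show one possible shape for the "flag invalid UTF-8 before sending" check Ken raises, not anything nmh provides.

```shell
#!/bin/sh
# Ralph's «"» in UTF-8: C2 AB 22 C2 BB (guillemets sent as =C2=AB, =C2=BB)
utf8=$(printf '\302\253"\302\273')

# The same text in ISO-8859-1, as in Valdis's reply: AB 22 BB (=AB=22=BB)
latin1=$(printf '%s' "$utf8" | iconv -f UTF-8 -t ISO-8859-1)

# The bytes Ken quotes from Paul's reply (=AB" =BB), labelled as UTF-8
mangled=$(printf '\253" \273')

# is_valid_utf8: hypothetical helper; succeeds only if stdin decodes as
# UTF-8 (assumes iconv reports an invalid sequence with a non-zero status).
is_valid_utf8()
{
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# report: hypothetical helper; $1 is a label, $2 the text to check.
report()
{
    if printf '%s' "$2" | is_valid_utf8; then
        echo "$1: valid UTF-8"
    else
        echo "$1: NOT valid UTF-8"
    fi
}

report utf8 "$utf8"
report latin1 "$latin1"
report mangled "$mangled"
```

On a system whose iconv behaves as assumed, only the first string passes. The ISO-8859-1 bytes and the mangled bytes both fail, because 0xAB and 0xBB are UTF-8 continuation bytes and cannot start a character.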
=----------------------
paul fox, address@hidden (arlington, ma, where it's 33.6 degrees)