nmh-workers

Re: mhfixmsg character set conversion


From: David Levine
Subject: Re: mhfixmsg character set conversion
Date: Mon, 14 Feb 2022 02:58:13 +0000

Steven Winikoff writes:

> Unfortunately, running it through mhfixmsg results in the message coming
> back unchanged.  Is that specifically about -decodeheaderfieldbodies, or
> is mhfixmsg doing nothing because the message body is already unencoded
> text/plain?

That's because -decodeheaderfieldbodies utf8 only decodes header field
bodies encoded in UTF-8.  Your Subject is encoded as US-ASCII, so it is
left alone.

There was a reason for only allowing decoding of UTF-8 header field
bodies.  If any character set could be decoded, it would be possible
to produce header field bodies with embedded nulls, which I expect
would result in incorrect message parsing.  It certainly would with
scan(1):  it would truncate a Subject at the first embedded null.
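To make the embedded-null problem concrete, here is a minimal Python sketch (not nmh's C code) that decodes an RFC 2047 encoded-word to raw octets and then keeps only the octets before the first NUL, roughly what a consumer treating the field body as a C string would see. The helpers decode_encoded_word and c_string_view are hypothetical, and the UTF-16BE and ISO-8859-1 inputs are made up for illustration.

```python
import base64
import re
from typing import Tuple

# One RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([BbQq])\?([^?]*)\?=")


def decode_q(text: str) -> bytes:
    """Decode RFC 2047 "Q" encoded-text: '_' is a space, '=XX' is one octet."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "_":
            out.append(0x20)
            i += 1
        elif ch == "=":
            pair = text[i + 1:i + 3]
            if len(pair) != 2:
                raise ValueError(f"truncated escape in {text!r}")
            out.append(int(pair, 16))  # "=00" yields a NUL octet
            i += 3
        else:
            out.append(ord(ch))
            i += 1
    return bytes(out)


def decode_encoded_word(word: str) -> Tuple[str, bytes]:
    """Hypothetical helper: return (charset, raw decoded octets)."""
    m = ENCODED_WORD.fullmatch(word)
    if m is None:
        raise ValueError(f"not an encoded-word: {word!r}")
    charset, encoding, text = m.groups()
    if encoding.upper() == "B":
        raw = base64.b64decode(text)
    else:
        raw = decode_q(text)
    return charset, raw


def c_string_view(raw: bytes) -> bytes:
    """What code treating the body as a C string sees: octets before the first NUL."""
    return raw.split(b"\x00", 1)[0]


if __name__ == "__main__":
    for word in ("=?UTF-16BE?B?AEgAaQ==?=",
                 "=?ISO-8859-1?Q?before=00after?="):
        charset, raw = decode_encoded_word(word)
        print(charset, raw, c_string_view(raw))
```

Both inputs lose text: the UTF-16BE word, whose first octet is 0x00, shrinks to nothing, and the Q-encoded one is cut off after "before".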

That can't happen with UTF-8 encoded text, assuming it doesn't
encode NUL itself:  no multibyte UTF-8 sequence contains a 0x00
octet.  In addition to decoding UTF-8, we could decode ASCII because
1) we've seen ASCII-encoded header field bodies in the wild, 2) it
seems as harmless as it is pointless to encode ASCII as ASCII,
assuming no NULs, and 3) it's a proper subset of UTF-8, so it doesn't
interfere with the semantics of the "-decodeheaderfieldbodies utf8"
switch.

Any other suggestions?  If there's an enumeration of character
encodings that can't contain NULs, we could add those as well.
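The policy suggested above (decode only UTF-8 and US-ASCII, and only when no NUL octet results) could be sketched as follows. This is an illustrative Python sketch under stated assumptions, not mhfixmsg's actual check; DECODABLE_CHARSETS and safe_to_decode are hypothetical names. UTF-16 appears in the usage as an example of a charset that can't go on such a list, because every ASCII character it encodes includes a 0x00 octet.

```python
# Hypothetical allowlist for the policy discussed above: charsets whose
# decoded octets are already valid UTF-8. US-ASCII is a subset of UTF-8.
DECODABLE_CHARSETS = {"utf-8", "us-ascii"}


def safe_to_decode(charset: str, raw: bytes) -> bool:
    """Return True if an encoded-word's raw octets could be written out as-is."""
    # RFC 2231 allows a language suffix on the charset, e.g. "UTF-8*en".
    name = charset.split("*", 1)[0].strip().lower()
    if name not in DECODABLE_CHARSETS:
        return False
    # A NUL can still sneak in via "=00" (Q) or a base64 payload (B).
    if b"\x00" in raw:
        return False
    # Reject octets that are not valid in the declared charset.
    try:
        raw.decode("ascii" if name == "us-ascii" else "utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # UTF-16 shows why other charsets are risky: every ASCII
    # character encodes to two octets, one of them 0x00.
    print("Hi".encode("utf-16-be"))
    print(safe_to_decode("UTF-16BE", "Hi".encode("utf-16-be")))
    print(safe_to_decode("US-ASCII", b"Network World"))
    print(safe_to_decode("UTF-8", "caf\u00e9".encode("utf-8")))
    print(safe_to_decode("UTF-8", b"before\x00after"))
```

Checking for NUL after decoding, rather than trusting the charset label alone, covers the "assuming no NULs" caveat for both UTF-8 and ASCII.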

> But today I sent myself a message using an IMAP-based app on my phone,
> resulting in the appended, and I'd definitely want to decode the Subject:
> header.

So I'm curious:  why is the ASCII encoded as ASCII?  Why not just fold
the header as usual?  This line is too long; I'm not sure whether that
is related or a separate issue:

Subject: =?US-ASCII?Q?Using_the_Linux_fold_command_to_mak?=
 =?US-ASCII?Q?e_text_more_readable_=7C_Network_World?=
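For reference, the two encoded-words in that Subject decode to plain ASCII. This sketch uses Python's standard email.header module rather than nmh, and assumes the continuation line began with whitespace in the original message (the archived copy may have lost it). decode_subject is a hypothetical helper.

```python
from email.header import decode_header, make_header

# The Subject above, with the continuation line folded the way
# RFC 5322 requires (leading whitespace on the second line).
SUBJECT = (
    "=?US-ASCII?Q?Using_the_Linux_fold_command_to_mak?=\n"
    " =?US-ASCII?Q?e_text_more_readable_=7C_Network_World?="
)


def decode_subject(value: str) -> str:
    # decode_header() splits the value into (bytes, charset) runs and drops
    # the whitespace between adjacent encoded-words; make_header() turns
    # those runs back into a single Unicode string.
    return str(make_header(decode_header(value)))


if __name__ == "__main__":
    text = decode_subject(SUBJECT)
    print(text)
    # Every decoded character is ASCII, so the encoding added nothing.
    print(text.isascii())
```

It prints "Using the Linux fold command to make text more readable | Network World" and True. RFC 2047 section 6.2 says whitespace separating adjacent encoded-words is ignored, which is why "mak" and "e" rejoin into "make".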

David

