missing charset for non-ASCII text/x-patch MIME parts in Thunderbird
From: Stephen J. Turnbull
Subject: missing charset for non-ASCII text/x-patch MIME parts in Thunderbird
Date: Thu, 14 May 2015 17:28:43 +0900
Ivan Shmakov writes:
> As I’ve pointed earlier [1], Thunderbird (on the /sending/ side)
> for some reason chooses /not/ to file the ‘charset’
File a bug on Thunderbird, then. Absence of a charset parameter means
charset=US-ASCII, and Thunderbird should not be emitting US-ASCII MIME
parts with non-ASCII characters present. Not even if the MTAs agree
to use 8-bit SMTP transport (8BITMIME).
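For illustration, here is a minimal sketch of what a sender could do instead, using Python's standard email package. The patch text is made up, and this is not a claim about how Thunderbird or Gnus build their parts; it only shows a text/x-patch part that carries an explicit charset parameter.

```python
from email.mime.text import MIMEText

# Hypothetical patch body containing non-ASCII text.
patch_text = "--- a/README\n+++ b/README\n@@ -1 +1 @@\n-Hello\n+Привет\n"

# Passing _charset puts an explicit charset parameter on the part,
# so the receiver never has to fall back to the US-ASCII default.
part = MIMEText(patch_text, _subtype="x-patch", _charset="utf-8")

print(part["Content-Type"])
print(part.get_content_charset())
```

With the parameter present, the receiver can decode the octets without guessing.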
> In the absence of the explicitly-stated encoding, the
> receiving side may resort to guessing,
A conformant receiver SHOULD NOT guess, unless the user has given it
explicit permission to do that (of course, then anything is OK). From
RFC 2046:
4.1.2. Charset Parameter
A critical parameter that may be specified in the Content-Type
field for "text/plain" data is the character set. This is
specified with a "charset" parameter, as in:
Content-type: text/plain; charset=iso-8859-1
Unlike some other parameter values, the values of the charset
parameter are NOT case sensitive. The default character set, which
must be assumed in the absence of a charset parameter, is US-ASCII.
Note that technically speaking the MUST in this section only applies
to text/plain, and not to any other text content-type. However, given
that the section says
The specification for any future subtypes of "text" must specify
whether or not they will also utilize a "charset" parameter, and
may possibly restrict its values as well. For other subtypes of
"text" than "text/plain", the semantics of the "charset" parameter
should be defined to be identical to those specified here for
"text/plain", i.e., the body consists entirely of characters in the
given charset.
Pretty clearly the intent is that the behavior of text/plain is to be
the default for other text content-types, unless the spec for that
content-type *explicitly* states otherwise. See also section
4.1.4. Unrecognized Subtypes
Unrecognized subtypes of "text" should be treated as subtype
"plain" as long as the MIME implementation knows how to handle the
charset.
When no charset is specified, this only makes sense if the charset is
assumed to be US-ASCII.
> I presume this issue (the one of /not/ including the ‘charset’)
> is specific to Thunderbird. As an example, please look at a
> fragment of the original patch thus MIMEd from Gnus.
File a bug on Gnus, too. :-)
Of course Emacs should do what its user asks, but the default should
be to assume US-ASCII if there is no charset parameter, and to bitch
(not guess) if non-ASCII octets are seen.
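The receiving-side policy described above could be sketched as follows, again with Python's standard email package. The function name decode_text_part and its fallback_charset argument are hypothetical, chosen for this example; fallback_charset stands in for the user's explicit permission to assume something other than US-ASCII.

```python
from email import message_from_bytes


def decode_text_part(part, fallback_charset=None):
    """Decode a text/* part without guessing.

    Order of preference: the declared charset parameter, then a
    fallback the user explicitly asked for, then US-ASCII (the
    RFC 2046 default). Decoding is strict, so stray non-ASCII
    octets raise UnicodeDecodeError instead of being guessed at.
    """
    charset = part.get_content_charset() or fallback_charset or "us-ascii"
    raw = part.get_payload(decode=True)  # undo any transfer encoding
    return raw.decode(charset, errors="strict")


# A part with 8-bit octets but no charset parameter.
bad = message_from_bytes(b"Content-Type: text/x-patch\n\n+caf\xc3\xa9\n")

try:
    decode_text_part(bad)
except UnicodeDecodeError as err:
    print("complaint:", err.reason)

# Only with the user's explicit say-so is another charset assumed.
print(decode_text_part(bad, fallback_charset="utf-8"))
```

The point of the strict decode is that the failure is visible to the user, who can then decide what to assume, rather than the software silently picking an encoding.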