bug-gnu-emacs

bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird


From: Andy Moreton
Subject: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
Date: Wed, 01 May 2019 01:35:09 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (windows-nt)

On Tue 30 Apr 2019, Paul Eggert wrote:

> The attachment has a text/* media type but it has no charset parameter.
> The patch itself (output by git format-patch) says its charset is UTF-8.
> Unfortunately, Gnus doesn't recognize the patch as UTF-8 and so
> mishandles the non-ASCII characters in the attachment. To reproduce the
> problem, read this email with Gnus; the full attachment is attached to
> this email in the Thunderbird way.
>
> Although Internet RFC 2046 section 4.1.2 says the default charset for
> text/* media types is US-ASCII, Internet RFC 6657 section 3 amends this
> to say that registered text/* media types should require a charset
> specification (or should say it's not needed because the payload has
> that info, which obviously doesn't apply here). It later says that if
> there is a strong reason to have a charset default, the default should
> be UTF-8.
>
> Unfortunately Gnus apparently doesn't default to UTF-8 for such
> attachments, which means that sending a text/x-patch attachment from
> Thunderbird to Gnus messes up if the attachment contains non-ASCII
> characters. This has been causing problems on the Emacs mailing list for
> years and it bit a correspondent of mine again today; see
> <https://debbugs.gnu.org/cgi/bugreport.cgi?bug=35502#35>.
>
> I have filed a Thunderbird bug report for this, as Thunderbird should
> specify a charset; see
> <https://bugzilla.mozilla.org/show_bug.cgi?id=1167982>. However, Gnus
> should be a polite citizen and handle these attachments nicely rather
> than converting the non-ASCII UTF-8 characters to mojibake.
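
For illustration, the mojibake described above can be reproduced outside
Gnus. This is only a sketch: the helper name `demo-decode-utf8-bytes-as'
is hypothetical, and latin-1 is an assumed fallback charset picked for the
demonstration (the charset Gnus actually falls back to depends on the
user's configuration).

```elisp
;; Hypothetical helper, not part of Emacs or Gnus.
(defun demo-decode-utf8-bytes-as (text coding)
  "Encode TEXT as UTF-8, then decode the resulting bytes with CODING.
This mimics a reader that receives UTF-8 bytes without a charset
parameter and guesses CODING instead."
  (decode-coding-string (encode-coding-string text 'utf-8) coding))

;; Wrong guess (latin-1 is an assumption for the demo): each UTF-8
;; byte of the non-ASCII character becomes its own character.
(demo-decode-utf8-bytes-as "caf\u00e9" 'latin-1) ; => "cafÃ©"

;; Correct charset: the text round-trips unchanged.
(demo-decode-utf8-bytes-as "caf\u00e9" 'utf-8)   ; => "café"
```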

After a bit of experimenting, this minimal patch appears to fix the problem.
Should it also let the user choose the charset when none is specified,
or just hardwire it to utf-8?

diff --git a/lisp/gnus/mm-decode.el b/lisp/gnus/mm-decode.el
index 3f255419e7..a99d52a7e7 100644
--- a/lisp/gnus/mm-decode.el
+++ b/lisp/gnus/mm-decode.el
@@ -665,6 +665,9 @@ mm-dissect-buffer
        (setq type (split-string (car ctl) "/"))
        (setq subtype (cadr type)
              type (car type))
+        ;; Fix missing charset in Thunderbird
+        (unless (assq 'charset (cdr ctl))
+          (push '(charset . utf-8) (cdr ctl)))
        (setq
         result
         (cond
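
One possible shape for the user-selectable variant asked about above is
sketched here as a standalone function rather than as a change to
mm-dissect-buffer. The names `my-mm-default-text-charset' and
`my-mm-add-default-charset' are hypothetical and not part of Gnus. The
sketch assumes a parsed Content-Type has the form (TYPE (PARAM . VALUE)...)
with string values, so the default is stored as a string where the patch
above pushes the symbol `utf-8'. Unlike the patch, it only touches text/*
types, which is a design choice of this sketch, not something the patch does.

```elisp
;; Hypothetical names throughout; this is a sketch, not Gnus code.
(defvar my-mm-default-text-charset "utf-8"
  "Charset assumed for text/* parts that declare none.
A value of nil disables the default.")

(defun my-mm-add-default-charset (ctl)
  "Return CTL with a default charset added when it is text/* and has none.
CTL is assumed to look like (\"text/x-patch\" (name . \"0001-fix.patch\")).
CTL itself is not modified; a new list is returned when a charset is added."
  (if (and my-mm-default-text-charset
           (string-prefix-p "text/" (car ctl) t)  ; case-insensitive match
           (not (assq 'charset (cdr ctl))))
      (cons (car ctl)
            (cons (cons 'charset my-mm-default-text-charset)
                  (cdr ctl)))
    ctl))

;; A text part without a charset gets the default:
(my-mm-add-default-charset '("text/x-patch" (name . "0001-fix.patch")))
;; => ("text/x-patch" (charset . "utf-8") (name . "0001-fix.patch"))

;; An explicit charset, or a non-text type, is left alone:
(my-mm-add-default-charset '("text/plain" (charset . "iso-8859-1")))
(my-mm-add-default-charset '("image/png"))
```

Setting the variable to nil would restore the current behaviour, which may
be useful for anyone whose mail mostly arrives in a legacy charset.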





