Re: [Nmh-workers] mojibake in UTF-8 encoded quoted-printable messages

From: Ken Hornstein
Date: Thu, 24 Oct 2013 09:14:01 -0400

>The munged character in your first example looks like it's
>supposed to be c3 bc c3, but instead is 83 c2 bc, if I did
>that right.  It takes more than one step to get from here to
>there, such as losing bits and wrong endian?

Actually, I think Joel was trying to say "für", which has the middle
letter as a lowercase "u" with umlaut.  That would be U+00FC, which has
a UTF-8 encoding of C3 BC.  The characters he sees are Ã, uppercase A
with tilde, U+00C3, and ¼, vulgar fraction one quarter, U+00BC.

C3 is Ã in ISO-8859-1, and BC is ¼ in ISO-8859-1; something is clearly
interpreting the UTF-8 bytes as ISO-8859-1.  But since your locale and
the message are both UTF-8, this doesn't feel like an nmh problem to
me.  If you just saw the undecoded quoted-printable, yeah, that would
probably be us.  But you're seeing the correct bytes; something in your
display path isn't doing the right thing.
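
For what it's worth, the mix-up is easy to reproduce outside of nmh.
Here is a minimal Python sketch (nothing to do with nmh's own code; the
helper name as_latin1 is made up for illustration) that encodes "für" as
UTF-8 and then deliberately decodes those bytes as ISO-8859-1:

```python
def as_latin1(text: str) -> str:
    """Encode text as UTF-8, then (wrongly) decode the bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


word = "f\u00fcr"  # "für"; U+00FC is lowercase u with umlaut
garbled = as_latin1(word)

# The UTF-8 bytes on the wire: U+00FC becomes the pair C3 BC.
print(word.encode("utf-8").hex(" "))  # 66 c3 bc 72 (hex with a separator needs Python 3.8+)

# Each byte turned into its own ISO-8859-1 character.
print([f"U+{ord(c):04X}" for c in garbled])  # ['U+0066', 'U+00C3', 'U+00BC', 'U+0072']

# The bytes themselves are intact, so reversing the wrong step recovers the text.
assert garbled.encode("iso-8859-1").decode("utf-8") == word
```

The U+00C3 and U+00BC in the output are the A-with-tilde and the
one-quarter sign, which is consistent with the bytes being right and
only the interpretation being wrong.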


