Re: [Pan-users] Annoying ' in posts

From: Steven D'Aprano
Subject: Re: [Pan-users] Annoying ' in posts
Date: Fri, 21 Sep 2012 23:59:09 +1000
On 21/09/12 20:58, DLSauers wrote:
On Thu, 20 Sep 2012 14:36:21 +1000, Steven D'Aprano wrote:

Short answer: it's an encoding problem. Some doofus is probably pasting
so-called "Smart Quotes" from Microsoft Word into their post, and their
news reader program (or Google Groups *spit*) is not adjusting the
encoding as it should.

I can probably guess which piece of steaming cruft generates this

The article is aimed at programmers, but doesn't assume any programming
knowledge, and everyone should read it.

unicode (*spit*, *spit*, *spit*, *SPIT*!)

ASCII please.

The rest of the world AND the *nineteenth* century wants to say a few words
to you. ASCII was crap from the moment it was invented -- there has never
been a time, not even one single minute, that ASCII has been sufficient for
even the full *American English* character set, let alone British English,
international or historical character sets. It was already crippled in 1963
when it was first published, and is crippled beyond redemption in 2012.

(Although in fairness, given the technical limitations back in 1963, the
designers of ASCII did a reasonable job of making something that was usable
for a subset of American English.)

Start with trade and currency: 99¢ £ ¥

Electronics: Ω

Fractions, proportions, maths: ½ ¼ ‰ ± ÷ π ≠

Temperatures: 45°F

Intellectual property: © ®

Heavy metal: Blue Öyster Cult, Motörhead, Mötley Crüe, Наӥв

Punctuation: “ ”

Encyclopædia Britannica, résumé

to say nothing of the fact that most of the world, about 6 billion or so
people, find ASCII completely insufficient for their communication needs.
And that includes many people whose only language is English.

Pan does a fine job at dealing with Unicode text. Don't let the existence
of broken software that *doesn't* deal with text correctly prejudice you
against Unicode. The problems you are seeing are not because of Unicode,
but because of programmers who DON'T use Unicode correctly (or possibly
at all). Their ignorance and incompetence are the problem, not Unicode.

It's nothing to do with the font, and there's no such thing as "plain

Sure there is, ASCII 32-127 specifically is all that should be on NNTP
posts and email.

Firstly, ASCII is not "plain text", it is an encoding from bytes (numbers)
to text, no different to other encodings such as EBCDIC, MacRoman, Latin-1,
and dozens of others, except it is even less useful.
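
A small Python sketch of that point: one and the same byte decoded under
several encodings. The byte value 0xE9 is picked purely for illustration, and
cp037 stands in for EBCDIC (it is one of several EBCDIC code pages).

```python
raw = bytes([0xE9])  # a single byte with the value 233

# The same byte means something different under each encoding.
results = {}
for codec in ("latin-1", "mac_roman", "cp037"):
    results[codec] = raw.decode(codec)
    print(codec, repr(results[codec]))

# ASCII only defines the values 0-127, so it has no answer at all here.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
print("ascii can decode it:", ascii_ok)
```

Without knowing the encoding, a byte is just a number.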

Secondly, the restriction to ASCII characters 32-127 only applies to the
transport mechanism, not the content. You can send binary files by email
via a completely transparent wrapper mechanism (e.g. uuencoding). Likewise
you can send Unicode text. The technical limitations of SMTP and NNTP are
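
A rough sketch of the wrapper idea, using Python's binascii module to uuencode
a single line. The sample text is made up, and the choice of UTF-8 as the
underlying encoding is an assumption; real mail and news software would more
likely use MIME for this, but the principle is the same.

```python
import binascii

# Text containing characters far outside ASCII.
text = "Motörhead: 99¢, 45°F, ½ Ω"
payload = text.encode("utf-8")  # 8-bit bytes, not safe for a 7-bit transport

# b2a_uu handles at most 45 bytes per call (one uuencoded line).
assert len(payload) <= 45

# Wrap: the result uses only printable ASCII plus a trailing newline.
wrapped = binascii.b2a_uu(payload)
transport_safe = all(32 <= b < 127 or b == 10 for b in wrapped)
print(wrapped, transport_safe)

# Unwrap at the other end and decode: the original text comes back intact.
unwrapped = binascii.a2b_uu(wrapped).decode("utf-8")
print(unwrapped)
```

The transport only ever sees 7-bit characters; the content is whatever you
wrapped.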


