Re: Getting Emacs to play nice with Hunspell and apostrophes

From: Emanuel Berg
Subject: Re: Getting Emacs to play nice with Hunspell and apostrophes
Date: Sat, 14 Jun 2014 03:35:05 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)

Yuri Khan <address@hidden> writes:

> The fact that everybody uses " and ' and ` is a
> historical artifact, a workaround of sorts, due to
> the limitations of the mechanical typewriter. We need
> not be affected by it any more.
>
> There was no possibility of including all the
> required typographical characters or accented letters
> into the printing ball, so both quotes (“ and ”) and
> the diaeresis got conflated into a straight quote ",
> both single quotes (‘ and ’) into a straight single
> quote/apostrophe ', and the backtick ` and tilde ~
> were there to facilitate typing accented letters.
>
> This limitation then crept into computers, because
> this way the character set could be encoded in 7
> bits. The computer keyboard was just modeled after
> the typewriter keyboard, with a few extensions.
>
> Then the inevitable struck: computers expanded from
> the US and UK into Germany, Sweden, Finland, France,
> Canada, and then countries with non-Latin scripts
> (Greek, Cyrillic, and CJK). And all of them wanted to
> have dedicated code points for their characters,
> e.g. type a single ä instead of [a,
> backspace-no-delete, "].
>
> For a good while, we lived in a nightmare of ten
> thousand code pages.  In Russia, you could receive an
> email and see a jumble of utterly meaningless words
> because the message could be re-encoded (or the
> Content-Type charset= stripped or re-labeled) on any
> of the intermediate servers; there existed programs
> which were able to heuristically detect the chain of
> re-encodings applied on the way and decode your
> message for you. You could order a book in an
> Internet shop, have them completely b0rk up the
> encoding of the shipping address:
> Then somebody at the postal system might decode the
> characters and the package would still be delivered
> at the intended address.
>
> Now that every widely used operating system supports
> Unicode, we don’t have an excuse for clinging to
> those workarounds of the past century.  We are not
> limited by the 7-bit ASCII encoding and can store
> texts in their true form. We also are not constrained
> by the typewriter keyboard, having input methods
> based on Compose or Level3 allowing us to
> conveniently enter all the necessary diverse
> characters. On X11/GNU/Linux in particular it comes
> bundled with the system; on Windows, one has to
> install a third-party package.
>
> Much of the software has already evolved to support
> Unicode. That which hasn’t, has to catch up. From a
> spell checker, in particular, I expect that it should
> (perhaps with an optional switch) be able to flag as
> error any spelling of “isn’t” where the character
> between n and t is not the preferred apostrophe
> character U+2019.
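
To make the check asked for in the last quoted paragraph concrete, here is a minimal sketch in Python. It is only an illustration: the function name `flag_apostrophes` and the set of look-alike characters are assumptions made up for this example, not anything that Hunspell or Emacs provides.

```python
import re

# Assumption: characters commonly typed where U+2019 is meant
# (ASCII apostrophe, backtick, acute accent, U+2018, U+02BC).
LOOKALIKES = "'`\u00b4\u2018\u02bc"
PREFERRED = "\u2019"  # RIGHT SINGLE QUOTATION MARK

# A look-alike with a letter on both sides, as in "isn't".
# [^\W\d_] matches any Unicode letter.
_PATTERN = re.compile(
    r"(?<=[^\W\d_])[" + re.escape(LOOKALIKES) + r"](?=[^\W\d_])"
)


def flag_apostrophes(text):
    """Return (offset, character) for each word-internal apostrophe
    that is not the preferred U+2019."""
    return [(m.start(), m.group()) for m in _PATTERN.finditer(text)]


if __name__ == "__main__":
    for sample in ("isn't", "isn" + PREFERRED + "t", "don`t"):
        print(repr(sample), flag_apostrophes(sample))
```

Only characters between two letters are reported, so quotation marks around a word are left alone. A real spell checker would need language-specific rules on top of this.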

First, let me tell you I very much appreciated this

We agree that ', ", and the other ASCII chars that are
used in more or less the same contexts are there, and
their typographic counterparts are not, for
techno-historical reasons.

Where we *don't* agree is that you think that, if I'm
allowed to pseudo-quote you:

- Today, now that there aren't any technical
  limitations, we should go for the more advanced

Here is where I say:

Just because it is possible doesn't mean it is desirable
if there is no gain. It is possible to change all the
software in the world to be able to use those
chars. But why? For the reasons you stated, on the
Internet, on Usenet, and in computer culture generally,
many, many people have come to use English, and the
7-bit (or 8-bit) chars have spread and become a de facto
standard. So people's eyes and brains and fingers are
trained to use those. We have all come together from
different starting points. The UK and US people had to
go the shortest way (as the pioneers, perhaps they
earned it). The Swedes had to learn English. The
Russians had to go somewhat further because Russian is
farther from English than Swedish. And so on. So when
we finally have something in common - why break it just
because it is possible?

With some computer languages like Java it is possible
for me to program in Swedish, using ä, å, and ö. But
why would I want to do that? It would wreak havoc on my
brain as the rest of the
language would still be English. But more importantly,
it would isolate my program from the rest of the
world. I couldn't communicate about it (ask questions,
tell people about it with the support of code snippets,
etc.) and it couldn't be configured/extended by a
non-Swedish-speaking person. So I'll just stick to C,
in English. Just as I will stick to ' as that is the
correct way (as I see it) to write in "Computer

underground experts united:
