[OT] Unicode
From: Ivan Shmakov
Subject: [OT] Unicode
Date: Wed, 30 May 2012 17:14:53 +0700
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux)
>>>>> Paul Eggert <address@hidden> writes:
>>>>> On 05/29/2012 06:11 AM, Reuben Thomas wrote:
>> I find UTF-8 to be a great boon precisely for making plain text more
>> legible.
I'd say that it allows the machine to discern certain things
better: e.g., to distinguish the “ambivalent” quote (', as used
in programming languages, with the notable exception of M4,
which pairs it with `) from the proper typographic single quotes
(‘, ’), or an arrow from an ASCII-based C (or GNU R) construct,
etc.
> UTF-8 is sometimes necessary and usually works, but even today it
> fails often enough that I'd rather avoid it if it's merely a minor
> style issue such as arrows. For example, if from my Fedora desktop I
> run plain "ssh" into a random Solaris 11 host and try to paste that
> "→" into Emacs, Emacs says "Regexp I-search backward:",
The problem is that the eighth bit (bit 7, counting from zero),
left undefined by 7-bit ASCII, was historically used for multiple
purposes, and among those is the indication of the use of the
Meta key.
Now, the arrow (U+2192) is encoded as follows per UTF-8:
$ enable -n printf ; LC_ALL=en_US.UTF-8 printf \\u2192 | od -t o1
0000000 342 206 222
0000003
$
Which Emacs interprets as: M-b C-M-f C-M-r, or, given the
bindings (currently effective in my Emacs instance; I assume
they're the defaults): backward-word forward-sexp
isearch-backward-regexp.
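
To make the byte arithmetic explicit, here is a rough Python sketch
of that reading. It assumes the simplest possible model: the high
bit of each byte is taken as Meta and the low seven bits as a plain
ASCII key. The helper name key_name is made up for this
illustration, and a real terminal or Emacs setup may well behave
differently.

```python
# Sketch: read UTF-8 bytes as if the eighth bit meant "Meta".
# Assumption: high bit = Meta, low seven bits = ordinary ASCII key.

def key_name(byte):
    """Name one byte in Emacs key notation (hypothetical helper)."""
    meta = bool(byte & 0x80)   # eighth bit set -> Meta
    low = byte & 0x7F          # remaining seven bits
    if low < 0x20:
        # Control characters: 0x06 is C-f, 0x12 is C-r, and so on.
        name = "C-" + chr(low + 0x40).lower()
    elif low == 0x20:
        name = "SPC"
    elif low == 0x7F:
        name = "DEL"
    else:
        name = chr(low)
    if not meta:
        return name
    # Emacs writes Control before Meta, e.g. C-M-r.
    if name.startswith("C-"):
        return "C-M-" + name[2:]
    return "M-" + name

encoded = "\u2192".encode("utf-8")           # RIGHTWARDS ARROW
octal = [format(b, "o") for b in encoded]    # same view as od -t o1
keys = [key_name(b) for b in encoded]

if __name__ == "__main__":
    print(" ".join(octal))
    print(" ".join(keys))
```

The last key in the sequence is the one that leaves Emacs sitting
at the regexp isearch prompt.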
> and if I try to visit a file containing the "→" I see "?". I'm sure
> that I can work around this issue with the proper ssh flags and
> environment settings and whatnot, but who has the time?
I've never seen a non-8-bit-clean SSH, but you still may need to
set a UTF-8 locale on the remote side (such as, e.g.,
en_US.UTF-8 in GNU; I'm not sure about Solaris), and check your
terminal emulator's settings.
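
As a quick way to reason about which locale setting wins, the
Python sketch below applies the POSIX precedence for the character
type category (LC_ALL, then LC_CTYPE, then LANG) to a dictionary of
environment variables and checks whether the resulting name carries
a UTF-8 codeset. The function names are invented for this example,
and it only inspects the name: it does not check that the locale is
actually installed on the host.

```python
# Sketch: which locale name governs character handling, and does it
# name a UTF-8 codeset?  All function names here are hypothetical.

def effective_ctype(env):
    """Return the locale name that applies to LC_CTYPE."""
    # POSIX precedence; an empty value counts as unset.
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var, "")
        if value:
            return value
    return "C"

def codeset_of(locale_name):
    """Extract the codeset from language[_territory][.codeset][@modifier]."""
    base = locale_name.split("@", 1)[0]
    if "." not in base:
        return None
    return base.split(".", 1)[1]

def looks_utf8(env):
    """True if the effective locale name asks for UTF-8."""
    codeset = codeset_of(effective_ctype(env))
    if codeset is None:
        return False
    # Accept both spellings seen in practice: UTF-8 and utf8.
    return codeset.replace("-", "").lower() == "utf8"

if __name__ == "__main__":
    import os
    print(effective_ctype(os.environ), looks_utf8(os.environ))
```

Run on the remote host, this prints the effective name and whether
it looks like UTF-8. Note that LC_ALL=C overrides a UTF-8 LANG.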
As for Emacs, I guess that (set-language-environment "UTF-8") is
sufficient.
--
FSF associate member #7257