bug-gnulib

[OT] Unicode


From: Ivan Shmakov
Subject: [OT] Unicode
Date: Wed, 30 May 2012 17:14:53 +0700
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux)

>>>>> Paul Eggert <address@hidden> writes:
>>>>> On 05/29/2012 06:11 AM, Reuben Thomas wrote:

 >> I find UTF-8 to be a great boon precisely for making plain text more
 >> legible.

        I'd say that it allows the machine to discern certain things
        better: e.g., to distinguish the “ambivalent” quote (' as used
        in programming languages, with the notable exception of M4,
        which pairs it with `) from the proper typographic single
        quotes (‘, ’), or an arrow from an ASCII-based C (or GNU R)
        construct.

 > UTF-8 is sometimes necessary and usually works, but even today it
 > fails often enough that I'd rather avoid it if it's merely a minor
 > style issue such as arrows.  For example, if from my Fedora desktop I
 > run plain "ssh" into a random Solaris 11 host and try to paste that
 > "→" into Emacs, Emacs says "Regexp I-search backward:",

        The problem is that the eighth bit (bit 7, counting from
        zero), left undefined by ASCII, was historically used for
        multiple purposes, among them indicating the use of the Meta
        key.

        Now, the arrow (U+2192) is encoded as follows per UTF-8:

$ enable -n printf ; LC_ALL=en_US.UTF-8 printf \\u2192 | od -t o1 
0000000 342 206 222
0000003
$ 
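
        The three octal bytes above follow from the UTF-8 bit layout
        for code points in the range U+0800..U+FFFF.  A minimal sketch
        of that arithmetic, assuming a shell that accepts 0x constants
        in $(( )); the function name is made up for illustration:

```shell
# Hypothetical helper: print the UTF-8 bytes, in octal, of a code point
# in the three-byte range U+0800..U+FFFF.
utf8_3byte_octal () {
    cp=$(( $1 ))
    b1=$(( 0xE0 | (cp >> 12) ))           # 1110xxxx: top 4 bits
    b2=$(( 0x80 | ((cp >> 6) & 0x3F) ))   # 10xxxxxx: middle 6 bits
    b3=$(( 0x80 | (cp & 0x3F) ))          # 10xxxxxx: low 6 bits
    printf '%o %o %o\n' "$b1" "$b2" "$b3"
}

utf8_3byte_octal 0x2192   # prints "342 206 222"
```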

        Emacs interprets these bytes as M-b C-M-f C-M-r, or, given
        the bindings currently in effect in my Emacs instance (I
        assume they're the defaults): backward-word forward-sexp
        isearch-backward-regexp.
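
        Reading the high bit as Meta can be sketched in the shell as
        well.  This is only a model of the decoding, not what Emacs
        does internally, and the function name is hypothetical.  Under
        this model the three bytes come out as M-b, C-M-f and C-M-r:

```shell
# Hypothetical helper: name the key seen for one byte (given in octal)
# when the high bit is taken to mean Meta.
meta_key () {
    byte=$(( 0$1 ))               # leading 0 makes the value octal
    prefix=
    if [ "$byte" -ge 128 ]; then
        prefix=M-
        byte=$(( byte - 128 ))    # strip the high bit
    fi
    # Only C-a..C-z (codes 1..26) are given names here.
    if [ "$byte" -ge 1 ] && [ "$byte" -le 26 ]; then
        prefix="C-$prefix"
        byte=$(( byte + 96 ))     # 1 -> 'a', 6 -> 'f', 18 -> 'r'
    fi
    printf '%s' "$prefix"
    printf "\\$(printf '%03o' "$byte")"   # emit the character itself
    printf '\n'
}

for b in 342 206 222; do meta_key "$b"; done
```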

 > and if I try to visit a file containing the "→" I see "?".  I'm sure
 > that I can work around this issue with the proper ssh flags and
 > environment settings and whatnot, but who has the time?

        I've never seen an SSH that isn't 8-bit clean, but you may
        still need to set a UTF-8 locale (e.g., en_US.UTF-8 in GNU;
        I'm not sure about Solaris) and check your terminal emulator's
        settings.
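
        A quick way to check the locale on the remote side, assuming
        the POSIX "locale charmap" command is available there; the
        helper name and the list of spellings it accepts are my own
        choices:

```shell
# Hypothetical helper: succeed if the given codeset name denotes UTF-8.
is_utf8_codeset () {
    case $1 in
        UTF-8|utf-8|UTF8|utf8) return 0 ;;
        *) return 1 ;;
    esac
}

# "locale charmap" reports the codeset of the current locale.
if is_utf8_codeset "$(locale charmap 2>/dev/null)"; then
    echo "current locale uses UTF-8"
else
    echo "current locale is not UTF-8; try, e.g., LC_ALL=en_US.UTF-8"
fi
```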

        As for Emacs, I guess that (set-language-environment "UTF-8") is
        sufficient.

-- 
FSF associate member #7257



