bug-gnu-emacs
bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?


From: Drew Adams
Subject: bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
Date: Sun, 2 Oct 2011 10:38:43 -0700

(Not claiming this additional question relates to a bug, in particular to this
bug report - except in so far as it asks for better doc.)

In `ucs-names', what are the CHAR-NAMEs "VARIATION SELECTOR-n" all about (for
n=17...256)?  Are those actually character names?

Googling indicates that a variation selector is a metacharacter that selects one
of a set of semantically equivalent glyphs.  

I no doubt do not fully understand this, even after scanning the Unicode
standard (http://unicode.org/reports/tr28/tr28-3.html#13_7_variation_selectors)
and this article about it:
http://babelstone.blogspot.com/2007/06/secret-life-of-variation-selectors.html
I can, however, see the difference variation selectors can make, e.g. here:
http://www.w3.org/TR/xml-entity-names/U0FE00.html.

But why are the "VARIATION SELECTOR-n" included as CHAR-NAMEs in `ucs-names'?
IIUC, variation selectors, when used, follow the characters whose
representation/appearance they modify in some sense.

Why do we treat variation selectors, in `ucs-names', as "character names", if
they are only "metacharacters", "combining marks" used to indicate how to change
the appearance of the characters they follow?

I see that the Unicode standard also refers to variation selectors as "default
ignorable characters", so I guess they are characters in some sense.

But how about providing a function that filters out all such "ignorable
characters" from `ucs-names', or how about at least providing a list of all such
chars?

I see this in the standard too: "If a user requires a visual distinction between
a character and a particular variant of that character, then fonts must be used
to make that distinction."

The "variation selector" information seems to be only about visual appearance,
not about names of displayable characters.  Does it really belong in
`ucs-names'?

And I see that such "ignorable" stuff is apparently supposed to be invisible -
e.g., "default_ignorable_code points...are invisible, have no glyph...".  If so,
how about a function that filters out all such invisible stuff from `ucs-names'
(or at least a list of such stuff)?
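
To make the request concrete, here is a rough sketch of the kind of filter I
mean.  It is written in Python rather than Emacs Lisp only so that it is
self-contained; it is not a proposal for the actual implementation.  It assumes
the names are available as (CHAR-NAME, code point) pairs, and it recognizes only
the two Unicode blocks whose characters are named "VARIATION SELECTOR-n"
(U+FE00..U+FE0F for n=1...16 and U+E0100..U+E01EF for n=17...256).  The full set
of default ignorable code points is larger and is not covered here.  The
function names and the sample list are hypothetical.

```python
import unicodedata

# The two Unicode blocks whose characters are named "VARIATION SELECTOR-n":
#   U+FE00..U+FE0F    VARIATION SELECTOR-1  .. VARIATION SELECTOR-16
#   U+E0100..U+E01EF  VARIATION SELECTOR-17 .. VARIATION SELECTOR-256
VARIATION_SELECTOR_RANGES = ((0xFE00, 0xFE0F), (0xE0100, 0xE01EF))


def is_variation_selector(code_point):
    """Return True if code_point lies in one of the two blocks above."""
    return any(lo <= code_point <= hi for lo, hi in VARIATION_SELECTOR_RANGES)


def drop_variation_selectors(name_code_pairs):
    """Return the (name, code point) pairs that are not variation selectors."""
    return [(name, cp) for name, cp in name_code_pairs
            if not is_variation_selector(cp)]


# Hypothetical sample standing in for a list of completion candidates.
sample = [(unicodedata.name(chr(cp)), cp)
          for cp in (0x41, 0xFE00, 0xE0100, 0xE01EF)]
filtered = drop_variation_selectors(sample)
print(filtered)
```

A range test like this is cheap enough to apply when building completion
candidates, but it would have to be extended to cover the other ignorable
characters as well.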

How about a little more doc for `ucs-names', so that any programmer who might
want to use `ucs-names' (e.g. for completion) might know how to reasonably
use/deal with such particular CHAR-NAMEs?  Please do not simply say that
`ucs-names' is only "internal" so you need not describe it better.  It's already
being used in various 3rd-party code.

Again, this is not really part of this bug report (which is only about "" as a
CHAR-NAME), unless you see that it is related (e.g. wrt doc).  But I would like
to know more about the "ignorable characters" - how to recognize them etc. so
that I can (optionally, at least) remove them as completion candidates.

I understand that the Emacs doc does not have as its purpose to teach the
details of the Unicode standard, but perhaps a little more explanation of the
content of `ucs-names' wouldn't hurt?




