emacs-bidi
Re: [emacs-bidi] bidi categories, derived from Unicode data


From: Alex Schroeder
Subject: Re: [emacs-bidi] bidi categories, derived from Unicode data
Date: Sat, 10 Nov 2001 01:35:27 +0100
User-agent: Gnus/5.090004 (Oort Gnus v0.04) Emacs/21.1 (i686-pc-linux-gnu)

"Eli Zaretskii" <address@hidden> writes:

> Here's how:
> 
>   (decode-char 'ucs uchar)
> 
> where UCHAR is the Unicode codepoint.  Easy, eh?

Hehe, just what I wanted.

>> -- or better yet, how do I get all characters from all the other
>> charsets matching it?
> 
> For this, you will need tables, there's no single method.

I've used the tables Dave Love had in ucs-tables.el.  This means that
my table now holds the UAX#9 categories for all UCS characters as well
as for all 8859 characters.  All the other charsets remain untouched.

Unfortunately the Lisp files required to set this up are rather big;
we'll have to find a way of dumping the info later.  It gets even
worse if we want to add bidi categories to all the Asian charsets...

Attached you will find bidi.el, which does the setup.  Here's a short
textual description for those who will not read the code.  It creates
a variable for every UAX#9 bidi type and allocates the necessary number
of unused categories.  Every category is identified by a character, and
this character is stored in the respective variable.  (This is a
workaround because I don't want to hard-code the category characters
yet.  It will be removed later.)
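
For readers who will not open the attachment, the allocation step
might look roughly like the sketch below.  It is not taken from
bidi.el: every `my-bidi-' name is hypothetical, and the type list is
the 19 bidi types UAX#9 defined at the time (later revisions add the
isolate types).  It relies only on `get-unused-category' and
`define-category', and it acts on the current buffer's category table.

```elisp
;; Hypothetical names throughout; this is a sketch, not the code in bidi.el.
(defvar my-bidi-types
  '(L LRE LRO R AL RLE RLO PDF EN ES ET AN CS NSM BN B S WS ON)
  "Bidi types of UAX#9 as they stood in Unicode 3.x (an assumption).")

(defun my-bidi-category-variable (type)
  "Return the symbol of the variable that holds the category for TYPE."
  (intern (format "my-bidi-category-%s" type)))

(defun my-bidi-allocate-categories ()
  "Give every type in `my-bidi-types' its own, previously unused category.
The category character is stored in the variable named by
`my-bidi-category-variable'."
  (dolist (type my-bidi-types)
    (let ((cat (get-unused-category)))
      ;; `get-unused-category' returns nil once the table is full.
      (unless cat
        (error "No unused category left for bidi type %s" type))
      (define-category cat (format "UAX#9 bidi type %s" type))
      (set (my-bidi-category-variable type) cat))))
```

After calling (my-bidi-allocate-categories), the variable
my-bidi-category-R would hold whatever character happened to be free,
which is why the rest of the code refers to the variable instead of a
fixed character.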

bidi-table.el contains several tables.  One of them maps UCS
characters to their bidi type (actually, to the symbol which holds the
"real" category character); the others map UCS to the ISO 8859
charsets and were provided by Dave Love (from his ucs-tables.el).
Using this information, the code in bidi-table.el assigns the bidi
categories specified by the UnicodeData.txt file from unicode.org
to all UCS and to all 8859 characters.
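
As a rough illustration of the assignment step, here is a minimal
sketch.  The table excerpt, the function name and the alist argument
are all hypothetical and not taken from bidi-table.el; the four bidi
types shown are the ones UnicodeData.txt lists for those characters.
The extra step through the 8859 mapping tables is left out, since it
depends on the contents of ucs-tables.el.

```elisp
;; Hypothetical excerpt of a (CODE-POINT . BIDI-TYPE) table.
(defvar my-bidi-sample-table
  '((#x0041 . L)    ; LATIN CAPITAL LETTER A
    (#x0031 . EN)   ; DIGIT ONE
    (#x05D0 . R)    ; HEBREW LETTER ALEF
    (#x0627 . AL))  ; ARABIC LETTER ALEF
  "Sample data only; the real table covers all of UnicodeData.txt.")

(defun my-bidi-apply-table (table type-to-category)
  "Put each character listed in TABLE into its bidi category.
TYPE-TO-CATEGORY is an alist from bidi type symbol to an already
defined category character.  The current buffer's category table is
modified.  Return the code points that were skipped, either because
`decode-char' returned nil or because the type has no category."
  (let (skipped)
    (dolist (entry table)
      (let ((ch (decode-char 'ucs (car entry)))
            (cat (cdr (assq (cdr entry) type-to-category))))
        (if (and ch cat)
            (modify-category-entry ch cat)
          (push (car entry) skipped))))
    (nreverse skipped)))
```

Whether a character ended up in a category can then be checked with
`char-category-set', for example (aref (char-category-set ?A) CAT)
for a category character CAT.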

Some UCS characters seem not to exist in Emacs; this was surprising.
An example from the source: (decode-char 'ucs ?\x33FE) -- valid,
(decode-char 'ucs ?\x3400) -- invalid.  I don't know what to make of
it.  I currently just ignore the UCS characters where decode-char
returns nil.
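
To see which code points are affected, a small probe along the
following lines could be used.  The function name is hypothetical, and
the result depends on the Emacs version, so no output is shown.  A
likely explanation for the gap, offered as an assumption only, is that
the mule-unicode charsets of Emacs 21 stop at U+33FF and resume at
U+E000.

```elisp
;; Hypothetical helper; a sketch for exploring the gaps, not part of bidi.el.
(defun my-ucs-undecodable (from to)
  "Return the code points in FROM..TO (inclusive) that `decode-char' rejects.
A code point is rejected when (decode-char 'ucs CODE) returns nil."
  (let ((code from)
        missing)
    (while (<= code to)
      (unless (decode-char 'ucs code)
        (push code missing))
      (setq code (1+ code)))
    (nreverse missing)))

;; Probe the boundary mentioned above.
(my-ucs-undecodable #x33FE #x3400)
```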

Alex.
-- 
http://www.emacswiki.org/

Attachment: bidi-table.el
Description: application/emacs-lisp

Attachment: bidi.el
Description: application/emacs-lisp
