Re: [emacs-bidi] bidi categories


From: Alex Schroeder
Subject: Re: [emacs-bidi] bidi categories
Date: Fri, 09 Nov 2001 19:28:14 +0100
User-agent: Gnus/5.090004 (Oort Gnus v0.04) Emacs/21.1 (i686-pc-linux-gnu)

Alex Schroeder <address@hidden> writes:

> "Eli Zaretskii" <address@hidden> writes:
> 
>> I'd prefer the classification defined by UAX#9.  It would be
>> confusing, I think, to have 2 different classifications.
> 
> Yes, I agree.  I wrote the above code before reading your comments.  I
> will send a new version asap.  :)

Here's what I have at the moment.  As I said in another mail, I also
have the Unicode classification, but I don't know yet how to use it:
given a Unicode codepoint (correct term?), how do I get the
corresponding Mule character in the Unicode charsets?  Or better yet,
how do I get all characters from all the other charsets matching it?
Perhaps I can use the tables Dave Love has sent to gnu.emacs.sources;
I will investigate.
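
For the first half of that question (code point to character to bidi class), a minimal sketch might look like the following.  It assumes an Emacs that provides both `decode-char' with the `ucs' charset and `get-char-code-property' with the `bidi-class' property; neither is used in the code further down, and the function name is hypothetical.  It does not address the second half (finding matching characters in other charsets).

```elisp
;; Sketch only: `bidi-sketch-class-of-codepoint' is a hypothetical name.
;; Assumes `decode-char' accepts the `ucs' charset and that
;; `get-char-code-property' knows the `bidi-class' property.
(defun bidi-sketch-class-of-codepoint (code)
  "Return the Unicode bidi class symbol for code point CODE, or nil."
  (let ((char (decode-char 'ucs code)))
    (and char (get-char-code-property char 'bidi-class))))

;; Example calls:
;; (bidi-sketch-class-of-codepoint #x0041)  ; LATIN CAPITAL LETTER A
;; (bidi-sketch-class-of-codepoint #x05D0)  ; HEBREW LETTER ALEF
;; (bidi-sketch-class-of-codepoint #x0627)  ; ARABIC LETTER ALEF
```

The returned symbols use the UAX#9 abbreviations (L, R, AL, EN and so on), so they could in principle be mapped onto the category variables defined below.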

I looked at UAX#9 again, and while it describes the process of
transforming logical to visual order, it a) looks horribly complicated
and b) is not reversible.  I guess that's old stuff for those who are
familiar with it...  :) Since the reverse operation is not defined
explicitly (did I miss it?), a certain amount of guesswork will be
needed.

Alex.


;;; L/R categories.

;; This is modelled after characters.el.  At the moment, however, we
;; don't have categories assigned, so we must create them ourselves.
;; The new categories are identified by a character, like all other
;; categories.  We store them in the following variables.

;; The existing categories and syntax tables are not enough to resolve
;; bidi issues: some of the bidi categories below (the weak and neutral
;; ones) specify that the "real" category must be determined from
;; context.  See Unicode Standard Annex #9, available from
;; http://www.unicode.org/unicode/reports/tr9/.

;; Note that not all categories mentioned in UAX#9 are listed -- perhaps
;; they will be added later.

(defvar bidi-category-l nil
  "Strong Left-to-Right: Most alphabetic, syllabic, Han ideographic
characters, digits that are neither European nor Arabic, all unassigned
characters except in the ranges (0590-05FF, FB1D-FB4F) and (0600-07BF,
FB50-FDFF, FE70-FEFF).")
(defvar bidi-category-r nil
  "Strong Right-to-Left: Hebrew alphabet, most punctuation specific to
that script, all unassigned characters in the ranges (0590-05FF,
FB1D-FB4F).")
(defvar bidi-category-al nil
  "Strong Right-to-Left Arabic: Arabic, Thaana, and Syriac alphabets,
most punctuation specific to those scripts, all unassigned characters in
the ranges (0600-07BF, FB50-FDFF, FE70-FEFF).")
(defvar bidi-category-en nil
  "Weak European Number: European digits, Eastern Arabic-Indic digits.")
(defvar bidi-category-es nil
  "Weak European Number Separator: Solidus (Slash).")
(defvar bidi-category-et nil
  "Weak European Number Terminator: Plus Sign, Minus Sign, Degree,
Currency symbols.")
(defvar bidi-category-an nil
  "Weak Arabic Number: Arabic-Indic digits, Arabic decimal & thousands
separators.")
(defvar bidi-category-cs nil
  "Weak Common Number Separator: Colon, Comma, Full Stop (Period),
Non-breaking space.")
(defvar bidi-category-s nil
  "Neutral Segment Separator: Tab.")
(defvar bidi-category-ws nil
  "Neutral Whitespace: Space, Figure Space, Line Separator, Form Feed,
General Punctuation Spaces.")
(defvar bidi-categories
  '(bidi-category-l 
    bidi-category-r 
    bidi-category-al
    bidi-category-en
    bidi-category-es
    bidi-category-et
    bidi-category-an
    bidi-category-cs
    bidi-category-s 
    bidi-category-ws)
  "List of categories used by bidi algorithms.")

(defun bidi-setup-categories ()
  "Create new categories for bidi according to UAX#9."
  ;; (setq table (standard-category-table))
  (let ((table (standard-category-table)))
    (mapcar (lambda (var)
              (let ((cat (get-unused-category table))
                    (doc (get var 'variable-documentation)))
                (when (symbol-value var)
                  (error "%S is already set" var))
                (unless cat
                  (error "No more unused categories available"))
                (set var cat)
                (define-category cat doc table)))
            bidi-categories)
    ;; ASCII: there are no characters of the categories R, AL and AN.
    ;; Lots of characters are still missing a classification, this will
    ;; be fixed using the Unicode tables.
    (mapcar (lambda (char)
              (modify-category-entry char bidi-category-l table))
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz")
    (mapcar (lambda (char)
              (modify-category-entry char bidi-category-en table))
            "0123456789")
    (mapcar (lambda (char)
              (modify-category-entry char bidi-category-es table))
            "/")
    (mapcar (lambda (char)
              (modify-category-entry char bidi-category-et table))
            "+-$")
    (mapcar (lambda (char)
              (modify-category-entry char bidi-category-cs table))
            ":,.")
    (mapcar (lambda (char)
              (modify-category-entry char bidi-category-s table))
            "\t")
    (mapcar (lambda (char)
              (modify-category-entry char bidi-category-ws table))
            " \n\r\f")
    ;; Hebrew character set (ISO-8859-8).  Only the Hebrew letters in
    ;; this character set are written right-to-left.
    (mapcar (lambda (char)
              (modify-category-entry char bidi-category-r table))
            "אבגדהוזחטיךכלםמןנסעףפץצקרשת")))
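
To check how category assignment behaves without touching the standard category table, a small sketch like the one below may help.  It works on a copy of the table inside a temporary buffer; the function name and the scratch category are hypothetical and only for illustration.  It also shows the effect of the optional fourth argument (RESET) of `modify-category-entry', which removes a category from a character rather than adding it.

```elisp
;; Sketch only: `bidi-sketch-category-roundtrip' is a hypothetical name.
;; Everything happens on a copy of the standard category table.
(defun bidi-sketch-category-roundtrip ()
  "Return (AFTER-SET OTHER-CHAR AFTER-RESET) for a scratch category."
  (let* ((table (copy-category-table (standard-category-table)))
         (cat (or (get-unused-category table)
                  (error "No more unused categories available"))))
    (define-category cat "Sketch: strong left-to-right." table)
    (with-temp-buffer
      ;; `char-category-set' consults the current buffer's table.
      (set-category-table table)
      ;; Add the category to ?A.
      (modify-category-entry ?A cat table)
      (let ((after-set (aref (char-category-set ?A) cat))
            (other (aref (char-category-set ?1) cat)))
        ;; A non-nil RESET argument deletes the category again.
        (modify-category-entry ?A cat table t)
        (list after-set other (aref (char-category-set ?A) cat))))))

;; (bidi-sketch-category-roundtrip)  ; => (t nil nil)
```

The same `char-category-set' plus `aref' check can be used against the real variables, for example with `bidi-category-l', once `bidi-setup-categories' has run.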


