Re: [emacs-bidi] Where do I start?


From: Alex Schroeder
Subject: Re: [emacs-bidi] Where do I start?
Date: Wed, 07 Nov 2001 16:04:19 +0100
User-agent: Gnus/5.090004 (Oort Gnus v0.04) Emacs/21.1 (i686-pc-linux-gnu)

Heh, "progress report"?

Eli Zaretskii <address@hidden> writes:

> IIRC, Ehud's hebeng.el has a function for something like that, so
> perhaps most (or even all) of the job is already done.  But that's
> part of the research.

What I found is the following (and related functions).

ELISP> (winvert-string "foo and אבגדהוזחטיךכלםמןנסעףפץצקרשת and foo")
"foo and תשרקצץפףעסנןמםלכךיטחזוהדגבא and foo"

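For readers without hebeng.el at hand, here is a minimal sketch of what
the call above appears to do: reverse each run of Hebrew letters and
leave everything else alone.  It is not the hebeng.el implementation.
The name my-bidi-reverse-hebrew-runs is made up for illustration, and
the code assumes a Unicode-based Emacs in which the Hebrew letters are
U+05D0..U+05EA rather than MULE codes.

```elisp
;; Hypothetical helper, not part of hebeng.el.  Assumes a Unicode-based
;; Emacs where the Hebrew letters occupy U+05D0..U+05EA.
(defun my-bidi-reverse-hebrew-runs (string)
  "Return STRING with every maximal run of Hebrew letters reversed."
  (replace-regexp-in-string
   "[\u05d0-\u05ea]+"
   ;; The replacement function receives the matched run and returns
   ;; its characters in reverse order.
   (lambda (run) (concat (nreverse (string-to-list run))))
   string t t))
```

Reversing a run twice restores it, so applying the function twice
gives back the original string.
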
>   - Ideally, the code will accept a paragraph of text (which can span
>     several lines), and produce a reordered paragraph.  But in
>     practice, it's possible that only a small part of the text is
>     passed (think about reading output of an async subprocess).  I
>     don't know what to do about this case; perhaps nothing for now.

The following is available.  As you can see, though, it doesn't work
too well with paragraphs, since the result is concatenated and the
newlines are lost.

ELISP> (winvert-list (split-string "there is אבגדהוזחטיךכלם\nand foo and מןנסעףפץצקרשת and\nfoo" "\n"))
"there is םלכךיטחזוהדגבאand תשרקצץפףעסנןמ and foo andfoo"

Some more magic would be required to add the right spaces.  The simple
solutions of replacing "\n" with "\n " or " \n" don't work because
that results in either two or no spaces.
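
One possible way around the lost newlines, sketched under the
assumption that it is acceptable to treat every line on its own, is to
convert line by line and put the original line breaks back afterwards.
The helper name my-bidi-map-lines is made up, and the sketch relies on
the behaviour of split-string in current Emacs versions, which keeps
empty lines when an explicit separator is given.

```elisp
;; Hypothetical helper, not part of hebeng.el.
(defun my-bidi-map-lines (function text)
  "Apply FUNCTION to each line of TEXT and rejoin the results with newlines."
  (mapconcat function (split-string text "\n") "\n"))

;; Stand-in conversion for demonstration; any string -> string function works.
(my-bidi-map-lines #'upcase "there is\nand foo")
```

With hebeng.el loaded, winvert-string could be passed in place of
upcase.  This sidesteps the question of where to add spaces, but it
ignores any context that spans more than one line.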

Anyway, I hope Ehud will speak up if this is on the wrong track.

I will proceed along this line.

Related question: The classification of characters takes place in the
following lookup function:

(defun get-bidi-type (char)
  "Return the bidi type of the given CHAR.
It may be A, B, D, I, L, N, R, space or -.
See help for `hebrew-english-bidi-type'."
  (if (< char 256)
      (aref hebrew-english-bidi-type char)   ; normal 8 bit char
    (if (or (< char ?\xC00)                  ; Hebrew MULE start
            (> char ?\xC7F))                 ; Hebrew MULE end
        ?L                                   ; Not Hebrew, assume Latin (any kind)
      ;; Hebrew range C60 -> E0
      (aref hebrew-english-bidi-type (- char ?\xB80)))))

Where:

hebrew-english-bidi-type
 => "--------- ---------------------- AANNNBAIIANIBIBDDDDDDDDDDA-IAI--LLLLLLLLLLLLLLLLLLLLLLLLLL---IBALLLLLLLLLLLLLLLLLLLLLLLLLL---A-RRRRRRRRRRRRRRRRRRRRRRRRRRR-----A---------------------------------------------------------------RRRRRRRRRRRRRRRRRRRRRRRRRRR--LR-"

This should probably be extended.  Where can I find a list of mule
characters?  We could then add all the characters specified in UAX#9
to this classification.

        +----+-------------+--------------------------------------------------+
Strong  |R   |Right-to-Left|RLM, Hebrew alphabet, most punctuation specific to|
        |    |             |that script, all unassigned characters in the     |
        |    |             |ranges (0590-05FF, FB1D-FB4F)                     |
        +----+-------------+--------------------------------------------------+
        |AL  |Right-to-Left|Arabic, Thaana, and Syriac alphabets, most        |
        |    |Arabic       |punctuation specific to those scripts, all        |
        |    |             |unassigned characters in the ranges (0600-07BF,   |
        |    |             |FB50-FDFF, FE70-FEFF)                             |
        +----+-------------+--------------------------------------------------+
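
As a rough illustration of how the two rows above could be turned into
a lookup, here is a sketch that maps Unicode code points to R or AL by
range.  The names my-bidi-strong-rtl-ranges and
my-bidi-strong-rtl-type are made up.  The code works on Unicode code
points, not on the MULE codes that get-bidi-type sees, and it is only
a coarse approximation: individual characters inside these blocks,
such as combining marks and digits, have other types in UAX#9.

```elisp
;; Hypothetical table: (FIRST LAST TYPE), code points taken from the
;; ranges quoted in the table above.
(defvar my-bidi-strong-rtl-ranges
  '((#x0590 #x05FF R)
    (#xFB1D #xFB4F R)
    (#x0600 #x07BF AL)
    (#xFB50 #xFDFF AL)
    (#xFE70 #xFEFF AL))
  "Code point ranges treated as strong right-to-left, with their type.")

(defun my-bidi-strong-rtl-type (char)
  "Return `R' or `AL' if CHAR lies in a listed range, otherwise nil."
  (let (type)
    (dolist (range my-bidi-strong-rtl-ranges type)
      (when (and (null type)
                 (>= char (nth 0 range))
                 (<= char (nth 1 range)))
        (setq type (nth 2 range))))))
```

A nil result means "not covered here", so a caller would still fall
back to some other table for everything else.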

Do you think that the classification provided by get-bidi-type goes
far enough?  What table did you use for your algorithm -- did you just
fake it, i.e. use a very small table for testing purposes?  Using
capitalization would have been a good idea to test your code -- you
could have taken the test cases verbatim from the report.  Anyway,
extending this table is one thing that needs doing.

>   - The paragraph could be left-justified (a mostly left-to-right text
>     with some right-to-left characters embedded), or right-justified.
>     In the latter case, you need to remove any padding blanks on the
>     left, as part of the conversion.

That should be trivial.
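
A minimal sketch of the padding removal, assuming the padding consists
only of spaces and tabs at the start of each line.  The name
my-bidi-strip-left-padding is made up.

```elisp
;; Hypothetical helper: drop the blanks that right-justify each line.
(defun my-bidi-strip-left-padding (text)
  "Return TEXT with leading spaces and tabs removed from every line."
  (replace-regexp-in-string "^[ \t]+" "" text))
```

Blanks inside or at the end of a line are left untouched.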

>   - It would be nice if the code included detection of visual-order
>     bidi text, but it's not imperative.

I'll ask about that in another posting.

>   - It would be nice if converting from logical to visual and then
>     back would be as close to the original as possible.  I think you
>     should be able to reproduce the original exactly if it contains no
>     explicit formatting codes; otherwise, you can't.

I agree.  Just to make sure I understand you correctly: the visual
format never includes any directional formatting codes, right?
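
Going by the condition quoted above, a converter could check up front
whether an exact round trip is even possible.  Here is a sketch of
such a check, assuming a Unicode-based Emacs and taking "explicit
formatting codes" to mean LRE, RLE, PDF, LRO and RLO (U+202A..U+202E).
The name my-bidi-explicit-codes-p is made up.

```elisp
;; Hypothetical predicate: detects the explicit embedding and override
;; codes U+202A (LRE) through U+202E (RLO).
(defun my-bidi-explicit-codes-p (string)
  "Return t if STRING contains LRE, RLE, PDF, LRO or RLO, else nil."
  (and (string-match "[\u202a-\u202e]" string) t))
```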

Alex.
-- 
http://www.emacswiki.org/


