Re: [emacs-bidi] Where do I start?

From:	Eli Zaretskii
Subject:	Re: [emacs-bidi] Where do I start?
Date:	Wed, 07 Nov 2001 19:04:04 +0200

> From: Alex Schroeder <address@hidden>
> Date: Wed, 07 Nov 2001 16:04:19 +0100
> 
> Eli Zaretskii <address@hidden> writes:
> 
> > IIRC, Ehud's hebeng.el has a function for something like that, so
> > perhaps most (or even all) of the job is already done.  But that's
> > part of the research.
> 
> What I found is the following (and related functions).
> 
> ELISP> (winvert-string

I'm not sure this is the one, but I'm sure Ehud will tell ;-)

> Some more magic would be required to add the right spaces.  The simple
> solutions of replacing "\n" with "\n " or " \n" don't work because
> that results in either two or no spaces.

Sorry, I don't see the problem.  If we forget that winvert-list
exists, do you see any special problems with handling a newline?

> Related question: The classification of characters takes place in the
> following lookup function:
> 
> (defun get-bidi-type (char)
>   "Return the bidi type of the given CHAR.
> It may be A, B, D, I, L, N, R, space or -.
> See help for `hebrew-english-bidi-type'."
[...]
> This should probably be extended.

This is not the UAX#9 classification.  I think you should try to work
with what UAX#9 defines, and only add more classes if needed.

The data structure to hold this should probably be a char-table of
some kind, since the string that Ehud used is not efficient storage
for a large sparse array.  A string is good for a small contiguous set
of characters, such as Hebrew, but doesn't scale up well if you add
Arabic and other bidi scripts.
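
A minimal sketch of what such a char-table could look like, using a
few UAX#9 classes (L, R, AL, EN, AN, WS, B).  The names
`my-bidi-type-table' and `my-bidi-type' are hypothetical, the ranges
cover only a handful of characters, and the sketch assumes a current
Emacs in which characters are Unicode code points (not the Mule codes
discussed in this thread).  Defaulting every other character to L is a
simplification, not what UAX#9 prescribes.

```elisp
;; Sketch only: map characters to UAX#9 bidi classes with a char-table.
;; `my-bidi-type-table' and `my-bidi-type' are hypothetical names.
(defvar my-bidi-type-table
  ;; Every character starts out as L (a simplification for this sketch).
  (let ((table (make-char-table 'my-bidi-type 'L)))
    ;; Hebrew letters alef..tav are strong right-to-left.
    (set-char-table-range table (cons #x05D0 #x05EA) 'R)
    ;; Arabic letters are class AL (only a partial range is set here).
    (set-char-table-range table (cons #x0627 #x064A) 'AL)
    ;; European digits and Arabic-Indic digits.
    (set-char-table-range table (cons ?0 ?9) 'EN)
    (set-char-table-range table (cons #x0660 #x0669) 'AN)
    ;; Space is whitespace; newline is a paragraph separator.
    (aset table ?\s 'WS)
    (aset table ?\n 'B)
    table)
  "Char-table mapping characters to (a subset of) UAX#9 bidi classes.")

(defun my-bidi-type (char)
  "Return the bidi class symbol recorded for CHAR in the sketch table."
  (aref my-bidi-type-table char))
```

With this table, (my-bidi-type #x05D0) returns R and (my-bidi-type ?5)
returns EN.  Adding another script is one more `set-char-table-range'
call, and ranges that are never set cost no extra storage.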

> Where can I find a list of mule characters? 

Mule simply takes an iso8859-x set (x=8 for Hebrew, 6 for Arabic) and
adds a constant offset.  So to get a map of Hebrew Mule characters,
you need a full iso8859-8 list.  You can find one here:

  http://www.qsm.co.il/Hebrew/ab.htm

This site includes Hebrew diacriticals and even directional format
codes (RLO etc.).  To get the Mule codepoints, add the result of
(- (make-char 'hebrew-iso8859-8) 32) to the iso8859-8 code.

> Do you think that the classification provided by get-bidi-type goes
> far enough?

I think we should begin with what UAX#9 defines.

> What table did you use for your algorithm -- did you just
> fake it, ie. use a very small table for testing purposes?

The tables to hold this information have not been fully designed yet.
For now, I use a C switch statement, which is sufficient for testing
purposes.

I think the data structure to hold this, at least on the Lisp level,
will be some kind of char-table.  (It's possible that, for
efficiency, I will write code to process the char-table into a
bitmapped array, of the kind the standard C ctype functions, such as
`isalpha' and `ispunct', use.)
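
To illustrate the ctype-style idea (one bit per class, one table entry
per character, membership tested with a mask), here is a rough sketch
written in Lisp for consistency with the rest of this thread.  It is
not the C code referred to above.  All names are hypothetical, the
table covers only single-byte iso8859-8 codes, and treating every
other byte as neutral is a simplification.

```elisp
;; Sketch of a ctype-style bitmapped lookup.  All names are hypothetical.
(defconst my-bidi-bit-strong-l 1 "Bit for strong left-to-right characters.")
(defconst my-bidi-bit-strong-r 2 "Bit for strong right-to-left characters.")
(defconst my-bidi-bit-number 4 "Bit for digits.")
(defconst my-bidi-bit-neutral 8 "Bit for everything else (simplified).")

(defvar my-bidi-flags
  (let ((flags (make-vector 256 my-bidi-bit-neutral)))
    ;; ASCII letters are strong left-to-right.
    (dotimes (i 26)
      (aset flags (+ ?A i) my-bidi-bit-strong-l)
      (aset flags (+ ?a i) my-bidi-bit-strong-l))
    ;; ASCII digits.
    (dotimes (i 10)
      (aset flags (+ ?0 i) my-bidi-bit-number))
    ;; Hebrew letters occupy #xE0..#xFA in iso8859-8.
    (let ((c #xE0))
      (while (<= c #xFA)
        (aset flags c my-bidi-bit-strong-r)
        (setq c (1+ c))))
    flags)
  "Vector of class bits indexed by an iso8859-8 byte value.")

(defun my-bidi-strong-p (byte)
  "Return non-nil if BYTE has a strong direction in the sketch table."
  (/= 0 (logand (aref my-bidi-flags byte)
                (logior my-bidi-bit-strong-l my-bidi-bit-strong-r))))
```

Testing for any of several classes is then a single `logand' against a
combined mask, which is the property that makes this layout attractive
in C.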

Since you are doing this in Lisp, I think a char-table should be good
enough for now.

> Using capitalization would have been a good idea to test your code --
> you could have taken the test cases verbatim from the report.

That's what I did.  But most of my test cases are from the FriBidi
distribution's test suite.  Some others I added as I found bugs and
debugged them.

> >   - It would be nice if converting from logical to visual and then
> >     back would be as close to the original as possible.  I think you
> >     should be able to reproduce the original exactly if it contains no
> >     explicit formatting codes; otherwise, you can't.
> 
> I agree.  Just to make sure I understand you correctly: the visual
> format never includes any directional formatting codes, right?

Yes, the visual-order text has no directional formatting codes.
