emacs-devel

Re: [emacs-bidi] Re: improving bidi documents display


From: Martin J. Dürst
Subject: Re: [emacs-bidi] Re: improving bidi documents display
Date: Sun, 27 Feb 2011 19:34:22 +0900
User-agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.9) Gecko/20100722 Eudora/3.0.4

Hello Michael,

My students and I have been working on this problem in the context of XML/HTML, on and off, for quite a few years. Please have a look at the following:

http://www.sw.it.aoyama.ac.jp/2005/pub/IUC28-bidi/IUC28.html
http://www.sw.it.aoyama.ac.jp/2005/pub/IUC28-bidi/
http://www.sw.it.aoyama.ac.jp/2008/pub/IUC32-bidi/

For the past year, Shunsuke Oshima, a student of mine, has been working on an implementation for Emacs in Emacs Lisp. We hope to publish the code in the next few weeks. The problems with LaTeX seem very similar to those with XML/HTML, so it should be possible to adapt our code to LaTeX.

Our implementation currently consists of two parallel implementations. One is based on inserting additional control characters (it's a pain to get rid of them before every save, copy, cut, and similar operation). The other is based on overlays, which is what Ken'ichi Handa originally suggested for this purpose, but it currently does not work because the characters in overlays don't participate in the bidi algorithm (Eli thinks that making them participate would make things too slow).

Regards,   Martin.

On 2011/02/27 19:01, Michael Welsh Duggan wrote:
Eli Zaretskii <address@hidden> writes:

Date: Thu, 24 Feb 2011 14:32:35 +0200
From: Eli Osherovich <address@hidden>

At the moment (using rev. 103371) I can edit Hebrew/English LaTeX
documents; however, the way they are displayed in Emacs is not perfect.
Please look at the attached file. As you can see, any English text that
appears inside a Hebrew paragraph requires certain decorations around it
(e.g., \L{some English text}), and these decorations are displayed in an
ugly fashion.

Yes, it's a known problem.  The Unicode UAX#9 Bidirectional algorithm
(which is what Emacs implements for bidirectional display) does not
produce good results with LaTeX (and with other kinds of markup).

Is there anything that can be done about it?

Something _should_ be done, for sure.  But for that, Someone™ should
figure out how this kind of problem could be solved using Emacs
display features.  Any solution will probably involve reordering only
parts of text, but a more detailed design suggestion is needed before
it can be implemented.  People are welcome to try to tackle this,
because I'm still busy with low-level bidi support of plain text.

I'd like to discuss this problem a little, just to get a better
understanding of the problem space.  Please be warned that although I
have read through UAX#9 a few times, and have been following Eli's bidi
work as best I can, I am still very much a novice, and am apt to make
improper assumptions or misunderstand how things are supposed to work.

In the examples below, I will use the convention of the UAX#9
document that a capital letter represents an R type character, and a
lower-case letter represents an L type character.  Formatting codes will
be typed as <RLE>, <PDF>, etc.

So, the example being used was:

Memory:  HEBREW \foo{english}
Levels:  11111111222222222221
Display: {foo{english\ WERBEH

Here the paragraph embedding level is 1 (odd, RtL), since the first
strong character is an R character.  The backslash, braces, and spaces
are N characters.  The N character sequence " \" lies between an R and
an L character, so it takes on the current embedding direction (level 1)
by rule N2.  The open brace lies between two L characters and gets level
2 by rule N1, and the close brace, which lies between an L character and
the end of the paragraph (whose direction is R here), gets level 1 again
by rule N2.  Note that the close brace appears as its mirrored glyph due
to rule L4.

(Rule N1 states that runs of neutral characters between strong
characters of the same direction take on that direction.  Rule N2 states
that otherwise, they get the embedding direction.)

Here is another example:

Memory:  HEBREW \foo{HEBREW}
Levels:  1111111122211111111
Display: {WERBEH}foo\ WERBEH

In this case, note that both of the braces are mirrored in the display.
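To make the level assignments above easier to check, here is a toy Python sketch of only the UAX#9 steps these two examples exercise: the paragraph level (P2/P3), neutral resolution (N1/N2), implicit levels (I1/I2), reordering (L2), and bracket mirroring (L4). It assumes the examples' convention (uppercase is R, lowercase is L, everything else is neutral), a single line, and no numbers, weak types, or explicit embeddings; it is not how bidi.c is written. The `overrides` parameter is a hypothetical hook for reclassifying characters.

```python
# Bracket pairs whose glyphs are mirrored at odd (RtL) levels, per rule L4.
MIRRORED = {"(": ")", ")": "(", "[": "]", "]": "[", "{": "}", "}": "{"}


def bidi_type(ch, overrides=None):
    """Toy classification: uppercase = R, lowercase = L, anything else = N."""
    if overrides and ch in overrides:
        return overrides[ch]  # hypothetical reclassification hook
    if ch.isupper():
        return "R"
    if ch.islower():
        return "L"
    return "N"


def resolve_levels(text, overrides=None):
    """Return the embedding level of each character (rules P2/P3, N1/N2, I1/I2)."""
    types = [bidi_type(ch, overrides) for ch in text]
    # P2/P3: the first strong character sets the paragraph level (default LtR).
    first_strong = next((t for t in types if t != "N"), "L")
    base = 1 if first_strong == "R" else 0
    embedding_dir = "R" if base % 2 else "L"

    # N1/N2: a run of neutrals takes the direction of its neighbours if they
    # agree, otherwise the embedding direction.  sos/eos equal embedding_dir
    # here because there are no explicit embeddings.
    i = 0
    while i < len(types):
        if types[i] != "N":
            i += 1
            continue
        j = i
        while j < len(types) and types[j] == "N":
            j += 1
        before = types[i - 1] if i > 0 else embedding_dir
        after = types[j] if j < len(types) else embedding_dir
        types[i:j] = [before if before == after else embedding_dir] * (j - i)
        i = j

    # I1/I2: characters opposite to the paragraph direction go one level up.
    return [base if t == embedding_dir else base + 1 for t in types]


def display(text, overrides=None):
    """Return the visual (left-to-right) order of one line (rules L4 and L2)."""
    levels = resolve_levels(text, overrides)
    # L4: use the mirrored glyph for brackets at odd levels.
    cells = [(MIRRORED.get(c, c) if lvl % 2 else c, lvl) for c, lvl in zip(text, levels)]
    # L2: from the highest level down to the lowest odd level, reverse every
    # maximal run of characters at that level or higher.
    # (L1, which resets trailing whitespace, is skipped in this sketch.)
    highest = max(levels, default=0)
    lowest_odd = min((lvl for lvl in levels if lvl % 2), default=highest + 1)
    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(cells):
            if cells[i][1] < level:
                i += 1
                continue
            j = i
            while j < len(cells) and cells[j][1] >= level:
                j += 1
            cells[i:j] = cells[i:j][::-1]
            i = j
    return "".join(c for c, _ in cells)


print(display("HEBREW \\foo{english}"))  # {foo{english\ WERBEH
print(display("HEBREW \\foo{HEBREW}"))   # {WERBEH}foo\ WERBEH
```

Passing `overrides={"\\": "R", "{": "R", "}": "R"}` for the first example gives the levels `11111111222122222221` and the display `{english}foo\ WERBEH`: `foo` and `english` each stay LtR, while the markup characters flow right to left along with the Hebrew.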

One simple, naive way of handling this for the various TeXs is to
consider all backslashes and brace characters as R characters.  This can
be simulated by surrounding each run of these characters by LRE PDF
pairs.  However, unless TeX ignores these characters completely, these
formatting characters would have to be removed before being processed by
TeX.
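The insert-then-strip approach can be sketched as a pair of text transformations. This is a hypothetical Python illustration, not Emacs code: `wrap_markup` surrounds each run of backslashes and braces with the <LRE>/<PDF> pair named above (the pair is a parameter, so another pair of formatting codes can be tried), and `strip_bidi_controls` removes explicit bidi formatting characters before the text is saved or passed to TeX. Stripping every such character, as this sketch does, would also delete any the user typed on purpose; a real implementation would have to track which ones it inserted.

```python
import re

LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

# Runs of TeX markup characters: backslash, open brace, close brace.
MARKUP_RUN = re.compile(r"[\\{}]+")
# LRM, RLM, and the explicit embedding/override codes LRE, RLE, PDF, LRO, RLO.
BIDI_CONTROLS = re.compile("[\u200e\u200f\u202a-\u202e]")


def wrap_markup(text, opening=LRE, closing=PDF):
    """Surround every run of markup characters with a formatting-code pair."""
    return MARKUP_RUN.sub(lambda m: opening + m.group(0) + closing, text)


def strip_bidi_controls(text):
    """Remove explicit bidi formatting characters, e.g. before saving."""
    return BIDI_CONTROLS.sub("", text)


source = "HEBREW \\foo{english}"
shown = wrap_markup(source)
assert strip_bidi_controls(shown) == source  # round trip restores the file text
```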

Another way of handling this would be to redefine the backslash and
brace characters as R characters, for purposes of the display engine.
Currently, I don't know if there is a way to do this in elisp.  bidi.c
seems to use a character table named bidi_type_table to hold this
information.  Currently this table is not exposed at the elisp layer, to
the best of my knowledge.  Maybe it would be possible to modify this
table in elisp, and possibly make it buffer local?

Another idea would be to allow a text property to override the character
type.  This feels like a very elegant, emacs-ish way to do things, but
an uneducated glance at the bidi code makes me feel like it would be
difficult to get information about text properties into this layer.
Another idea would be to use display strings including the LRE and PDF
characters to replace existing backslashes and braces.  However, display
strings do not affect the bidi algorithm at this point.
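The reclassification ideas above differ mainly in where the override lives. Below is a hypothetical Python sketch of one possible lookup order: a "text property" covering a position wins, then a buffer-local per-character table, then the character's default class (here the examples' uppercase/lowercase convention). None of this corresponds to an existing Emacs API; `buffer_table` and `property_spans` are made-up stand-ins for a buffer-local char table and for text properties.

```python
def default_bidi_type(ch):
    """Toy default classes from the examples: uppercase = R, lowercase = L."""
    if ch.isupper():
        return "R"
    if ch.islower():
        return "L"
    return "N"


def effective_types(text, buffer_table=None, property_spans=()):
    """Return one bidi type per character of text, as a string.

    buffer_table:   hypothetical buffer-local {char: type} table.
    property_spans: hypothetical text properties as (start, end, type) tuples,
                    0-based and end-exclusive; later spans win over earlier ones.
    """
    table = buffer_table or {}
    types = [table.get(ch, default_bidi_type(ch)) for ch in text]
    for start, end, bidi_type in property_spans:
        for pos in range(max(start, 0), min(end, len(text))):
            types[pos] = bidi_type  # a property beats both the table and the default
    return "".join(types)


tex_table = {"\\": "R", "{": "R", "}": "R"}
line = "HEBREW \\foo{english}"
print(effective_types(line))                            # RRRRRRNNLLLNLLLLLLLN
print(effective_types(line, tex_table))                 # RRRRRRNRLLLRLLLLLLLR
print(effective_types(line, tex_table, [(7, 8, "L")]))  # RRRRRRNLLLLRLLLLLLLR
```

A per-position override could, for example, treat a brace that delimits a macro argument differently from an escaped literal `\{`, which a per-character table cannot distinguish.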

I'm really starting to ramble at this point, so I think I will send
these musings to see what Eli and others think.


--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:address@hidden


