[emacs-bidi] Re: Supporting non-plain-text buffers


From: Martin J. Dürst
Subject: [emacs-bidi] Re: Supporting non-plain-text buffers
Date: Tue, 06 Jul 2010 16:18:17 +0900
User-agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.4pre) Gecko/20091214 Eudora/3.0b4

Hello Eli, others,

Sorry for being late in replying.

On 2010/07/02 19:38, Eli Zaretskii wrote:
Date: Fri, 02 Jul 2010 10:04:35 +0900
From: "Martin J. Dürst"<address@hidden>
CC: address@hidden, address@hidden

Note that I've changed the Subject line.  It's time.

Thanks!

One thing that we should think about is what people want to happen if
there is actual displayable text in some of these strings. I don't have
much of an idea where this is used, but I can imagine that at least in
some usage scenarios, one might want the text added via an overlay to be
rendered in exactly the same way as the text in the buffer.

Can you explain what you mean by the last sentence?  Perhaps an
example will clarify that.

Well, let's assume that there is some arcane file format with settings, and some Emacs Lisp that adds explanatory text with overlays to make the format easier to understand. I'm sure there are other use cases for such strings; otherwise, why would overlays have before-string and after-string properties? Anyway, if one of these properties contains both RTL and LTR characters, its text also needs bidi treatment. Even if it contains only RTL characters, it has to be reordered for display. Also, in some cases the texts in the overlay properties may form units that are best treated as embeddings (or similar), while in other cases they may be better treated as part of the overall text, with that overall text processed by the bidi algorithm as a whole.


I think having a special text property that covers the text
that needs to be reordered is a cleaner solution.

It's definitely also a viable solution, although there also might be
some tricky issues. Say you have a property defining an embedding from
characters 10 to 30, and another such property from characters 20 to 40.
What exactly is that supposed to mean?

This cannot happen in Emacs, because each property can have only one
value for each character.  In effect, ranges of buffer positions of the
same text property cannot overlap.
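
To make the 10-30 / 20-40 case concrete, here is a minimal Python sketch (not Emacs code) of what "one value per character" forces: two overlapping ranges have to be flattened into non-overlapping runs, each carrying a composite value. The function name `flatten_ranges` and the labels "A" and "B" are made up for illustration.

```python
def flatten_ranges(ranges):
    """Split possibly overlapping (start, end, label) ranges into
    non-overlapping runs, each listing the labels active on it.
    Ranges are half-open: start is included, end is not."""
    # Every start or end position is a potential run boundary.
    boundaries = sorted({pos for start, end, _ in ranges for pos in (start, end)})
    runs = []
    for lo, hi in zip(boundaries, boundaries[1:]):
        # A range is active on a run if it fully covers that run.
        active = [label for start, end, label in ranges
                  if start <= lo and hi <= end]
        if active:
            runs.append((lo, hi, active))
    return runs


# Hypothetical embeddings "A" (10 to 30) and "B" (20 to 40).
print(flatten_ranges([(10, 30, "A"), (20, 40, "B")]))
# prints [(10, 20, ['A']), (20, 30, ['A', 'B']), (30, 40, ['B'])]
```

The middle run is where a single property would have to encode both embeddings at once.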

I see. But then that would make it rather difficult to define embeddings, wouldn't it, because you have to include the number of current embeddings and their orientation in the property. E.g. something like (the characters a-g are just so that there's something between the formatting codes):

a RLE b LRE c RLE d POP e POP f POP g

would translate into (writing each character on a separate line)

a
b RLE
c RLE LRE
d RLE LRE RLE
e RLE LRE
f RLE
g

Unless you add quite a bit of intermediate library code, this will be rather inconvenient to handle for an end user.
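
As a rough idea of the kind of intermediate code I mean, here is a minimal Python sketch (not Emacs Lisp, and not a proposal for the actual implementation) that turns the format-code sequence above into the per-character list of open embeddings. The function name `embedding_stacks` is made up, and the tokens RLE, LRE and POP are simply the names used in the example.

```python
def embedding_stacks(tokens):
    """Return (character, open_embeddings) pairs for a token sequence
    in which RLE/LRE open an embedding and POP closes the innermost one."""
    stack = []
    result = []
    for tok in tokens:
        if tok in ("RLE", "LRE"):
            stack.append(tok)
        elif tok == "POP":
            # Ignore an unmatched POP rather than failing.
            if stack:
                stack.pop()
        else:
            # An ordinary character: record a snapshot of the stack.
            result.append((tok, tuple(stack)))
    return result


for ch, stack in embedding_stacks("a RLE b LRE c RLE d POP e POP f POP g".split()):
    print(ch, *stack)
```

This prints the seven lines listed above. The hard part is not this computation but keeping the per-character values up to date while the buffer is being edited.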


In any case, if this were possible, it would first and foremost have
to be solved for the unidirectional display.

You mean overlapping properties? In that case, I agree. But if properties cannot overlap, maybe we should use overlays. As far as I understand, they can overlap.

I don't intend to change the bidi reordering engine in any significant
way, to support these features.  All that's needed is a possibility to
tell it "restrict yourself to region between buffer positions P1 and
P2".  Actually, it just dawned on me that I can easily do that with
`narrow-to-region', since the reordering engine already honors that:
it never goes out of the accessible portion of text.

I'm not sure I understand, but if it means that the bidi algorithm is
just applied piecewise, that won't be enough. It may be enough for some
simple cases, such as C programs, where the main concern is to keep text
within string constants together, and the rest is ASCII only and
therefore goes LTR. However, on the other hand, with some XML markup
with e.g. element and attribute names in Hebrew, in our experience
actual nestings (i.e. embeddings in terms of the bidi algorithm) are
highly desirable.

Again, an example would go a long way towards explaining what you
mean.  In general, what I wrote does not eliminate the possibility
that embeddings might be used within the reordered parts, nor that the
text outside of the markup is LTR only.

Okay. In the prototype and in the Web-based editor that we have worked on to display HTML, we typically used embeddings for:
- Elements (incl. start tag and end tag) that have a dir attribute (which indicates an embedding in the Web page view). These can of course be nested.
- Start tags (and end tags)
- Attribute/attribute value combinations

Not all of these may be necessary in all cases, but it would be too complicated to try to figure out exactly which ones might be left out in any particular case, and even this wouldn't eliminate the need for nested embeddings. And it is, at least currently, unclear to me how you could achieve nested embeddings if all you have is a way to tell the rendering engine "restrict yourself to this region".


I just meant to say that, technically, reordering of just a portion of
text can be achieved by temporarily narrowing the buffer to that
portion, for as long as the display engine is processing that portion.

Yes, if reordering of only a portion of text is sufficient to address some problem, then this will be enough.

I think there are also other ways of attacking the problem. What about,
for example, a property on characters that increases the embedding level
in a certain way?

This idea was actually discussed some 10 years ago, as one of the
possible means of maintaining the reordering information as part of
the buffer.  It was rejected because, as I explained above, text
properties cannot overlap, so maintaining this information would be a
pain when the buffer is edited: you would need to split and join
properties' ranges when embedding format codes are added or deleted.

Or a property that changes the bidi category of a character?

This can be done if we need it, but I still don't see use-cases that
would benefit from such a feature.

Making the characters that define XML syntax, such as <, >, ", ', =,... strong LTR would solve a lot (but not all) of the display anomalies for XML (incl. HTML).

It might solve all display anomalies for programming languages like C to define " (for strings) and comment start/end as LTR (at least as long as there are no RTL identifiers).
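
To illustrate the idea of changing the bidi category, here is a minimal Python sketch using the standard `unicodedata` module. The override table `SYNTAX_OVERRIDES` and the function `bidi_class` are hypothetical names for illustration only; this is not an existing Emacs feature, and a real implementation would hook into the reordering engine's category lookup.

```python
import unicodedata

# Hypothetical per-mode override table: treat XML syntax characters
# as strong left-to-right ("L") instead of their default category.
SYNTAX_OVERRIDES = {ch: "L" for ch in "<>\"'="}


def bidi_class(ch, overrides=SYNTAX_OVERRIDES):
    """Return the overridden bidi category of ch if there is one,
    otherwise the category from the Unicode Character Database."""
    return overrides.get(ch, unicodedata.bidirectional(ch))


# Compare default and overridden categories for two syntax characters,
# a Hebrew letter (U+05D0), and a Latin letter.
for ch in "<=\u05d0a":
    print(repr(ch), unicodedata.bidirectional(ch), "->", bidi_class(ch))
```

By default `<` and `=` are neutrals (category ON), so their direction depends on the surrounding text; with the override they always behave like LTR letters, while the Hebrew letter keeps its category R.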

Regards,    Martin.


--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:address@hidden


