[emacs-bidi] Re: Supporting non-plain-text buffers


From: Eli Zaretskii
Subject: [emacs-bidi] Re: Supporting non-plain-text buffers
Date: Wed, 07 Jul 2010 13:59:03 +0300

> Date: Tue, 06 Jul 2010 16:18:17 +0900
> From: "Martin J. Dürst" <address@hidden>
> CC: address@hidden
> 
> >> One thing that we should think about is what people want to happen if
> >> there is actual displayable text in some of these strings. I don't have
> >> much of an idea where this is used, but I can imagine that at least in
> >> some usage scenarios, one might want the text added via an overlay to be
> >> rendered in exactly the same way as the text in the buffer.
> >
> > Can you explain what you mean by the last sentence?  Perhaps an
> > example will clarify that.
> 
> Well, let's assume that there is some arcane file format with settings, 
> and there is some Emacs Lisp that adds additional text with overlays to 
> make it easier to understand the format. I'm sure there are other use 
> cases for such strings; otherwise, why would there be before-string and 
> after-string properties for overlays? Anyway, if there are both RTL and 
> LTR characters in one of these properties, these texts also need bidi 
> treatment. Even if there's only RTL, it has to be reordered for display. 

There's no argument that text in display strings should be reordered.
I just haven't written the code to handle that yet, but it's on my
to-do list.

> Also, in some cases, the texts in the overlay properties may form units 
> that are best treated as embeddings (or similar), but in other cases, 
> they may better be treated as part of the overall text, and that overall 
> text should be processed with the bidi algorithm.

I don't see any situation that RLE/LRE or RLO/LRO, as part of the
display string itself, won't be able to handle.  Do you?
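
To make that concrete, here is a minimal Python sketch of bracketing
a display string with explicit formatting characters.  The code
points are the standard ones from UAX#9; the helper name
wrap_display_string and its parameters are hypothetical, made up for
this illustration, and are not part of Emacs.

```python
# Unicode explicit directional formatting characters (code points from UAX#9).
LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"
LRO, RLO = "\u202d", "\u202e"


def wrap_display_string(text, direction="rtl", override=False):
    """Hypothetical helper: bracket TEXT with an embedding (or override) and a closing PDF."""
    if direction not in ("ltr", "rtl"):
        raise ValueError("direction must be 'ltr' or 'rtl'")
    if override:
        opener = RLO if direction == "rtl" else LRO
    else:
        opener = RLE if direction == "rtl" else LRE
    return opener + text + PDF
```

A string wrapped this way carries its own embedding, so the caller
does not need any extra property to say how it should be treated.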

> >>> I think having a special text property that covers the text
> >>> that needs to be reordered is a cleaner solution.
> >>
> >> It's definitely also a viable solution, although there also might be
> >> some tricky issues. Say you have a property defining an embedding from
> >> characters 10 to 30, and another such property from characters 20 to 40.
> >> What exactly is that supposed to mean?
> >
> > This cannot happen in Emacs, because each property can have only one
> > value for each character.  In effect, ranges of buffer positions of the
> > same text property cannot overlap.
> 
> I see. But then that would make it rather difficult to define 
> embeddings, wouldn't it, because you have to include the number of 
> current embeddings and their orientation in the property.

We may need to specify the base paragraph direction for each such
portion of buffer text, yes.  But that is all; I don't see why we
would need to specify embedding level -- this can be handled with the
existing characters, RLE, RLO, etc.

IOW, what I thought about was that most of the text would not be
reordered (which is okay, since outside strings and comments, the rest
is strict LTR, mostly even 7-bit ASCII, text).  Only the portions that
have the special property on them will be reordered, and that
reordering will be according to the normal UAX#9 rules.  I still don't
see which use-cases will need something more than this.  And I mean
specific practical use-cases, not hypothetical ones.

> E.g.  something like (the characters a-g are just so that there's
> something between the formatting codes):
> 
> a RLE b LRE c RLE d POP e POP f POP g
> 
> would translate into (writing each character on a separate line)
> 
> a
> b RLE
> c RLE LRE
> d RLE LRE RLE
> e RLE LRE
> f RLE
> g
> 
> Unless you add quite a bit of intermediate library code, this will be 
> rather inconvenient to handle for an end user.

I don't understand why this would be needed.  Could you please present
a detailed example where this is needed?
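
For reference, the per-character translation described in the quoted
text can be sketched in a few lines of Python.  This is only a toy
model of that bookkeeping, assuming the formatting codes are given as
the tokens RLE, LRE, RLO, LRO and POP; the function name
embedding_stacks is hypothetical, and nothing here is UAX#9 itself
or anything Emacs does.

```python
OPENERS = {"RLE", "LRE", "RLO", "LRO"}


def embedding_stacks(tokens):
    """Return (char, stack) pairs; stack lists the formatting codes open at that char."""
    stack = []
    result = []
    for tok in tokens:
        if tok in OPENERS:
            stack.append(tok)
        elif tok == "POP":
            if stack:  # an unmatched POP is ignored in this sketch
                stack.pop()
        else:
            result.append((tok, tuple(stack)))
    return result


# Prints one character per line followed by its open codes, as in the quoted example.
for ch, stack in embedding_stacks("a RLE b LRE c RLE d POP e POP f POP g".split()):
    print(ch, *stack)
```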

> You mean overlapping properties? In that case, I agree. But if 
> properties cannot overlap, maybe we should use overlays. As far as I 
> understand, they can overlap.

Overlays don't scale up well; having lots of them in a buffer slows
down redisplay to an annoyingly low speed.  So I'd rather we didn't,
if we can find another solution.  Again, I still don't see why we
would need this one, and what problems it is supposed to solve.

> >> I'm not sure I understand, but if it means that the bidi algorithm is
> >> just applied piecewise, that won't be enough. It may be enough for some
> >> simple cases, such as C programs, where the main concern is to keep text
> >> within string constants together, and the rest is ASCII only and
> >> therefore goes LTR. However, on the other hand, with some XML markup
> >> with e.g. element and attribute names in Hebrew, in our experience
> >> actual nestings (i.e. embeddings in terms of the bidi algorithm) are
> >> highly desirable.
> >
> > Again, an example would go a long way towards explaining what you
> > mean.  In general, what I wrote does not eliminate the possibility
> > that embeddings might be used within the reordered parts, nor that the
> > text outside of the markup is LTR only.
> 
> Okay. In the prototype and in the Web-based editor that we have worked 
> on to display HTML, we typically used embeddings for:
> - Elements (incl. start tag and end tag) that have a dir attribute 
> (which indicates an embedding in the Web page view). These can of course 
> be nested.
> - Start tags (and end tags)
> - Attribute/attribute value combinations
> 
> Not all of these may be necessary in all cases, but it would be too 
> complicated to try and figure exactly which ones might be left out in 
> any particular case, and even this wouldn't eliminate the need for 
> nested embeddings. And it is at least currently unclear to me how you 
> could achieve nested embeddings with a possibility to tell the rendering 
> engine "restrict yourself to this region".

Please show an actual fragment of HTML/XML which needs nesting or
embeddings.

> >> Or a property that changes the bidi category of a character?
> >
> > This can be done if we need it, but I still don't see use-cases that
> > would benefit from such a feature.
> 
> Making the characters that define XML syntax, such as <, >, ", ', =,... 
> strong LTR would solve a lot (but not all) of the display anomalies for 
> XML (incl. HTML).

If it doesn't solve all the problems, I'd rather try first to find a
solution that does.  We probably won't want to change the bidi
properties of a character for the entire buffer (because it could be
used elsewhere in the buffer, like in a comment, where we would want
it to be reordered normally).  So this means we would need to use
different tables of bidi properties for different portions of the
text.  Switching bidi properties during display, as the display code
walks the buffer, is doable, but is somewhat tricky and can raise some hard
problems.  The fact that it is not a comprehensive solution makes me
even more reluctant to use it.
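
To illustrate what "different tables for different portions" would
mean, here is a rough Python sketch.  It uses the standard
unicodedata module for the default bidi class; the XML_SYNTAX set,
the function name bidi_class and the in_markup flag are assumptions
made up for this illustration, not an existing Emacs interface.

```python
import unicodedata

# Characters that define XML syntax, per the suggestion quoted above (assumed set).
XML_SYNTAX = set("<>\"'=")


def bidi_class(ch, in_markup):
    """Return the bidi class to use for CH; IN_MARKUP selects the override table."""
    if in_markup and ch in XML_SYNTAX:
        return "L"  # treated as strong LTR, but only inside markup
    return unicodedata.bidirectional(ch)
```

The in_markup argument is where the difficulty hides: something has
to decide, for every character the display code examines, which
table applies at that position.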

> It might solve all display anomalies for programming languages like C to 
> define " (for strings) and comment start/end as LTR (at least as long as 
> there are no RTL identifiers).

But quotes can appear in the comments as well, so I think here, too,
we won't be able to use the same properties for the entire buffer.

Covering each string, excluding its quotes, with a special text
property, and likewise each comment (excluding the comment
start/end), sounds like a simpler solution.
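
As a sketch of which ranges such a property would cover, the
following Python function scans C-like source and returns the
contents of strings and comments with their delimiters excluded.
The function name reorder_ranges and the (start, end, kind) tuple
format are hypothetical; the scanner is deliberately simplified (it
does not handle character constants such as '"', for instance) and
is not how Emacs finds strings and comments.

```python
def reorder_ranges(src):
    """Return (start, end, kind) ranges for string and comment contents, delimiters excluded."""
    ranges = []
    i, n = 0, len(src)
    while i < n:
        if src.startswith("/*", i):
            close = src.find("*/", i + 2)
            end = close if close != -1 else n  # unterminated comment runs to the end
            ranges.append((i + 2, end, "comment"))
            i = end + 2
        elif src.startswith("//", i):
            nl = src.find("\n", i + 2)
            end = nl if nl != -1 else n
            ranges.append((i + 2, end, "comment"))
            i = end
        elif src[i] == '"':
            j = i + 1
            while j < n and src[j] != '"':
                j += 2 if src[j] == "\\" else 1  # skip over backslash escapes
            end = min(j, n)
            ranges.append((i + 1, end, "string"))
            i = end + 1
        else:
            i += 1
    return ranges
```

Note that a quote inside a comment falls within the comment's range
and does not start a string, which is the case discussed above.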
