
[emacs-bidi] Re: Supporting non-plain-text buffers


From: Eli Zaretskii
Subject: [emacs-bidi] Re: Supporting non-plain-text buffers
Date: Thu, 15 Jul 2010 16:00:02 +0300

> Date: Thu, 15 Jul 2010 19:49:23 +0900
> From: "Martin J. Dürst" <address@hidden>
> CC: address@hidden
> 
> Sorry to be late with my reply.

We are all busy people.

> > I don't see any situation that RLE/LRE or RLO/LRO, as part of the
> > display string itself, won't be able to handle.  Do you?
> 
> It depends on where we allow the corresponding PDFs to go. (a) do the 
> PDFs need to be in the same piece of text (or, if a PDF is missing, do 
> we just close the embedding anyway at the end of that piece of text), or 
> (b) are embeddings (and overrides) allowed to span several of these text 
> pieces?
> 
> If you mean (b), then we should most probably be covered. If you mean 
> (a), I'm not so sure about it.

I meant (a).  Anything else can be handled by providing the initial
value for the base embedding level, as part of the property, I think.
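Interpretation (a) can be sketched in Python as below. This is a minimal illustration, not Emacs code. `close_embeddings` is a hypothetical helper that treats one display string as self-contained: it drops any PDF that has no matching initiator inside the piece (UAX #9 likewise ignores a PDF when there is nothing to pop), and it appends a PDF for every embedding or override still open at the end. The UAX #9 maximum embedding depth is ignored here.

```python
# Explicit directional formatting characters from UAX #9.
LRE, RLE, PDF, LRO, RLO = "\u202a", "\u202b", "\u202c", "\u202d", "\u202e"
INITIATORS = {LRE, RLE, LRO, RLO}


def close_embeddings(piece: str) -> str:
    """Make PIECE self-contained with respect to embeddings and overrides.

    Hypothetical helper for interpretation (a): a PDF with no matching
    initiator earlier in the same piece is dropped, so the piece cannot
    close anything opened outside it, and every embedding or override
    still open at the end is closed there.
    """
    depth = 0
    out = []
    for ch in piece:
        if ch in INITIATORS:
            depth += 1
        elif ch == PDF:
            if depth == 0:
                continue  # unmatched PDF: nothing to pop, so ignore it
            depth -= 1
        out.append(ch)
    return "".join(out) + PDF * depth  # close whatever is still open
```

For example, `close_embeddings(RLE + "abc")` returns `RLE + "abc" + PDF`, so whatever follows the piece is displayed at the level it had before.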

> Also, in several programming languages, there is string interpolation. 
> This means that in the middle of a (let's assume RTL) string, one can go 
> back to code. And then of course in the middle of that code, one can go 
> back to strings. And string interpolation can also be used in regexps.
> 
> And then there is also the whole area of PHP, JSP, ASP,... where you 
> have by definition program code in the middle of (Web page) text, and of 
> course that program code can contain text again.

Is that something a clever enough parser couldn't parse, setting the
properties accordingly?
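A sketch of such a parser, assuming Ruby-like double-quoted strings with `#{...}` interpolation and no escape sequences; all names are hypothetical. Each segment records its nesting depth: top-level code is 0, a string is 1, code interpolated into that string is 2, a string inside that code is 3, and so on. The quotes and the `#{`/`}` delimiters belong to no segment. A flat "this is a string" property can only express depths 0 and 1, so any deeper segment marks a place where nesting would be needed.

```python
from typing import List, Tuple

# (kind, depth, start, end); END is exclusive.  Hypothetical representation.
Segment = Tuple[str, int, int, int]


def parse_string(src: str, i: int, depth: int, out: List[Segment]) -> int:
    """Scan string contents starting at I (just past the opening quote)."""
    start = i
    while i < len(src) and src[i] != '"':
        if src.startswith("#{", i):
            if i > start:
                out.append(("string", depth, start, i))
            i = parse_code(src, i + 2, depth + 1, out, in_braces=True)
            i += 1  # skip the closing '}'
            start = i
            continue
        i += 1
    if i > start:
        out.append(("string", depth, start, i))
    return i + 1  # position just past the closing quote


def parse_code(src: str, i: int, depth: int, out: List[Segment],
               in_braces: bool) -> int:
    """Scan code from I; inside #{...}, stop at the matching '}'."""
    start = i
    braces = 0
    while i < len(src):
        c = src[i]
        if c == '"':
            if i > start:
                out.append(("code", depth, start, i))
            i = parse_string(src, i + 1, depth + 1, out)
            start = i
            continue
        if in_braces and c == "{":
            braces += 1
        elif in_braces and c == "}":
            if braces == 0:
                break
            braces -= 1
        i += 1
    if i > start:
        out.append(("code", depth, start, i))
    return i


def segments(src: str) -> List[Segment]:
    """Split SRC into code and string segments tagged with nesting depth."""
    out: List[Segment] = []
    parse_code(src, 0, 0, out, in_braces=False)
    return out
```

For `puts "A #{x + "B"} C"` this yields the code `puts ` at depth 0, the string pieces `A ` and ` C` at depth 1, the interpolated code `x + ` at depth 2, and the inner string `B` at depth 3.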

> This also applies to XML/HTML. Let's take the following example from TR 9:
> logical, with some LRE/RLE/PDF: DID YOU SAY ’he said “car MEANS CAR”‘?
> With HTML markup:
> <p lang='he' dir='rtl'>DID YOU SAY ’<span lang='en' dir='ltr'>he said 
> “<span lang='he' dir='rtl'><span lang='en'>car</span> MEANS 
> CAR</span>”</span>‘?</p>
> 
> To take just the innermost part here, would a user want to see
>     <span lang='en'>car</span> RAC SNAEM
> or would she like to see
>     RAC SNAEM <span lang='en'>car</span>
> or would she like to see
>     RAC SNAEM <span/>car<lang='en' span>
> which looks confusing, but maybe not so much if the element name is in 
> RTL, too, which would then give something like
>     RAC SNAEM <NAPS/>car<lang='en' NAPS>

How is this actually displayed by a browser, i.e. when the markup is
removed?  That's how we should display it with the markup as well.
IOW, according to the markup rules.
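One way to model what a browser shows is the CSS 2.1 description: an inline element whose direction comes from `dir` (with `unicode-bidi: embed`) behaves as if its content were wrapped in LRE or RLE and a closing PDF, while `dir` on a block element sets the paragraph's base direction. (Current HTML specifies isolation rather than embedding for `dir`, so this reflects the older model.) The sketch below uses Python's `html.parser`; it assumes well-formed markup with no void elements and treats only `p` and `div` as blocks. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"
BLOCK_TAGS = {"p", "div"}  # assumption: only these are block-level


class DirToControls(HTMLParser):
    """Flatten simple HTML into plain text with explicit bidi controls."""

    def __init__(self):
        super().__init__()
        self.base_dir = None  # 'ltr' or 'rtl' from a block element, if any
        self.parts = []
        self.stack = []  # one flag per open element: did it emit a control?

    def handle_starttag(self, tag, attrs):
        d = dict(attrs).get("dir")
        if tag in BLOCK_TAGS:
            self.base_dir = d or self.base_dir  # paragraph base direction
            self.stack.append(False)
        elif d in ("ltr", "rtl"):
            self.parts.append(LRE if d == "ltr" else RLE)  # open embedding
            self.stack.append(True)
        else:
            self.stack.append(False)

    def handle_endtag(self, tag):
        if self.stack and self.stack.pop():
            self.parts.append(PDF)  # close the embedding this element opened

    def handle_data(self, data):
        self.parts.append(data)


def flatten(markup):
    """Return (base direction, text with controls) for MARKUP."""
    parser = DirToControls()
    parser.feed(markup)
    parser.close()
    return parser.base_dir, "".join(parser.parts)
```

Applied to the whole example, this gives base direction `rtl` and the text `DID YOU SAY ’`, LRE, `he said “`, RLE, `car MEANS CAR`, PDF, `”`, PDF, `‘?`: the controls sit exactly where the `span` boundaries were.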

> I don't really mind too much which way we go, but given that I must 
> assume that the bidi algorithm has hierarchically nested embeddings for 
> a reason

It has them for a reason, but that reason is to allow programs to
produce text where these embeddings are already present by means of
the formatting control characters.  If these formatting characters are
not there in the original text, we are not allowed to add them.

> > We probably won't want to change the bidi
> > properties of a character for the entire buffer (because it could be
> > used elsewhere in the buffer, like in a comment, where we would want
> > it to be reordered normally).  So this means we would need to use
> > different tables of bidi properties for different portions of the
> > text.  Switching bidi properties during display, as it walks the
> > buffer, is doable, but is somewhat tricky and can raise some hard
> > problems.
> 
> The table lookup might be done beforehand, with Font Lock or some 
> similar mechanism, and the result may be carried in properties.
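As an illustration of that idea, here is a minimal Python sketch that resolves one bidi class per character before display. Characters keep their Unicode Bidi_Class (from `unicodedata.bidirectional`) except inside a region that carries its own override table. The region structure, and the choice to make parentheses strong L inside a regexp literal, are hypothetical; the returned list stands in for the text properties that a Font Lock pass would attach.

```python
import unicodedata
from typing import Dict, List, Tuple

# (start, end, table): inside [start, end) the table maps a character to
# the bidi class it should have there.  Hypothetical structure.
Region = Tuple[int, int, Dict[str, str]]


def precompute_classes(text: str, regions: List[Region]) -> List[str]:
    """Return one bidi class per character of TEXT."""
    classes = [unicodedata.bidirectional(ch) for ch in text]
    for start, end, table in regions:
        for i in range(start, min(end, len(text))):
            # The region's table wins; characters it does not list keep
            # their Unicode class.
            classes[i] = table.get(text[i], classes[i])
    return classes
```

For `/(א)/ (א)` with an override covering the regexp literal `/(א)/`, the parentheses inside the literal come out as `L`, while those outside keep their Unicode class `ON`.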

You seem to be thinking about performance; that's not the issue.  The
issue is that the reordering engine was written under the assumption
that certain information remains static during reordering of a single
level run.  If this assumption is violated, I don't know what will
happen.  I never analyzed such a possibility.
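The assumption can be made concrete. UAX #9 defines a level run as a maximal substring of characters with the same embedding level. The sketch below splits resolved levels into such runs and reports every run within which a hypothetical per-character table tag changes, that is, the case the reordering code was not written to handle. `table_ids` and both function names are illustrative assumptions, not part of Emacs.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

Run = Tuple[int, int, int]  # (start, end, level); END is exclusive


def level_runs(levels: Sequence[int]) -> List[Run]:
    """Split resolved embedding levels into maximal same-level runs."""
    runs, i = [], 0
    for level, group in groupby(levels):
        n = len(list(group))
        runs.append((i, i + n, level))
        i += n
    return runs


def runs_with_table_switch(levels: Sequence[int],
                           table_ids: Sequence[str]) -> List[Run]:
    """Return the level runs within which the property table changes.

    TABLE_IDS names, per character, the (hypothetical) bidi property table
    in effect; a run listed here breaks the static-information assumption.
    """
    return [(s, e, lvl) for s, e, lvl in level_runs(levels)
            if len(set(table_ids[s:e])) > 1]
```

With levels `[0, 0, 1, 1, 1, 0]` and tables `a a a b b a`, only the level-1 run covering positions 2 to 4 is reported, because its table changes from `a` to `b` partway through.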

> > Covering each string, excluding its quotes, with a special text
> > property, and the same with a comment (excluding the comment
> > start/end) sounds like a simpler solution.
> 
> This works very nicely if there is no nesting. If you can tell me for 
> sure that nobody working with Perl, Ruby, PHP, JSP, ASP, HTML, XML,... 
> will prefer nested bidi reordering for some cases, that might solve the 
> problem. But I wouldn't want to make such an assertion.

I'm willing to assume that for the time being, until someone comes and
shows a use-case where this is a limitation.  My experience with
Hebrew speaking programmers is that they avoid such mixups precisely
because they are not handled well by existing development tools.
Let's leave something for future extensions ;-)
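Under that no-nesting assumption, finding the ranges to cover is a flat scan. Below is a minimal sketch for C-like source, assuming `"..."` strings with backslash escapes, `//` line comments, and `/* */` block comments. `content_spans` is a hypothetical name, and each span excludes the quotes or comment delimiters, as proposed earlier in the thread.

```python
from typing import List, Tuple

Span = Tuple[str, int, int]  # (kind, start, end); END is exclusive


def content_spans(src: str) -> List[Span]:
    """Find string and comment bodies in SRC, excluding their delimiters."""
    spans, i, n = [], 0, len(src)
    while i < n:
        if src[i] == '"':
            j = i + 1
            while j < n and src[j] != '"':
                j += 2 if src[j] == "\\" else 1  # skip escaped characters
            spans.append(("string", i + 1, min(j, n)))
            i = j + 1
        elif src.startswith("//", i):
            j = src.find("\n", i)
            j = n if j < 0 else j
            spans.append(("comment", i + 2, j))
            i = j
        elif src.startswith("/*", i):
            j = src.find("*/", i + 2)
            j = n if j < 0 else j
            spans.append(("comment", i + 2, j))
            i = j + 2
        else:
            i += 1
    return spans
```

Each span would receive the special text property. Because the scanner never looks inside a string for code, it cannot represent the interpolation cases raised above; that is the limitation being accepted for now.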



