emacs-devel

Re: Bidirectional text and URLs


From: Eli Zaretskii
Subject: Re: Bidirectional text and URLs
Date: Sun, 30 Nov 2014 19:53:32 +0200

> From: Lars Magne Ingebrigtsen <address@hidden>
> Date: Sun, 30 Nov 2014 17:26:33 +0100
> 
> Just a point of clarification: When people embed URLs in paragraphs with
> mainly right-to-left script (like Hebrew)

Let's clear up terminology first, OK?

There's no distinction in bidi display and bidi scripts between
"paragraphs with mainly right-to-left scripts" and "paragraphs with
mainly left-to-right scripts".  Instead, there's "the base direction
of a paragraph", which can be either left-to-right (LTR) or
right-to-left (RTL).  The former is displayed with the first character
(in the _visual_ order!) at the left edge of the window, the latter
with it at the right edge.

It is true that the LTR paragraphs make most sense when most of the
paragraph text is made of LTR characters, and the RTL paragraphs in
the opposite case.  But nothing prevents me from having a paragraph
whose base direction is LTR which is nevertheless full of RTL
characters.  It is entirely legitimate and sometimes even necessary.

Emacs determines the base direction of a paragraph by searching for
the first strong directional character in the paragraph (this is a
simplification; the actual rules described in the Unicode
Bidirectional Algorithm, or UBA, are more complex).  The buffer-local
variable bidi-paragraph-direction overrides this dynamic calculation
and forces a specific base direction on all paragraphs of the buffer.

With this out of our way, I will assume that you were asking about
URLs that are part of paragraphs whose base direction is RTL.  Now
let's go back to your question:

> do they expect to see http://myspace.com or ‮?http://myspace.com

The answer to your question is "it depends".  Here are 3 examples; to
see them as I intended, make sure you are viewing them in a buffer
whose bidi-paragraph-direction is set to nil:

abc http://אבג.דהוזחט.קום

אבג http://foo.bar.com

אבג http://אבג.דהוזחט.קום

The leading 3 letters (1 would be enough) cause Emacs to decide that
the paragraph has LTR base direction in the 1st example and RTL base
direction in the last 2 examples.
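
As a rough illustration of the first-strong-character rule, here is a
minimal Python sketch.  It is not how Emacs implements the rule; the
function name base_direction and the "LTR"/"RTL" return values are
made up for this example, and the sketch ignores the finer points of
the UBA (directional isolates, for instance).  It relies only on the
standard unicodedata module to look up each character's bidi class.

```python
import unicodedata

# Bidi classes the UBA treats as "strong": L is left-to-right,
# R and AL (Arabic letter) are right-to-left.
STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}


def base_direction(paragraph, default="LTR"):
    """Guess a paragraph's base direction from its first strong character.

    Hypothetical helper for illustration; a simplification of the UBA.
    """
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in STRONG_LTR:
            return "LTR"
        if bidi_class in STRONG_RTL:
            return "RTL"
    return default  # no strong character anywhere in the paragraph


# The three example lines, in logical (buffer) order, with the Hebrew
# letters written as escapes so the source itself is not reordered.
ALEF_BET_GIMEL = "\u05d0\u05d1\u05d2"
HEBREW_HOST = ALEF_BET_GIMEL + ".\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8.\u05e7\u05d5\u05dd"

for line in ("abc http://" + HEBREW_HOST,
             ALEF_BET_GIMEL + " http://foo.bar.com",
             ALEF_BET_GIMEL + " http://" + HEBREW_HOST):
    print(base_direction(line))
```

Only the first strong character matters here, which is why a single
leading letter is enough to decide the direction of the whole line.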

Now move the cursor with C-f from the beginning of each of these three
lines (you can get to the beginning of a line with C-a or Home, as
usual), and I hope you will see what's going on: cursor movement with
C-f follows the "reading order", i.e. the order in which a human is
supposed to read these URLs.

To summarize: Latin characters are displayed left to right, even in
RTL paragraphs, while right-to-left characters are always displayed
right to left.  Neutral characters (slash, period) take the direction
of the surrounding text.
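
To check which category a given character falls into, one can look up
its Unicode bidi class.  The sketch below uses Python's standard
unicodedata module; the helper name bidi_kind and its labels are
invented for this example.  Note that the Unicode data classifies
slash, period and colon as "weak" separators rather than as neutrals
proper; as far as I understand the UBA, they end up being treated
like neutrals when they are not sitting between digits, which matches
the summary above.

```python
import unicodedata


def bidi_kind(ch):
    """Return a coarse label for one character's Unicode bidi class.

    Hypothetical helper for illustration only.
    """
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class == "L":
        return "strong LTR"
    if bidi_class in ("R", "AL"):
        return "strong RTL"
    # Everything else: digits, separators, whitespace, controls, etc.
    return "weak or neutral (" + bidi_class + ")"


# A Latin letter, a Hebrew letter (alef), and typical URL punctuation.
for ch in "h\u05d0/.:":
    print("U+%04X" % ord(ch), bidi_kind(ch))
```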

> (If I did that correctly, the latter URL should have an RLO character
> preceding it so that it reads right to left.)

As you see above, there's no need to use any directional overrides to
see what users expect: Emacs does that automatically, by following the
Unicode Bidirectional Algorithm (UBA).  You just need to arrange for
the paragraph to have an RTL base direction, which is very easy, as
shown above.

RLO and LRO (and the other directional control characters) are needed
when you need to override the normal reordering for some reason,
typically because you want punctuation characters to take a different
directionality from their default.  This is rarely needed when rendering
URLs.
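
For the rare case where an override really is wanted, the sketch
below shows what the control characters look like in a string.  The
code points are the standard ones (RLO is U+202E, LRO is U+202D, and
U+202C, POP DIRECTIONAL FORMATTING, ends the override); the helper
name force_direction is made up for this example, and how the result
is displayed depends entirely on the program rendering it.

```python
import unicodedata

RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING, terminates the override


def force_direction(text, rtl=True):
    """Wrap text in an override so every character gets one direction.

    Hypothetical helper for illustration only.
    """
    return (RLO if rtl else LRO) + text + PDF


wrapped = force_direction("http://myspace.com")
# Show the names of the two control characters that were added.
print(unicodedata.name(wrapped[0]))
print(unicodedata.name(wrapped[-1]))
```

Terminating the override with U+202C keeps it from leaking into the
text that follows on the same paragraph.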

HTH

May I ask why you came up with the question?



