Re: [emacs-bidi] Suboptimal display-reordering in minibuffer

Hi people,

First, thanks Eli and all contributors for the remarkable effort, and all the recent progress!

On Sat, Jun 26, 2010 at 10:17 PM, Larry Denenberg <address@hidden> wrote:

>> Suppose . . . I type a Hebrew control character, say control-bet,
>> which has no command binding. I quite properly get the message
>> "control-bet is undefined" in the minibuffer. But because it starts
>> with a Hebrew character, it gets shoved against the right margin,
>> which is wrong.

Note that there are two separate issues:

(1) Directionality (I'll use B here to represent Hebrew Bet):

Should the message be displayed as "is undefined B^" (RTL paragraph direction) or as "^B is undefined" (LTR paragraph direction)?

(2) Alignment (to right or left margin) - where that message is to be displayed. It makes sense to align to the "start" side (i.e. right for RTL and left for LTR), but AFAIK this is a matter of style and not within the scope of the Unicode standard.

(2) is a relatively minor problem, while (1) could be a real source of confusion for the reader.

>
>Why is it wrong?

I suppose I should be hesitant since I've been quite properly rebuked
for insufficiently reflective use of the words "quite properly" in
another thread, but I'll take a shot anyway: It's wrong because this is
an LTR sentence that happens to start with an RTL character, so the bidi
code comes to the wrong conclusion about directionality. If the message
were instead "Key control-bet is undefined" we'd agree that it's LTR
with a single inserted RTL, right? Just because the RTL character is at
the beginning of the sentence doesn't make the sentence RTL.

True. There is no way to determine the correct direction of a sentence with 100% certainty out of context. That is why the Unicode standard leaves a "higher-level protocol" free to set it (http://unicode.org/reports/tr9/, HL1).

When such information is not available, a simple default algorithm is described by the standard (rules P2, P3). This is implemented by common bidi reordering libs, and I guess this is the reason for what you see here.
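To illustrate that default, here is a minimal Python sketch of the first-strong-character idea behind rules P2 and P3. The function name guess_paragraph_direction and the "LTR"/"RTL" return values are made up for this example, and the sketch is a simplification: it only looks for the first character whose bidi class is L, R or AL and does not handle any other detail of the rules.

```python
import unicodedata

def guess_paragraph_direction(text, fallback="LTR"):
    """Simplified first-strong heuristic (in the spirit of UAX #9 P2/P3).

    Hypothetical helper for illustration only; not taken from Emacs or fribidi.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "LTR"            # first strong character is left-to-right
        if bidi_class in ("R", "AL"):
            return "RTL"            # first strong character is Hebrew/Arabic
    return fallback                 # no strong character found at all

# U+05D1 is HEBREW LETTER BET.
print(guess_paragraph_direction("\u05d1 is undefined"))      # RTL
print(guess_paragraph_direction("Key \u05d1 is undefined"))  # LTR
```

With this heuristic the reported message is classified RTL purely because of its first letter, while the "Key ..." variant comes out LTR, which matches the behaviour described above.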

Aren't problems like this the entire raison d'etre of the invisible RLM
and LRM characters?

One of the main reasons. True. But, depending on the bidi reordering function used, the application might be able to achieve the results by providing this "higher level choice" itself. With libfribidi, the "pbase_dir" input parameter can be used for that.

>> Since system minibuffer are always in English, maybe the minibuffer
>> should never be in display reorder mode.
>
>What do you mean by ``minibuffer are always in English''? The
>language and the paragraph direction are not necessarily related.

More precisely, I meant "the messages displayed by the minibuffer were
written in English with intended left-to-right logic; RTL characters in
these messages are implicitly quoted, carrying no semantic meaning".
Are there examples of minibuffer messages that we agree should be RTL?

IMO, since the echo messages are typically one-liners, their directionality should be defined by their language.

In Unix, if the message is translated to an RTL language (i.e. if LC_MESSAGES is Arabic/Hebrew/Persian and the proper entry exists in the translation file), then dir should be RTL.

Otherwise (as in the case you reported indeed), it should be set LTR.

I think this should work correctly in 99% of cases (in fact, at the moment I cannot think of any realistic case where it would fail).
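The policy I have in mind could be sketched roughly like this in Python. Everything here is an assumption made for illustration: the function name echo_direction, the has_translation flag (standing in for "the proper entry exists in the translation file"), and the short list of RTL language codes, which is not meant to be exhaustive.

```python
# Language codes treated as right-to-left in this sketch (assumed, not exhaustive).
RTL_LANGUAGES = {"ar", "he", "fa"}  # Arabic, Hebrew, Persian

def echo_direction(lc_messages, has_translation):
    """Pick a paragraph direction for a one-line message.

    lc_messages: a locale string such as "he_IL.UTF-8".
    has_translation: True if the message was actually found in the catalog.
    """
    # Strip codeset, modifier and territory: "he_IL.UTF-8" -> "he"
    language = lc_messages.split(".")[0].split("@")[0].split("_")[0].lower()
    if has_translation and language in RTL_LANGUAGES:
        return "RTL"
    # Untranslated messages are English text, so force LTR.
    return "LTR"

print(echo_direction("he_IL.UTF-8", True))   # RTL
print(echo_direction("he_IL.UTF-8", False))  # LTR
```

The second call is the case reported in this thread: the locale may be Hebrew, but the message itself is untranslated English, so it is laid out LTR regardless of its first character.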

>I think this is because the minibuffer and the echo area are not the
>same thing. They just use the same portion of the Emacs display.

Absolutely correct. Learn something new every day.

>Does it work to set bidi-display-reordering in two buffers named
>" *Echo Area 0*" and " *Echo Area 1*"?

Absolutely correct again! So now I can have it if I really want it.

>> Another way to handle the case of an English paragraph that starts with
>> a Hebrew character is to insert an LRM. In this case that would need to
>> be done by the code that finds character bindings. I think that code
>> should indeed be sensitive to the fact that the unbound character it's
>> about to echo might set display direction.
>

>The main point here is deciding whether echo area messages should be
>displayed with left-to-right paragraph direction forced on the display
>engine. Are we sure this is the case? Cannot there be echo area
>messages that we want to display with the right-to-left direction?

There could be, if the messages themselves are in Hebrew (via LC_MESSAGES and translation files).

I do not know if Emacs really has Arabic/Hebrew translations, but if it has not been translated yet, there is no reason why it should not be.

Maybe. I ask again, do we have an example? What I'm saying here is
that certain parts of Emacs should be more careful in the face of bidi.
When Emacs wants to write "X is undefined" in the echo area with X
variable, maybe it should carefully put an LRM before the X because of
new potential side effects. This is something like a web programmer
being super cautious about sanitizing values that users type in.
cf. http://xkcd.com/327/
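For what it is worth, the LRM idea can be sketched in a few lines of Python. The names undefined_key_message and first_strong_class are hypothetical and this is not how Emacs builds the message; it only shows that a leading U+200E (LEFT-TO-RIGHT MARK) becomes the first strong character, so a first-strong heuristic sees the string as LTR.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK, an invisible character with bidi class L

def undefined_key_message(key_description):
    """Build the message with a defensive LRM in front of the variable part."""
    return f"{LRM}{key_description} is undefined"

def first_strong_class(text):
    """Return the bidi class (L, R or AL) of the first strong character, or None."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("L", "R", "AL"):
            return bidi_class
    return None

bet = "\u05d1"  # HEBREW LETTER BET
print(first_strong_class(f"{bet} is undefined"))          # R
print(first_strong_class(undefined_key_message(bet)))     # L
```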

>> Finally, an even more subtle (and unimportant) issue: The actual
>> message I see looks like this: "is undefined ^ב". But I would have
>> expected "is undefined ב^", no? Shouldn't control-bet be written with
>> the uparrow on the right when in RTL mode?
>

Don't know about "should" (because as you said, both of them look "wrong").

However, if you let the standard Unicode algorithm reorder the logical string "^B is undefined" with the default auto-detected directionality, it really does result in what you seem to expect (the circumflex (0x5e) is a neutral, and gets the directionality of the run). Maybe this is not really a circumflex, or maybe some other magic is at work here.
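One quick way to check the "neutral" claim is to list the bidi class of each character in the logical string. This small Python sketch uses the standard unicodedata module; the helper name bidi_classes is made up for the example, and it only inspects character classes, it does not perform the reordering itself.

```python
import unicodedata

def bidi_classes(text):
    """Return (code point, bidi class) pairs for each character of text."""
    return [(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch)) for ch in text]

# Logical order: circumflex, HEBREW LETTER BET (U+05D1), space, "is"
for code_point, bidi_class in bidi_classes("^\u05d1 is"):
    print(code_point, bidi_class)
# U+005E ON   (other neutral)
# U+05D1 R    (strong right-to-left)
# U+0020 WS   (whitespace, also neutral)
# U+0069 L    (strong left-to-right)
# U+0073 L
```

If the character Emacs actually emits is something other than U+005E, its class may differ, which would be one possible explanation for the display you see.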

>I don't know which one is the correct one. Do we have any "prior art"
>in that some other applications display Ctrl-modified Hebrew
>characters?

Beats me. I just learned on the other thread that Windows may not even
admit the existence of such characters. I don't seem to be able to
insert them into a buffer. Probably they don't have Unicode codepoints.

Not AFAIK. Unicode is about plaintext, not keyboard codes.

I do not know of any keyboard codes for ctrl-Hebrew characters either; details below.

From a brief check on Linux with X, with Hebrew and English layouts, the situation seems to be as follows:

1) At the basic X level (I used xev to test) there is a "state" (binary flags that indicate, e.g., whether ctrl was held, and also the "group", i.e. whether we are in Hebrew or English mode), a keycode (a number, which is the same for "א" and "t"), and an "XLookupString", which is the same (14) for both "ctrl-t" and "ctrl-א" (but does differentiate between them if ctrl is not held). xev also reports a "keysym", which is the Unicode code point for "t" in both cases (ctrl-t and ctrl-א), but is the Unicode code point for א if ctrl is not pressed.

2) In GTK (a higher-level interface), there is "gdk_keyval_name", which is either "א" or "t" according to the current layout (language mode). Whether or not ctrl was down is determined by the mask GDK_CONTROL_MASK in the state of the event.

Note that at both levels there is no specific code for "ctrl-א". Whatever it is that Emacs sees is either generated by some higher-level function that I am not aware of, or generated within Emacs itself. We should probably look it up in the code.

Maybe they don't make sense at all. I will think about this further.

If there is such a thing as control-bet, then I think it should be
displayed as "ב^" in RTL text, and as "^ב" in LTR text.

>P.S. Thanks for starting these discussions. Sometimes I think that no
>one is using the bidi features, which makes me wonder why I worked on
>them so hard.

You worked on them for the joy of solving the problem, I hope. If you
think lots of people will be using them, I'm afraid you'll be sadly
disappointed.

I started using Emacs around 30 years ago, after grudgingly converting
from vi, which I grudgingly converted to from TECO. I still read mail
and write in Emacs---it's what I know. But I get mail in Hebrew and I
can't read it, nor answer without other tools. I've been waiting for
emacs bidi for years. I check around every few months, and it was only a
couple weeks ago that I saw that my wishes were finally fulfilled. It's
now a joy to read and answer mail. I'm very very grateful.

But I'm a dinosaur. Are there really any new emacs users these days?
I'd be very, very surprised.

Well I'm just another dino myself :-)

I've recently seen a new Emacs user, but I am probably to blame: we installed Emacs on his computer (Windows, notepad, etc.) because I had to do some work when mine was down. A few weeks later I was surprised to see he was still using it.

From:	Amit Aronovitch
Subject:	Re: [emacs-bidi] Suboptimal display-reordering in minibuffer
Date:	Sun, 27 Jun 2010 06:30:27 +0300