bug-gnu-emacs

## bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm du

From: Itai Berli
Subject: bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
Date: Tue, 4 Jul 2017 13:42:19 +0300

I'd like to add another reason why this behavior is problematic: it breaks interoperability with other plain-text editors, because the same text is not displayed the same way. Consider, for instance, how the very same plain-text file is displayed
in GEdit: http://imgur.com/Iw4yrdQ
in Emacs: http://imgur.com/7kfWseE

Finally, whether Emacs's behavior is consistent with the UBA specification is itself debatable. UBA section 3 states that "Paragraphs may also be determined by higher-level protocols", and the question is what exactly "also" means: may a higher-level protocol (HLP) decide that a newline character is not a paragraph boundary, as Emacs does, or may it only declare paragraph boundaries in addition to those given by paragraph separator characters?

On Thu, Jun 29, 2017 at 9:36 PM, Itai Berli wrote:
> The UBA allows applications to employ "higher-level protocols" when
> deciding on base paragraph direction.  See section 4.3 in UAX#9 and specifically clause HL1 there.

> This is what Emacs does: it applies its own heuristics for this
> decision.  The reason for that is that Emacs's implementation of the
> UBA must work reasonably well in plain-text buffers, where typically
> long paragraphs are broken into lines by newline characters (which are
> paragraph separators according to the UBA), and many times the
> partition into lines is done by auto-fill or similar features, thus
> making the first character of the next line fairly arbitrary.  Using
> the UBA paragraph-direction determination would then produce
> unacceptable results, whereby the direction of a part of a paragraph
> could change in unpredictable ways when text is refilled.

As I understand it, the "higher-level protocols" provision is intended
to allow for such things as table cells, elements of structured markup
languages, and word processors that use an idiosyncratic
implementation of a paragraph separator *under the hood*. It is not
intended for plain running text; for this the standard specifies
explicitly what the paragraph separators for every operating system
are.

> typically long paragraphs are broken into lines by newline characters

I see no evidence of the validity of this statement on my system (Emacs
25.1.1). But even if this were so, it would still not merit
*hard-coding* the paragraph separator as a blank line, as there are
situations (such as the one I presented in my bug report) that require
a different configuration.
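
To make the difference between the two conventions concrete, here is a
minimal Python sketch. It is not Emacs's implementation: the function
names are hypothetical, and the direction test is a simplified reading
of UBA rules P2/P3 (first strong character wins) that does not skip
directional isolates. It only shows how the choice of paragraph
separator changes the base direction assigned to each piece of text.

```python
import unicodedata


def base_direction(text, default="L"):
    """Return "L" or "R" from the first strong character in `text`.

    Simplified version of UBA rules P2/P3: isolates are not skipped.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "L"
        if bidi_class in ("R", "AL"):
            return "R"
    return default  # no strong character found


def directions_per_newline(text):
    """Treat every newline as a paragraph separator."""
    return [base_direction(line) for line in text.split("\n") if line.strip()]


def directions_per_blank_line(text):
    """Treat only blank lines as paragraph separators."""
    return [base_direction(par) for par in text.split("\n\n") if par.strip()]


# One Hebrew paragraph, hard-wrapped so that the second line happens to
# start with a Latin word.
sample = "\u05e9\u05dc\u05d5\u05dd \u05e2\u05d5\u05dc\u05dd\nEmacs \u05e2\u05d5\u05e8\u05da"

print(directions_per_newline(sample))     # ['R', 'L']
print(directions_per_blank_line(sample))  # ['R']
```

With newlines as separators the second line gets its own left-to-right
base direction. With blank lines as separators the whole block inherits
the direction of its first strong character.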

> You can alleviate this to some extent by ...(in your case) starting
> the paragraph with an RLM control character before \noindent,
> optionally followed by an LRM or enclosing \noindent in LRE..PDF (so
> that the backslash displays to the left of "noindent").  This is
> admittedly a bit awkward, but I think the results are still acceptable.

As you mentioned, the solution is cumbersome. It might have been
acceptable if this were the sole issue, but this example illustrates just one of
several problems that arise from the current paragraph separator
convention.
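
For reference, the workaround quoted above amounts to something like
the following Python sketch. The helper name is hypothetical and the
Hebrew body text is only a placeholder. The code just builds the
string; how it is displayed depends on the editor.

```python
import unicodedata

RLM = "\u200f"  # RIGHT-TO-LEFT MARK
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def rtl_paragraph_with_ltr_command(command, body):
    """Prefix an RTL paragraph with RLM and wrap the command in LRE..PDF."""
    return f"{RLM}{LRE}{command}{PDF} {body}"


line = rtl_paragraph_with_ltr_command("\\noindent", "\u05e9\u05dc\u05d5\u05dd")

# Show the code points that were inserted around the command.
for ch in line[:2] + line[11]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Every paragraph that begins with a LaTeX command would need the same
three invisible characters, which is what makes the approach awkward in
practice.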

In conclusion, and on a personal note, I implore you to change this
behavior, and to do so as soon as possible, and not only for specialized
markup documents, but for every document.

I am currently working on my thesis. Emacs is useless to me as a text
editor of Hebrew texts without this feature. This is no
exaggeration.

The original reason I chose Emacs over other editors was because of
the combination of AUCTeX and the promise of full Unicode
compatibility. AUCTeX has delivered on its promise, but in the area of
Unicode, as far as my needs are concerned, it is as if there were no Unicode
support at all, and I will sadly be forced to look for a different editor.