

bug#27544: 25.1; Visualization of Unicode bidirectional marks

From: Itai Berli
Subject: bug#27544: 25.1; Visualization of Unicode bidirectional marks
Date: Sat, 1 Jul 2017 12:58:28 +0300

Emacs supports 12 Unicode bidirectional marks (ALM, RLM, LRM, LRE,
RLE, LRO, RLO, PDF, FSI, LRI, RLI, and PDI), each of which displays as
a very thin space. This raises two problems.

1. On the one hand, the fact that these inherently invisible
marks manifest, by default, as thin spaces undermines attempts at
precise alignment and positioning. Moreover, in the case of LRM, RLM
and ALM, this behavior contradicts explicit directions given in the
Unicode Bidirectional Algorithm specification for Unicode 8.0.0
(section 2.6, Implicit Directional Marks):
> they do not appear in the display
(To my understanding, this is meant to apply to all bidi marks, even
if only stated explicitly for LRM, RLM and ALM.)

2. On the other hand, because these spaces are so thin as to be
barely noticeable, and because they are indistinguishable from one
another, it is difficult to debug and resolve the strange or erroneous
behavior that can arise in a bidi document. An example is given below.

The solution to both problems is to make the bidi marks visible in
`whitespace` mode only, and to give them glyphs that are (a) easy to
notice, (b) distinguishable from other whitespace visualization glyphs,
and (c) distinct from one another.
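To make requirements (a) through (c) concrete, here is a minimal sketch, written in Python rather than Emacs Lisp, that replaces each of the 12 marks with a distinct bracketed abbreviation. The placeholders and the names `BIDI_MARKS` and `visualize_bidi` are hypothetical choices for illustration only; this is not how `whitespace` mode behaves today, and an Emacs implementation would presumably go through a mechanism such as `whitespace-display-mappings` instead.

```python
import unicodedata

# The 12 bidi marks listed above, keyed by character.
# The abbreviations double as hypothetical visible glyphs.
BIDI_MARKS = {
    "\u061c": "ALM",  # ARABIC LETTER MARK
    "\u200e": "LRM",  # LEFT-TO-RIGHT MARK
    "\u200f": "RLM",  # RIGHT-TO-LEFT MARK
    "\u202a": "LRE",  # LEFT-TO-RIGHT EMBEDDING
    "\u202b": "RLE",  # RIGHT-TO-LEFT EMBEDDING
    "\u202c": "PDF",  # POP DIRECTIONAL FORMATTING
    "\u202d": "LRO",  # LEFT-TO-RIGHT OVERRIDE
    "\u202e": "RLO",  # RIGHT-TO-LEFT OVERRIDE
    "\u2066": "LRI",  # LEFT-TO-RIGHT ISOLATE
    "\u2067": "RLI",  # RIGHT-TO-LEFT ISOLATE
    "\u2068": "FSI",  # FIRST STRONG ISOLATE
    "\u2069": "PDI",  # POP DIRECTIONAL ISOLATE
}


def visualize_bidi(text: str) -> str:
    """Replace every bidi mark with a distinct bracketed placeholder."""
    return "".join(
        f"[{BIDI_MARKS[ch]}]" if ch in BIDI_MARKS else ch for ch in text
    )


# Sanity check: every mark is an invisible format character (category Cf).
assert all(unicodedata.category(ch) == "Cf" for ch in BIDI_MARKS)
```

For example, `visualize_bidi("\u200f\u200fHello, world")` returns `"[RLM][RLM]Hello, world"`, so two adjacent marks that would otherwise render as two barely visible thin spaces become obvious and individually identifiable.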

The following example exhibits strange behavior that can arise due to
the use of bidi marks. This behavior is difficult to debug
without visualizing the bidi marks.

Consider the following paragraph.

ILLUSTRATION #1: An English sentence that is formatted from right to left.

The paragraph is entirely in English, so why is it formatted from right
to left? Without visible bidi marks it's hard to tell; however, a savvy
Unicode-aware person would realize that this must indicate the presence
of a Right-To-Left Mark (U+200F). Therefore, if we position the cursor
at the beginning of the paragraph (`C-a`) and delete the following
character (`C-d`), the sentence should display normally.
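The reasoning above rests on how the Unicode Bidirectional Algorithm picks a paragraph's base direction: rules P2 and P3 take the first strong character (bidi class L, R or AL), skipping anything inside an isolate, and default to left-to-right if none is found. Emacs determines the direction dynamically in this way when `bidi-paragraph-direction` is nil, its default. Since RLM has bidi class R, one RLM before the first English letter is enough to make the whole paragraph right-to-left. The following is a simplified Python sketch of P2/P3 using the standard `unicodedata` module; `paragraph_direction` is a hypothetical helper, not Emacs code, and it ignores every other rule of the algorithm.

```python
import unicodedata

ISOLATE_INITIATORS = {"LRI", "RLI", "FSI"}


def paragraph_direction(text: str) -> str:
    """Return 'L' or 'R' following UBA rules P2/P3 (simplified sketch)."""
    depth = 0  # current nesting depth of isolates
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls in ISOLATE_INITIATORS:
            depth += 1
        elif cls == "PDI":
            if depth > 0:  # an unmatched PDI is simply ignored
                depth -= 1
        elif depth == 0 and cls in ("L", "R", "AL"):
            # P2: the first strong character outside any isolate decides.
            return "L" if cls == "L" else "R"
    return "L"  # P3: no strong character found, so left-to-right
```

With this sketch, `paragraph_direction("\u200fHello, world")` is `"R"`, while `"Hello, world"` alone gives `"L"`. If a paragraph starts with two RLMs, deleting only one still leaves the result at `"R"`, which is consistent with what Illustration #2 shows.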

ILLUSTRATION #2: Deleting the first Right-To-Left Mark at the
beginning of the paragraph has no effect.

Against our expectations, nothing appears to have changed. There must
be another Right-To-Left Mark at the beginning of the paragraph. Let's
delete it as well (`C-d`).

ILLUSTRATION #3: Deleting the second Right-To-Left Mark left-aligns
the paragraph, but leaves the comma misplaced.

The paragraph is now aligned to the left, as it should be, and everything
looks normal except for the comma, which appears at the beginning of
the paragraph. But this can be easily remedied: let's delete the comma
and then retype it in its proper place. We position the cursor at the
beginning of the paragraph (`C-a`) and delete the following character (`C-d`).

ILLUSTRATION #4: After trying to delete the comma, the paragraph is
finally displayed correctly.

Instead of deleting the comma, this has shifted it to its correct
position.

If we were able to visualize the whitespace, we would have realized from
the beginning that the sequence of characters in this paragraph was, from
left to right:


Thus, our first three actions removed the first three characters, leaving us with:


We now realize that even the final, seemingly correct form is in fact
littered with bidi errors and potential landmines!
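Until such glyphs exist, the hidden marks can at least be located by listing each one with its position and Unicode name; in Emacs, `C-u C-x =` on a single character shows similar information, one character at a time. A minimal Python sketch of that idea follows; `BIDI_MARKS` and `list_bidi_marks` are hypothetical names chosen for this example.

```python
import unicodedata

# The 12 bidi marks discussed above: ALM, LRM, RLM, LRE, RLE, PDF,
# LRO, RLO, LRI, RLI, FSI, PDI.
BIDI_MARKS = frozenset(
    "\u061c\u200e\u200f\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069"
)


def list_bidi_marks(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, Unicode name) for each bidi mark in text."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in BIDI_MARKS
    ]
```

For instance, `list_bidi_marks("\u200fHi,\u200e")` returns `[(0, 'U+200F', 'RIGHT-TO-LEFT MARK'), (4, 'U+200E', 'LEFT-TO-RIGHT MARK')]`, exposing a leading RLM and a trailing LRM that would otherwise display only as thin spaces.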

In GNU Emacs 25.1.1 (x86_64-apple-darwin13.4.0, NS appkit-1265.21
Version 10.9.5 (Build 13F1911))
 of 2016-09-21 built on builder10-9.porkrind.org
Windowing system distributor 'Apple', version 10.3.1504
Configured using:
 'configure --with-ns '--enable-locallisppath=/Library/Application
 Support/Emacs/site-lisp' --with-modules'

Configured features:

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: TeX/P

Minor modes in effect:
  diff-auto-refine-mode: t
  TeX-PDF-mode: t
  ivy-mode: t
  shell-dirtrack-mode: t
  projectile-mode: t
  helm-descbinds-mode: t
  async-bytecomp-package-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent messages:
Applying style hooks... done
Mark set
C-> is undefined
Mark set [2 times]
Saving file /Users/itaiberli/Documents/GitHub/Thesis/test22.tex...
Wrote /Users/itaiberli/Documents/GitHub/Thesis/test22.tex
repeat-complex-command: There are no previous complex commands to repeat
delete-backward-char: Text is read-only

Load-path shadows:
/Users/itaiberli/.emacs.d/elpa/seq-2.20/seq hides

(shadow sort mail-extr emacsbug message rfc822 mml mml-sec epg mm-decode
mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader
sendmail rfc2047 rfc2045 ietf-drums mail-utils vc-git diff-mode tex-bar
toolbar-x font-latex plain-tex tex-buf latex tex-ispell tex-style tex
crm tex-mode latexenc colir color counsel jka-compr esh-util etags xref
project swiper reftex reftex-vars two-column ivy delsel ivy-overlay
helm-projectile helm-files rx image-dired tramp tramp-compat
tramp-loaddefs trampver shell pcomplete format-spec dired-x dired-aux
ffap helm-tags helm-bookmark helm-adaptive helm-info bookmark pp
helm-external helm-net browse-url xml url url-proxy url-privacy
url-expand url-methods url-history url-cookie url-domsuf url-util
url-parse auth-source gnus-util mm-util help-fns mail-prsvr
password-cache url-vars mailcap helm-buffers helm-grep helm-regexp
helm-utils helm-locate helm-help helm-types projectile grep compile
comint ansi-color ring ibuf-ext ibuffer thingatpt helm-descbinds helm
easy-mmode helm-source cl-seq eieio-compat eieio eieio-core
helm-multi-match helm-lib dired helm-config helm-easymenu cl-macs
async-bytecomp async advice edmacro kmacro finder-inf tex-site info
package epg-config seq byte-opt gv bytecomp byte-compile cl-extra
help-mode easymenu cconv cl-loaddefs pcase cl-lib time-date mule-util
tooltip eldoc electric uniquify ediff-hook vc-hooks lisp-float-type
mwheel ns-win ucs-normalize term/common-win tool-bar dnd fontset image
regexp-opt fringe tabulated-list newcomment elisp-mode lisp-mode
prog-mode register page menu-bar rfn-eshadow timer select scroll-bar
mouse jit-lock font-lock syntax facemenu font-core frame cl-generic cham
georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese charscript case-table epa-hook
jka-cmpr-hook help simple abbrev minibuffer cl-preloaded nadvice
loaddefs button faces cus-face macroexp files text-properties overlay
sha1 md5 base64 format env code-pages mule custom widget
hashtable-print-readable backquote kqueue cocoa ns multi-tty
make-network-process emacs)

Memory information:
((conses 16 359730 16119)
 (symbols 48 34262 0)
 (miscs 40 100 221)
 (strings 32 65306 15883)
 (string-bytes 1 1997869)
 (vectors 16 60314)
 (vector-slots 8 1721804 214602)
 (floats 8 589 398)
 (intervals 56 269 0)
 (buffers 976 19))
