emacs-diffs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Emacs-diffs] master dd60105: Allow to search for characters whose bidi


From: Eli Zaretskii
Subject: [Emacs-diffs] master dd60105: Allow to search for characters whose bidi directionality was overridden.
Date: Tue, 02 Dec 2014 14:16:09 +0000

branch: master
commit dd601050e7db69f322eea09d99751d8e6363b153
Author: Eli Zaretskii <address@hidden>
Commit: Eli Zaretskii <address@hidden>

    Allow to search for characters whose bidi directionality was overridden.
    
     src/bidi.c (bidi_find_first_overridden): New function.
     src/xdisp.c (Fbidi_find_overridden_directionality): New function.
     (syms_of_xdisp): Defsubr it.
     src/dispextern.h (bidi_find_first_overridden): Add prototype.
    
     doc/lispref/display.texi (Bidirectional Display): Document
     'bidi-find-overridden-directionality'.
    
     etc/NEWS: Mention 'bidi-find-overridden-directionality'.
---
 doc/lispref/ChangeLog    |    5 ++
 doc/lispref/display.texi |   54 ++++++++++++++++++
 etc/ChangeLog            |    4 +
 etc/NEWS                 |    7 ++
 src/ChangeLog            |    9 +++
 src/bidi.c               |   27 +++++++++
 src/dispextern.h         |    1 +
 src/xdisp.c              |  138 ++++++++++++++++++++++++++++++++++++++++++++++
 8 files changed, 245 insertions(+), 0 deletions(-)

diff --git a/doc/lispref/ChangeLog b/doc/lispref/ChangeLog
index 31a9cbf..f98e457 100644
--- a/doc/lispref/ChangeLog
+++ b/doc/lispref/ChangeLog
@@ -1,3 +1,8 @@
+2014-12-02  Eli Zaretskii  <address@hidden>
+
+       * display.texi (Bidirectional Display): Document
+       'bidi-find-overridden-directionality'.
+
 2014-11-29  Paul Eggert  <address@hidden>
 
        Lessen focus on ChangeLog files, as opposed to change log entries.
diff --git a/doc/lispref/display.texi b/doc/lispref/display.texi
index 4cb06dd..59f7322 100644
--- a/doc/lispref/display.texi
+++ b/doc/lispref/display.texi
@@ -6800,3 +6800,57 @@ affect all Emacs frames and windows.
 appropriate mirrored character in the reordered text.  Lisp programs
 can affect the mirrored display by changing this property.  Again, any
 such changes affect all of Emacs display.
+
address@hidden overriding bidirectional properties
address@hidden directional overrides
address@hidden LRO
address@hidden RLO
+  The bidirectional properties of characters can be overridden by
+inserting into the text special directional control characters,
+LEFT-TO-RIGHT OVERRIDE (@acronym{LRO}) and RIGHT-TO-LEFT OVERRIDE
+(@acronym{RLO}).  Any characters between a @acronym{RLO} and the
+following newline or POP DIRECTIONAL FORMATTING (@acronym{PDF})
+control character, whichever comes first, will be displayed as if they
+were strong right-to-left characters, i.e.@: they will be reversed on
+display.  Similarly, any characters between @acronym{LRO} and
address@hidden or newline will display as if they were strong
+left-to-right, and will @emph{not} be reversed even if they are strong
+right-to-left characters.
+
address@hidden phishing using directional overrides
address@hidden malicious use of directional overrides
+  These overrides are useful when you want to make some text
+unaffected by the reordering algorithm, and instead directly control
+the display order.  But they can also be used for malicious purposes,
+known as @dfn{phishing}.  Specifically, a URL on a Web page or a link
+in an email message can be manipulated to make its visual appearance
+unrecognizable, or similar to some popular benign location, while the
+real location, interpreted by a browser in the logical order, is very
+different.
+
+  Emacs provides a primitive that applications can use to detect
+instances of text whose bidirectional properties were overridden so as
+to make a left-to-right character display as if it were a
+right-to-left character, or vise versa.
+
address@hidden bidi-find-overridden-directionality from to &optional object
+This function looks at the text of the specified @var{object} between
+positions @var{from} (inclusive) and @var{to} (exclusive), and returns
+the first position where it finds a strong left-to-right character
+whose directional properties were forced to display the character as
+right-to-left, or for a strong right-to-left character that was forced
+to display as left-to-right.  If it finds no such characters in the
+specified region of text, it returns @code{nil}.
+
+The optional argument @var{object} specifies which text to search, and
+defaults to the current buffer.  If @var{object} is address@hidden, it
+can be some other buffer, or it can be a string or a window.  If it is
+a string, the function searches that string.  If it is a window, the
+function searches the buffer displayed in that window.  If a buffer
+whose text you want to examine is displayed in some window, we
+recommend to specify it by that window, rather than pass the buffer to
+the function.  This is because telling the function about the window
+allows it to correctly account for window-specific overlays, which
+might change the result of the function if some text in the buffer is
+covered by overlays.
address@hidden defun
diff --git a/etc/ChangeLog b/etc/ChangeLog
index 09dfd7f..4f672df 100644
--- a/etc/ChangeLog
+++ b/etc/ChangeLog
@@ -1,3 +1,7 @@
+2014-12-02  Eli Zaretskii  <address@hidden>
+
+       * NEWS: Mention 'bidi-find-overridden-directionality'.
+
 2014-11-29  Paul Eggert  <address@hidden>
 
        Lessen focus on ChangeLog files, as opposed to change log entries.
diff --git a/etc/NEWS b/etc/NEWS
index 6c636cf..bb016ee 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -98,6 +98,13 @@ environment.  For the time being this is implemented for 
modern POSIX
 systems and for MS-Windows, for other systems they fall back to their
 counterparts `string-lessp' and `string-equal'.
 
++++
+** The new function `bidi-find-overridden-directionality' allows to
+find characters whose directionality was, perhaps maliciously,
+overridden by directional override control characters.  Lisp programs
+can use this to detect potential phishing of URLs and other links that
+exploits bidirectional display reordering.
+
 *** The ls-lisp package uses `string-collate-lessp' to sort file names.
 If you want the old, locale-independent sorting, customize the new
 option `ls-lisp-use-string-collate' to a nil value.
diff --git a/src/ChangeLog b/src/ChangeLog
index 5c33765..7dc2b92 100644
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@ -1,3 +1,12 @@
+2014-12-02  Eli Zaretskii  <address@hidden>
+
+       * bidi.c (bidi_find_first_overridden): New function.
+
+       * xdisp.c (Fbidi_find_overridden_directionality): New function.
+       (syms_of_xdisp): Defsubr it.
+
+       * dispextern.h (bidi_find_first_overridden): Add prototype.
+
 2014-12-02  Jan Djärv  <address@hidden>
 
        * nsimage.m (initFromSkipXBM:width:height:flip:length:): Set bmRep
diff --git a/src/bidi.c b/src/bidi.c
index 225acd9..a0bcf52 100644
--- a/src/bidi.c
+++ b/src/bidi.c
@@ -3376,6 +3376,33 @@ bidi_move_to_visually_next (struct bidi_it *bidi_it)
     UNGCPRO;
 }
 
+/* Utility function for looking for strong directional characters
+   whose bidi type was overridden by a directional override.  */
+ptrdiff_t
+bidi_find_first_overridden (struct bidi_it *bidi_it)
+{
+  ptrdiff_t found_pos = ZV;
+
+  do
+    {
+      /* Need to call bidi_resolve_weak, not bidi_resolve_explicit,
+        because the directional overrides are applied by the
+        former.  */
+      bidi_type_t type = bidi_resolve_weak (bidi_it);
+
+      if ((type == STRONG_R && bidi_it->orig_type == STRONG_L)
+         || (type == STRONG_L
+             && (bidi_it->orig_type == STRONG_R
+                 || bidi_it->orig_type == STRONG_AL)))
+       found_pos = bidi_it->charpos;
+    } while (found_pos == ZV
+            && bidi_it->charpos < ZV
+            && bidi_it->ch != BIDI_EOB
+            && bidi_it->ch != '\n');
+
+  return found_pos;
+}
+
 /* This is meant to be called from within the debugger, whenever you
    wish to examine the cache contents.  */
 void bidi_dump_cached_states (void) EXTERNALLY_VISIBLE;
diff --git a/src/dispextern.h b/src/dispextern.h
index 0dd0887..0ee5fd6 100644
--- a/src/dispextern.h
+++ b/src/dispextern.h
@@ -3173,6 +3173,7 @@ extern void bidi_push_it (struct bidi_it *);
 extern void bidi_pop_it (struct bidi_it *);
 extern void *bidi_shelve_cache (void);
 extern void bidi_unshelve_cache (void *, bool);
+extern ptrdiff_t bidi_find_first_overridden (struct bidi_it *);
 
 /* Defined in xdisp.c */
 
diff --git a/src/xdisp.c b/src/xdisp.c
index 989cbd1..0d31468 100644
--- a/src/xdisp.c
+++ b/src/xdisp.c
@@ -21032,6 +21032,143 @@ See also `bidi-paragraph-direction'.  */)
     }
 }
 
+DEFUN ("bidi-find-overridden-directionality",
+       Fbidi_find_overridden_directionality,
+       Sbidi_find_overridden_directionality, 2, 3, 0,
+       doc: /* Return position between FROM and TO where directionality was 
overridden.
+
+This function returns the first character position in the specified
+region of OBJECT where there is a character whose `bidi-class' property
+is `L', but which was forced to display as `R' by a directional
+override, and likewise with characters whose `bidi-class' is `R'
+or `AL' that were forced to display as `L'.
+
+If no such character is found, the function returns nil.
+
+OBJECT is a Lisp string or buffer to search for overridden
+directionality, and defaults to the current buffer if nil or omitted.
+OBJECT can also be a window, in which case the function will search
+the buffer displayed in that window.  Passing the window instead of
+a buffer is preferable when the buffer is displayed in some window,
+because this function will then be able to correctly account for
+window-specific overlays, which can affect the results.
+
+Strong directional characters `L', `R', and `AL' can have their
+intrinsic directionality overridden by directional override
+control characters RLO \(u+202e) and LRO \(u+202d).  See the
+function `get-char-code-property' for a way to inquire about
+the `bidi-class' property of a character.  */)
+  (Lisp_Object from, Lisp_Object to, Lisp_Object object)
+{
+  struct buffer *buf = current_buffer;
+  struct buffer *old = buf;
+  struct window *w = NULL;
+  bool frame_window_p = FRAME_WINDOW_P (SELECTED_FRAME ());
+  struct bidi_it itb;
+  ptrdiff_t from_pos, to_pos, from_bpos;
+  void *itb_data;
+
+  if (!NILP (object))
+    {
+      if (BUFFERP (object))
+       buf = XBUFFER (object);
+      else if (WINDOWP (object))
+       {
+         w = decode_live_window (object);
+         buf = XBUFFER (w->contents);
+         frame_window_p = FRAME_WINDOW_P (XFRAME (w->frame));
+       }
+      else
+       CHECK_STRING (object);
+    }
+
+  if (STRINGP (object))
+    {
+      /* Characters in unibyte strings are always treated by bidi.c as
+        strong LTR.  */
+      if (!STRING_MULTIBYTE (object)
+         /* When we are loading loadup.el, the character property
+            tables needed for bidi iteration are not yet
+            available.  */
+         || !NILP (Vpurify_flag))
+       return Qnil;
+
+      validate_subarray (object, from, to, SCHARS (object), &from_pos, 
&to_pos);
+      if (from_pos >= SCHARS (object))
+       return Qnil;
+
+      /* Set up the bidi iterator.  */
+      itb_data = bidi_shelve_cache ();
+      itb.paragraph_dir = NEUTRAL_DIR;
+      itb.string.lstring = object;
+      itb.string.s = NULL;
+      itb.string.schars = SCHARS (object);
+      itb.string.bufpos = 0;
+      itb.string.from_disp_str = 0;
+      itb.string.unibyte = 0;
+      itb.w = w;
+      bidi_init_it (0, 0, frame_window_p, &itb);
+    }
+  else
+    {
+      /* Nothing this fancy can happen in unibyte buffers, or in a
+        buffer that disabled reordering, or if FROM is at EOB.  */
+      if (NILP (BVAR (buf, bidi_display_reordering))
+         || NILP (BVAR (buf, enable_multibyte_characters))
+         /* When we are loading loadup.el, the character property
+            tables needed for bidi iteration are not yet
+            available.  */
+         || !NILP (Vpurify_flag))
+       return Qnil;
+
+      set_buffer_temp (buf);
+      validate_region (&from, &to);
+      from_pos = XINT (from);
+      to_pos = XINT (to);
+      if (from_pos >= ZV)
+       return Qnil;
+
+      /* Set up the bidi iterator.  */
+      itb_data = bidi_shelve_cache ();
+      from_bpos = CHAR_TO_BYTE (from_pos);
+      if (from_pos == BEGV)
+       {
+         itb.charpos = BEGV;
+         itb.bytepos = BEGV_BYTE;
+       }
+      else if (FETCH_CHAR (from_bpos - 1) == '\n')
+       {
+         itb.charpos = from_pos;
+         itb.bytepos = from_bpos;
+       }
+      else
+       itb.charpos = find_newline_no_quit (from_pos, CHAR_TO_BYTE (from_pos),
+                                           -1, &itb.bytepos);
+      itb.paragraph_dir = NEUTRAL_DIR;
+      itb.string.s = NULL;
+      itb.string.lstring = Qnil;
+      itb.string.bufpos = 0;
+      itb.string.from_disp_str = 0;
+      itb.string.unibyte = 0;
+      itb.w = w;
+      bidi_init_it (itb.charpos, itb.bytepos, frame_window_p, &itb);
+    }
+
+  ptrdiff_t found;
+  do {
+    /* For the purposes of this function, the actual base direction of
+       the paragraph doesn't matter, so just set it to L2R.  */
+    bidi_paragraph_init (L2R, &itb, 0);
+    while ((found = bidi_find_first_overridden (&itb)) < from_pos)
+      ;
+  } while (found == ZV && itb.ch == '\n' && itb.charpos < to_pos);
+
+  bidi_unshelve_cache (itb_data, 0);
+  set_buffer_temp (old);
+
+  return (from_pos <= found && found < to_pos) ? make_number (found) : Qnil;
+}
+
 DEFUN ("move-point-visually", Fmove_point_visually,
        Smove_point_visually, 1, 1, 0,
        doc: /* Move point in the visual order in the specified DIRECTION.
@@ -30461,6 +30598,7 @@ syms_of_xdisp (void)
   defsubr (&Scurrent_bidi_paragraph_direction);
   defsubr (&Swindow_text_pixel_size);
   defsubr (&Smove_point_visually);
+  defsubr (&Sbidi_find_overridden_directionality);
 
   DEFSYM (Qmenu_bar_update_hook, "menu-bar-update-hook");
   DEFSYM (Qoverriding_terminal_local_map, "overriding-terminal-local-map");



reply via email to

[Prev in Thread] Current Thread [Next in Thread]