Re: Unicode confusables and reordering characters considered harmful, a

From: Daniel Brooks
Subject: Re: Unicode confusables and reordering characters considered harmful, a simple solution
Date: Wed, 03 Nov 2021 23:00:28 -0700
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Daniel Brooks <db48x@db48x.net>
>> Cc: Yuri Khan <yuri.v.khan@gmail.com>,  cpitclaudel@gmail.com,
>>   stefan@marxist.se,  monnier@iro.umontreal.ca,  emacs-devel@gnu.org
>> Date: Wed, 03 Nov 2021 12:54:31 -0700
>> > Do you read Hebrew?  Those characters look like line noise there,
>> > whereas the text with the default display is perfectly readable, and
>> > most people won't even know these controls are there (as intended).
>> My suggestion is to only enable it by default in _programming modes_. It
>> should remain disabled in ordinary prose like a TUTORIAL file.
> What about comments and strings?  Are we going to pretend that RTL
> scripts aren't used in those?

Of course it will show them in the comments and strings. That’s where
the problem is.

> You are welcome to make such customizations in your Emacs.  My point
> is that for a useful feature that doesn't get in the way when those
> controls are used for legitimate purposes, and only highlights _text_
> (NOT the controls!) whose appearance may have been altered by them for
> questionable or suspicious reasons -- for such a useful feature what
> you propose is not enough for having it in Emacs for everyone.  It is
> a blunt weapon that I would be ashamed to install.

Ok, it is helpful to know your thoughts on the matter.

However, your suggestion of highlighting the text affected by the bidi
override characters while not actually showing those characters visibly
is not something that I would care to use. It shows that there may be a
problem without showing what the cause is. The cause is the presence of
certain characters, and I must be able to see those characters in order
to fix the problem, or even to judge whether there is a problem at
all. Anything short of that is useless to me, and I suspect to many
others as well. Do you hide the tags when you write HTML? Do you hide
the parentheses when you write Lisp? Or the semicolons when you write C?
This is no different.

Furthermore, I have not suggested that showing the characters needs to
preclude any other form of highlighting. If you wish to develop some
additional way of warning the developer, please do so.

However, I suspect that the compilers for most languages currently in
active development will develop their own warnings and error messages as
well. We have plenty of ways for those messages to show up inside Emacs
as highlights.

Rust, for example, has already done so. Here’s an example:

    error: unicode codepoint changing visible direction of text present in 
      --> src/pathmap/path.rs:10:5
    10 |     /* } if is_admin  begin admins only */
       |     ^^-^^-^^^^^^^^^^--^^^^^^^^^^^^^^^^^^^^
       |     | |  |          ||
       |     | |  |          |'\u{2066}'
       |     | |  |          '\u{2069}'
       |     | |  '\u{2066}'
       |     | '\u{202e}'
       |     this comment contains invisible unicode text flow control 
       = note: `#[deny(text_direction_codepoint_in_comment)]` on by default
       = note: these kind of unicode codepoints change the way text flows on
         applications that support them, but can cause confusion because they
         change the order of characters on the screen
       = help: if their presence wasn't intentional, you can remove them

Naturally that already shows up inside of Emacs just fine; see the
attached image.
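
The check behind such a warning is small enough to sketch. What follows is
only an illustration in Rust, not rustc's actual lint: the function name
`find_bidi_controls`, the list of code points (the explicit embedding,
override, and isolate controls plus their terminators), and the sample source
string are all assumptions made for this example. It reports where each
control character sits so that the reader can see the cause, not merely that
something is wrong.

```rust
/// Code points that can change the visual order of text: the explicit
/// embeddings, overrides, and isolates, plus their terminators.
/// (Assumed list for this sketch; a real tool may check more or fewer.)
const BIDI_CONTROLS: [char; 9] = [
    '\u{202A}', // LEFT-TO-RIGHT EMBEDDING
    '\u{202B}', // RIGHT-TO-LEFT EMBEDDING
    '\u{202C}', // POP DIRECTIONAL FORMATTING
    '\u{202D}', // LEFT-TO-RIGHT OVERRIDE
    '\u{202E}', // RIGHT-TO-LEFT OVERRIDE
    '\u{2066}', // LEFT-TO-RIGHT ISOLATE
    '\u{2067}', // RIGHT-TO-LEFT ISOLATE
    '\u{2068}', // FIRST STRONG ISOLATE
    '\u{2069}', // POP DIRECTIONAL ISOLATE
];

/// Returns (line, column, character) for every bidi control in `source`.
/// Lines and columns are 1-based; columns count characters, not bytes.
fn find_bidi_controls(source: &str) -> Vec<(usize, usize, char)> {
    let mut hits = Vec::new();
    for (line_idx, line) in source.lines().enumerate() {
        for (col_idx, ch) in line.chars().enumerate() {
            if BIDI_CONTROLS.contains(&ch) {
                hits.push((line_idx + 1, col_idx + 1, ch));
            }
        }
    }
    hits
}

fn main() {
    // Hypothetical input: the controls are written as escapes here so that
    // this file itself contains no invisible characters.
    let source = "fn main() {\n    /* \u{202E} } \u{2066}if is_admin\u{2069} \u{2066} begin admins only */\n}\n";
    for (line, col, ch) in find_bidi_controls(source) {
        // Print the code point explicitly, since the character is invisible.
        println!("{}:{}: U+{:04X}", line, col, ch as u32);
    }
}
```

A tool built on this would print one `line:column: U+XXXX` entry per control
character, which is the same information the compiler message above conveys
with its `'\u{202e}'` labels.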


Attachment: Screenshot from 2021-11-03 22-51-18.png
Description: screenshot of a highlighted error inside Emacs
