From: Stefan Monnier
Subject: Re: Unicode confusables and reordering characters considered harmful, a simple solution
Date: Thu, 04 Nov 2021 15:05:13 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)

> However, your suggestion of highlighting the text affected by the bidi
> override characters while not actually showing those characters visibly
> is not something that I would care to use. It shows that there may be a
> problem without showing what the cause is. The cause is the presence of
> certain characters, and I must be able to see those characters in order
> to fix the problem, or even to judge whether there is a problem at
> all.

I don't think that's the case.

AFAIK there are 3 steps:
1- Become aware of the presence of something suspicious, i.e. a chunk of
   text that may not mean what you think.
2- Be able to confirm whether this is what it looks like or not.
3- Find the root cause.
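Step 1 only has to raise suspicion, not explain it. The sketch below is one crude heuristic for that step. It is an assumption for illustration and is not Eli's highlighting proposal. It flags every line that mixes strong left-to-right and strong right-to-left characters, using the bidi classes reported by Python's `unicodedata`. The function name `mixed_direction_lines` is hypothetical. Mixing directions is not wrong in itself. It only marks lines whose displayed order may differ from their logical order, so steps 2 and 3 still need a human.

```python
import unicodedata


def mixed_direction_lines(text):
    """Return (line_number, line) pairs for lines containing both strong
    left-to-right (class L) and strong right-to-left (class R or AL) chars.

    Hypothetical heuristic for "step 1": it flags candidates, it does not
    decide whether the displayed order is actually misleading.
    """
    flagged = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        classes = {unicodedata.bidirectional(ch) for ch in line}
        has_ltr = "L" in classes
        has_rtl = "R" in classes or "AL" in classes
        if has_ltr and has_rtl:
            flagged.append((lineno, line))
    return flagged


# Usage: the second line mixes a Latin identifier with Hebrew arguments.
sample = "total = 1\nmyfun (\u05d0\u05d1, \u05d2)\n"
for lineno, line in mixed_direction_lines(sample):
    print(lineno, ascii(line))
```

A heuristic like this also catches reordering that happens without any bidi control characters, which a search for those characters alone would miss.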

Making the special control chars more visible can help at step 3 (tho
not in all cases since the problem can occur without using any of those
chars, as shown in my example code), but it's definitely not necessary
for step 1 (where highlighting the text as Eli suggests might be more
useful) nor for step 2 (where moving the cursor across the text is all
it takes to figure out what it really means).
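For step 3, making the control characters visible mostly comes down to locating them. The following is a minimal sketch, assuming the relevant set is the explicit embedding, override, and isolate characters plus the three implicit directional marks defined in the Unicode bidi algorithm (UAX #9). The function name `find_bidi_controls` is hypothetical. The marks LRM, RLM, and ALM have the strong classes L, R, and AL, so they cannot be found via `unicodedata.bidirectional()` alone. That is why the sketch uses an explicit set.

```python
import unicodedata

# Bidi formatting characters from the Unicode bidi algorithm (UAX #9).
# LRM/RLM/ALM have strong bidi classes (L, R, AL), so an explicit set is
# used instead of testing unicodedata.bidirectional().
BIDI_CONTROLS = frozenset(
    "\u200e\u200f\u061c"              # LRM, RLM, ALM (implicit marks)
    "\u202a\u202b\u202c\u202d\u202e"  # LRE, RLE, PDF, LRO, RLO
    "\u2066\u2067\u2068\u2069"        # LRI, RLI, FSI, PDI (isolates)
)


def find_bidi_controls(text):
    """Return (line, column, code point, name) for each bidi control char.

    Line numbers start at 1, columns at 0 (counted in code points).
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line):
            if ch in BIDI_CONTROLS:
                hits.append((lineno, col, f"U+{ord(ch):04X}", unicodedata.name(ch)))
    return hits


# Usage: a string literal containing a RIGHT-TO-LEFT OVERRIDE.
for hit in find_bidi_controls('x = "\u202eabc"'):
    print(hit)
```

An empty result does not prove that the text displays in its logical order, because reordering can also come from ordinary RTL letters.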

Really, this is just another case of the "confusables": situations where
different sequences of bytes can result in the exact same display (or
maybe not 100% identical, but sufficiently similar that the untrained
eye won't notice the difference) yet be treated differently by
our tools.
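A classic confusable shows the "different bytes, same display" point directly. In the sketch below, the identifier `scope` and the helper `describe` are hypothetical. The Cyrillic letter U+043E usually renders like a Latin "o", but the strings compare unequal and encode to different bytes. NFKC normalization does not fold one into the other. Detecting such pairs needs separate data, such as the confusables mappings described in Unicode TS #39.

```python
import unicodedata


def describe(s):
    """List the code point and Unicode name of every character in s."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in s]


latin = "scope"
mixed = "sc\u043epe"  # U+043E CYRILLIC SMALL LETTER O replaces the Latin "o"

# Usually rendered alike, yet different code points and different bytes.
print(latin == mixed)                                  # False
print(len(latin.encode("utf-8")))                      # 5
print(len(mixed.encode("utf-8")))                      # 6 (Cyrillic o is 2 bytes)
print(describe(mixed)[2])                              # U+043E CYRILLIC SMALL LETTER O
print(unicodedata.normalize("NFKC", mixed) == latin)   # False: NFKC keeps them apart
```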

The main problem I see is that the definition of "normal" and "abnormal"
depends on the programming language, and potentially even on the human
reading the text.

For example, imagine that the uppercase text below is written in
a script and language that are RTL:

My previous example had

    myfun (ARG1, ARG2)

where the rendering displayed ARG2 to the left of ARG1, making it
(presumably) confusing to the reader.  But if the code says:


Which would be more confusing: having the first element displayed on the
left, or having it displayed on the right?
I think the answer strongly depends on the past experience of the
reader, so there's a human factor at play.

