Re: Unicode confusables and reordering characters considered harmful, a

From: Eli Zaretskii
Subject: Re: Unicode confusables and reordering characters considered harmful, a simple solution
Date: Sat, 06 Nov 2021 12:48:29 +0200

> Date: Fri, 05 Nov 2021 23:33:39 +0000
> From: Gregory Heytings <gregory@heytings.org>
> cc: Stefan Kangas <stefan@marxist.se>, db48x@db48x.net,
>     cpitclaudel@gmail.com, yuri.v.khan@gmail.com,
>     monnier@iro.umontreal.ca, emacs-devel@gnu.org
>
> > The right balance is where the percent of false positives is very low.
> IMO, that's not the right balance: the right balance is where the 
> percentage of false negatives is zero.

If you need zero false negatives, and don't care about the level of
noise (i.e. false positives), you have the features for that already:
customize glyphless-char-display-control to show the control
characters as acronyms or hex codes.  And if you want them to stand
out even more, you can in addition use highlight-regexp to show them
in some prominent background color.
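
For instance, something along these lines in your init file should do
it.  This is only a sketch: the helper name
my/highlight-bidi-controls is made up for illustration, and the
choice of the format-control group, the acronym method, and the
hi-pink face are assumptions you may want to change.

```elisp
;; Show format-control characters (this group includes the bidi
;; formatting controls) as acronyms.  Going through
;; `customize-set-variable' runs the option's setter, so the change
;; takes effect immediately.
(let ((control (copy-alist glyphless-char-display-control)))
  (setf (alist-get 'format-control control) 'acronym)
  (customize-set-variable 'glyphless-char-display-control control))

;; Hypothetical helper: additionally highlight the bidi controls
;; (ALM, LRM, RLM, LRE..RLO, LRI..PDI) in the current buffer.
(defun my/highlight-bidi-controls ()
  "Highlight bidi formatting control characters in this buffer."
  (interactive)
  (highlight-regexp "[\u061c\u200e\u200f\u202a-\u202e\u2066-\u2069]"
                    'hi-pink))
```

Use hex-code instead of acronym if you prefer to see the codepoints,
and invoke the helper with M-x, or from a mode hook, in the buffers
you want checked.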

However, that approach is practical only if you don't routinely
display buffers with truly bidirectional text.  The command I added
yesterday is for those who do, and for whom the level of noise from
false positives would be too much.

> When security is at stake, I very much prefer too many false
> positives to missing one danger.  In particular because such
> warnings give you the feeling that there is no danger when there is
> no warning.

That's fine.  Then you can use those other facilities.

> > I encourage you to read the comments in the implementation I wrote, to 
> > see which cases I consider "suspicious".
> This "I consider" is the problem of your approach.  Malevolent actors are 
> always more inventive, and will find a way to escape the safety net you 
> created.  The cases you consider suspicious are cases where the 
> directionality of one or more characters is overridden by reordering 
> control characters, but this is not what the "Trojan Source" paper is 
> about.  The problem it points to is much broader, it's about using these 
> invisible control characters to make the source code appear different to a 
> human reader and to a compiler.

The only way to make the source code appear different to a human
reader is to reorder some of the characters, by tweaking their
directionality using those formatting controls.  That is why those
control characters are used in these examples.  So there's no
difference between what I consider suspicious and what that paper
says; we just say it in different words.

> In fact, it did not take me much time to create a case that your algorithm 
> doesn't detect (and AFAIU cannot detect without also displaying warnings 
> about many legitimate uses).  I attach the example code, how that code is 
> displayed by Emacs, and how that code would be displayed with the patch I 
> proposed.

Thanks, I've now enhanced the code that detects suspiciously
reordered source to cover this kind of case as well.  I didn't see
any legitimate uses flagged after the change, but if you can find any
such cases, please show them and I will take a look.
