Re: Unicode confusables and reordering characters considered harmful, a

From: Gregory Heytings
Subject: Re: Unicode confusables and reordering characters considered harmful, a simple solution
Date: Mon, 08 Nov 2021 19:58:56 +0000

>> In fact, it did not take me much time to create a case that your algorithm doesn't detect (and AFAIU cannot detect without also displaying warnings about many legitimate uses). I attach the example code, how that code is displayed by Emacs, and how that code would be displayed with the patch I proposed.
>
> Thanks, I've now enhanced the code which detects suspiciously reordered source to cover this kind of case as well. I didn't see any legitimate uses flagged after the change, but if you can find any such cases, please show them and I will take a look.

Clearly, you failed to understand the meaning of my post. It did *not* mean:

Your algorithm could be improved.

It meant:

Your algorithm cannot be trusted.

It took a non-malevolent actor less than 24 hours (after your commit) to find a way to escape the detection algorithm you implemented, which you claimed was the proper solution to the problem pointed out by the "Trojan Source" paper. Your slightly improved algorithm will evidently not hold up any longer if an actually malevolent actor tries to find a way to escape it (and of course they won't tell you when and how they did it).

So I'll say it one more time:

The only proper solution to that problem is to highlight, by default, these control characters in prog-mode and its descendants. That's the only 100% foolproof solution that guarantees that such constructs will never be missed, and this is what about 99.99% of Emacs users need. The remaining 0.01% are those who:

1. Use RTL languages in their source code, AND

2. Use these reordering control characters in their source code, AND

3. Would find such highlighted characters annoying.

Those few users can turn that highlighting option off, either globally or by turning the minor mode off in this or that buffer.
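
As a rough illustration of what unconditional highlighting amounts to, here is a minimal Python sketch. It is not Emacs code and not the patch discussed in this thread; the names `BIDI_CONTROLS` and `reveal_bidi_controls` are made up for this example, and the character list is assumed to be the nine explicit embedding, override and isolate controls discussed in the "Trojan Source" paper. Because every occurrence is replaced by a visible tag, nothing depends on a heuristic deciding whether the reordering looks suspicious.

```python
# Hypothetical sketch: make every bidi reordering control visible.
# The table below is an assumption about which characters to show:
# the explicit embedding, override and isolate controls.
BIDI_CONTROLS = {
    "\u202a": "LRE",  # LEFT-TO-RIGHT EMBEDDING
    "\u202b": "RLE",  # RIGHT-TO-LEFT EMBEDDING
    "\u202c": "PDF",  # POP DIRECTIONAL FORMATTING
    "\u202d": "LRO",  # LEFT-TO-RIGHT OVERRIDE
    "\u202e": "RLO",  # RIGHT-TO-LEFT OVERRIDE
    "\u2066": "LRI",  # LEFT-TO-RIGHT ISOLATE
    "\u2067": "RLI",  # RIGHT-TO-LEFT ISOLATE
    "\u2068": "FSI",  # FIRST STRONG ISOLATE
    "\u2069": "PDI",  # POP DIRECTIONAL ISOLATE
}


def reveal_bidi_controls(line: str) -> str:
    """Return LINE with each bidi control replaced by a visible tag like <RLO>."""
    return "".join(
        f"<{BIDI_CONTROLS[ch]}>" if ch in BIDI_CONTROLS else ch
        for ch in line
    )
```

For example, `reveal_bidi_controls("a\u202eb")` returns `a<RLO>b`, and a line without any of these characters comes back unchanged.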

>>> The right balance is where the percentage of false positives is very low.
>>
>> IMO, that's not the right balance: the right balance is where the percentage of false negatives is zero.
>
> If you need zero false negatives, and don't care about the level of noise (i.e. false positives), you have the features for that already: customize glyphless-char-display-control to show the control characters as acronyms or hex codes.

Again, you clearly fail to understand what I said. The problem has nothing to do with me; the problem is, as the "Trojan Source" paper rightly explains, what the default settings of the various available editors are. Claiming that asking every Emacs user (except the few users mentioned above) to set an obscure configuration option (which is only mentioned once, in passing, in the manual) is a solution to that problem is just wrong.

Anyway, it's now clear that this problem will remain unfixed in Emacs. Given this, I can only applaud the Rust developers for deciding to ban these control characters from Rust source files. If editors cannot be trusted to do a proper job on this matter, compilers should do it, and I hope that a similar solution will soon be adopted in other compilers.
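
To make the compiler-side approach concrete, here is a minimal Python sketch of such a check. It is only an illustration under stated assumptions, not how the Rust compiler implements its check: the names `BANNED` and `find_banned` are hypothetical, and the banned set is assumed to be the nine explicit bidi embedding, override and isolate controls.

```python
from typing import List, Tuple

# Assumption: the characters to reject are U+202A..U+202E and U+2066..U+2069.
BANNED = frozenset("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")


def find_banned(source: str) -> List[Tuple[int, int, str]]:
    """Return (line, column, code point) for every banned character in SOURCE.

    Lines and columns are 1-based; columns count characters, not bytes.
    """
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BANNED:
                hits.append((lineno, col, f"U+{ord(ch):04X}"))
    return hits
```

A build step or commit hook could refuse any file for which `find_banned` returns a non-empty list, which rejects the input no matter how the characters are arranged.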

And I leave this discussion with this post.
