

Re: Unicode confusables and reordering characters considered harmful, a simple solution

From: Eli Zaretskii
Subject: Re: Unicode confusables and reordering characters considered harmful, a simple solution
Date: Fri, 05 Nov 2021 10:31:39 +0200

> From: Daniel Brooks <db48x@db48x.net>
> Cc: cpitclaudel@gmail.com,  yuri.v.khan@gmail.com,  stefan@marxist.se,
>   monnier@iro.umontreal.ca,  emacs-devel@gnu.org
> Date: Thu, 04 Nov 2021 19:23:08 -0700
> Eli Zaretskii <eliz@gnu.org> writes:
> > Then this visual noise will get in the way of people's reading those
> > comments and strings, and, for strings, will make it very hard to
> > understand what will be presented to the user when those strings are
> > output in some UI.
> >
> >> That’s where the problem is.
> >
> > No, the problem is elsewhere entirely: it's in the punctuation
> > characters unrelated to strings and comments whose directionality is
> > overridden, and which thus display in places that cause incorrect
> > visual interpretation of the program during a casual read.
> Look at the examples again. In many of them, all of the bidi override
> characters are inside a string or comment.

Not relevant to the point I was trying to make.  (And what about those
cases where the directional controls are outside the comments or
strings?)

> When that is the case, these characters are only a problem if they
> cause characters that are inside the string or comment to appear to
> be outside of it, by reordering those characters relative to the
> syntactic markers for the string or comment. In other examples these
> characters are _outside_ the string or comment.
> Unless Emacs has specific knowledge of the language syntax, showing the
> characters is the only sure way to know if there is a problem or not.

The command I installed achieves this without requiring any knowledge
of the language syntax.  So no, yours is not the only way.

> > You misunderstand the cause.  The mere presence of these characters is
> > NOT the root cause.  These characters are legitimate and helpful when
> > used as intended.  See TUTORIAL.he for a pertinent example.
> Please don’t presume to tell me what I do or don’t understand. Yes,
> there are use cases which are not harmful, but as I have said it must be
> up to either the programmer or the compiler to answer that
> question. Emacs doesn’t know the syntax of every programming language.

Emacs should do a good job of not crying wolf too much, or else the
programmer will turn off these safety nets.  The feature you propose
as THE solution for the issue flags each and every use of these
characters, the vast majority of which are completely legitimate.
That is bad for safety/security related warnings: if they have too low
signal-to-noise ratio, people will disable them and lose all the
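
To make the signal-to-noise concern concrete, here is a minimal Python sketch of the syntax-agnostic "flag every directional control" approach. It is not Emacs code and not how highlight-confusing-reorderings works. The function name flag_all_bidi_controls is hypothetical, and the character set is assumed to be the explicit embedding, override and isolate controls from Unicode UAX #9. It deliberately ignores the implicit marks LRM, RLM and ALM.

```python
import unicodedata

# Explicit directional formatting characters (Unicode UAX #9):
# embeddings, overrides, their terminator, and isolates.
BIDI_CONTROLS = {
    "\u202a",  # LEFT-TO-RIGHT EMBEDDING (LRE)
    "\u202b",  # RIGHT-TO-LEFT EMBEDDING (RLE)
    "\u202c",  # POP DIRECTIONAL FORMATTING (PDF)
    "\u202d",  # LEFT-TO-RIGHT OVERRIDE (LRO)
    "\u202e",  # RIGHT-TO-LEFT OVERRIDE (RLO)
    "\u2066",  # LEFT-TO-RIGHT ISOLATE (LRI)
    "\u2067",  # RIGHT-TO-LEFT ISOLATE (RLI)
    "\u2068",  # FIRST STRONG ISOLATE (FSI)
    "\u2069",  # POP DIRECTIONAL ISOLATE (PDI)
}


def flag_all_bidi_controls(text):
    """Return (line, column, name) for every bidi control in TEXT.

    Hypothetical helper illustrating the "flag everything" strategy:
    it never asks whether a given use is legitimate, so every
    occurrence is reported.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line):
            if ch in BIDI_CONTROLS:
                hits.append((lineno, col, unicodedata.name(ch)))
    return hits


if __name__ == "__main__":
    # A benign string: a Hebrew word wrapped in RLI ... PDI so that it
    # displays correctly inside left-to-right text.
    legit = 'greeting = "\u2067\u05e9\u05dc\u05d5\u05dd\u2069"\n'
    for hit in flag_all_bidi_controls(legit):
        print(hit)
```

Even on this legitimate use the sketch reports two hits, one for each isolate character, which is exactly the kind of false alarm that trains people to switch such warnings off.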

> >> Furthermore, I have not suggested that showing the characters needs to
> >> preclude any other form of highlighting. If you wish to develop some
> >> additional way of warning the developer, please do so.
> >
> > We are talking about what should be in Emacs.  What you suggest
> > shouldn't.
> No other suggested feature will be useful to me. This one will. I
> suggest to you that you do not know what all users want.

I submit that users who'd want your feature indeed don't know what
they want.  They are perhaps alarmed by the brouhaha around this
issue, whose details they don't understand, but that is all.

> > Since the Rust compiler evidently does this when it finds these
> > characters inside comments (and probably also inside strings), IMNSHO
> > this is a terrible misfeature, because it means code that uses those
> > controls in legitimate ways cannot be compiled without tweaking
> > non-default options.  That's a cop-out, not the way to flag the
> > problematic cases.
> Your conclusion here is incorrect. Rust has chosen a fast strategy,
> where they implement a broad error today (well, four days ago) knowing
> that it does not prevent them from introducing a more refined error or
> set of errors later.

Then let's withdraw our approval of what they did until they do
introduce that more refined set of errors, shall we?  For now, their
cure is worse than the disease, because it will fail completely
legitimate programs out of fear of the illegitimate ones, which might
never come.

> Rust also has a very flexible annotation system that allows the
> programmer to annotate specific statements and language items. If a use
> of these characters is determined to be legitimate, the programmer can
> annotate the comment, or the function the comment is in, so that this
> error is disabled.

IME, programmers don't like to do stuff that doesn't directly help
them, and will do anything to evade that.  Especially in the Free
Software world, where usually there's no boss telling them what to do.

> > I think this is terrible.  At best, it only tells you that something
> > non-trivial goes on here (but what exactly?).  At worst, it looks like
> > corruption of the source.  And while in the malicious case treating
> > that as corruption is not such a bad idea, all the valid uses of these
> > characters will also look like corruption.  Which means the cure is
> > probably worse than the disease, because the malicious cases are a
> > tiny fraction of the valid ones.
> I cannot believe that you really think this. It shows up with exactly the
> same highlighting that your recently introduced
> highlight-confusing-reorderings function uses.

In those few examples, carefully chosen to include only the malicious
reordering, yes.  But try it on legitimate uses of those control
characters, and you will see that highlight-confusing-reorderings
doesn't highlight anything (barring bugs), unlike your proposal that
does.  And that's the main point I'm trying to make: features such as
this one cannot afford crying wolf too much.

> Yours doesn’t even work with `next-error`.

It wasn't supposed to.  It was supposed to be similar to
flyspell-mode, which also "doesn't work" with next-error.  Of course,
if we decide that next-error should be able to find such places, we
can always add that (Emacs 29 is still very far from a release, and we
have ample time for that), but I doubt it would be a good idea,
because next-error is about messages emitted by compilers, and this is
not a compiler-based feature.

That said, if the new command doesn't help you, you are free not to
use it, of course.  Hopefully, people who are really interested in
finding the maliciously reordered code will.

