emacs-devel


Re: Change of Lisp syntax for "fancy" quotes in Emacs 27?


From: Garreau, Alexandre
Subject: Re: Change of Lisp syntax for "fancy" quotes in Emacs 27?
Date: Sat, 06 Oct 2018 14:10:17 +0200
User-agent: Gnus (5.13), GNU Emacs 25.1.1 (i686-pc-linux-gnu)

On 06/10/2018 at 14:50, Eli Zaretskii wrote:
>> From: "Garreau\, Alexandre" <address@hidden>
>> Cc: Eli Zaretskii <address@hidden>, address@hidden,
>> address@hidden, address@hidden
>> Date: Sat, 06 Oct 2018 13:22:14 +0200
>> 
>> In a world where Unicode is increasingly present and confusion between its
>> characters increasingly problematic (typosquatting, etc.), wouldn’t it be
>> reasonable to expect Unicode-related semantic functions to be provided
>> in most frameworks, systems and languages to allow better handling of
>> such problems, thus making them the interface’s problem?
>
> I don't think I understand what this means in practice; please
> elaborate.

AFAIK there are also problems with indistinguishable Unicode characters
in content other than source code, such as the Latin ?o and the Cyrillic
?о (the first example of Unicode-powered typosquatting I ever heard of),
the different spaces (sometimes indistinguishable in a monospace font),
or, to stay with monospace problems: I have great trouble writing
correct French text, as I always have to check in something other than
Emacs which of ?– and ?— is the medium (en) dash and which is the long
(em) dash (I normally remember them by their position on my keyboard,
but since they are next to each other I often forget), not to mention
the various bidirectionality tricks you pointed out earlier.  I have
also heard of emails that fool semantic, Bayesian anti-spam filters by
including non-spammy words that, thanks to some Unicode tricks, are not
displayed to the user.
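One quick way to tell such look-alikes apart is to ask for each character’s Unicode name and code point. A minimal sketch in Python (my choice of language; `describe_chars` is a hypothetical helper, not an existing API), using only the standard `unicodedata` module. Inside Emacs itself, `C-u C-x =` runs `describe-char`, which shows the same kind of information for the character at point.

```python
import unicodedata

def describe_chars(text):
    """Return (character, code point, Unicode name) for each character in text."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
            for ch in text]

# En dash, em dash, and three spaces that can look identical in a monospace font.
sample = "\u2013\u2014 \u00a0\u2009"
for ch, code_point, name in describe_chars(sample):
    print(code_point, name)
# U+2013 EN DASH
# U+2014 EM DASH
# U+0020 SPACE
# U+00A0 NO-BREAK SPACE
# U+2009 THIN SPACE
```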

These problems are not specific to source code, nor to Emacs (many
people use something other than Emacs to read mail, websites and news,
and to read domain names), and AFAIK there are canonicalization
functions and Unicode category functions that help determine what is
punctuation, what is a combining character, what is displayed and how
much space it takes, and maybe, though I’m unsure, which characters are
hard or even impossible to tell apart.  There may also be
canonicalization functions that make two strings encoded differently
because of combining characters compare equal (such as "é" and "é", the
second made of ?e followed by ?́; it’s fun to see how strangely Emacs
displays this last one, yet how correctly it evaluates it), or that even
make two strings with different but look-alike characters compare equal.
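The canonicalization and category functions mentioned above do exist in common standard libraries. A minimal sketch with Python’s `unicodedata` module (an illustration only, not what Emacs uses internally; `categorize` is a hypothetical helper): NFC normalization makes the precomposed and the combining-character "é" compare equal, and the general category identifies U+0301 as a nonspacing mark ("Mn"). The look-alike case is not covered by `unicodedata`; Unicode Technical Standard #39 publishes confusables data for that.

```python
import unicodedata

precomposed = "\u00e9"  # é as one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # ?e followed by COMBINING ACUTE ACCENT

def categorize(text):
    """Return the Unicode general category of each character (e.g. 'Mn' = nonspacing mark)."""
    return [unicodedata.category(ch) for ch in text]

# Raw comparison looks at code points, so the two spellings differ.
print(precomposed == decomposed)  # False
# After NFC normalization both become the single precomposed code point.
print(unicodedata.normalize("NFC", precomposed)
      == unicodedata.normalize("NFC", decomposed))  # True
# The category tells which character is a combining mark.
print(categorize(decomposed))  # ['Ll', 'Mn']
```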

I guess this issue is even less of a problem in free software, where in
theory the authors should be well-intentioned and shouldn’t try to
mislead readers about what the software does (and/or the code should at
least be reviewed with capable tools and/or knowledge), than in cases
where it can be abused for profit, such as typosquatting: "google.com"
and "gооgle.com" are not the same (it’s also interesting to notice how
Emacs’s forward-word/backward-word detect the script change and use it
to stop at the "оо"; I’m astonished by these capabilities, and I have to
thank you guys for such a great piece of software!).  Google could
afford (and took care) to buy both, but not everyone can do the same
(and nobody has reserved "amazоn.com" yet), and people might crack,
steal or blackmail using something like that.
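A rough sketch of how a mail or web client might flag domains like the ones above is to check whether any label mixes letters from several scripts. This is written in Python under a loud assumption: `unicodedata` does not expose the Unicode Script property, so the hypothetical `scripts_used` helper guesses the script from the first word of each letter’s Unicode name (e.g. "LATIN", "CYRILLIC"). Real implementations should use the Script property and the mixed-script rules of UTS #39.

```python
import unicodedata

def scripts_used(label):
    """Heuristic guess at the scripts in a string: the first word of each
    letter's Unicode name (e.g. 'LATIN', 'CYRILLIC'). Stand-in for the real
    Unicode Script property, which Python's unicodedata does not expose."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0]
            for ch in label if ch.isalpha()}

def looks_mixed(domain):
    """Flag a domain if any dot-separated label mixes letters from several scripts."""
    return any(len(scripts_used(label)) > 1 for label in domain.split("."))

# Plain ASCII, then two domains using CYRILLIC SMALL LETTER O (U+043E).
for d in ["google.com", "g\u043e\u043egle.com", "amaz\u043en.com"]:
    print(d, sorted(scripts_used(d)), looks_mixed(d))
```

This only catches mixed-script labels: a label written entirely in look-alike Cyrillic letters would pass, which is the "whole-script confusable" case that UTS #39 also describes.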


