help-gnu-emacs
Re: if vs. when vs. and: style question


From: Pascal J. Bourguignon
Subject: Re: if vs. when vs. and: style question
Date: Wed, 01 Apr 2015 16:29:32 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)

Rusi <address@hidden> writes:

> On Wednesday, April 1, 2015 at 7:57:07 AM UTC+5:30, Emanuel Berg wrote:
>> Richard Wordingham writes:
>> 
>> > One of the issues with using the full set of Unicode
>> > characters is that many are easily misread when
>> > there are no constraints. Many Greek capitals look
>> > just like Roman capitals, and Latin 'o', Greek 'ο'
>> > and Cyrillic 'о' may be indistinguishable. This is
>> > not a good idea for writing code.
>> 
>> Good point. In addition, there are many Unicode chars
>> that aren't human language chars but instead are to be
>> used in geometric figures, in math and otherwise
>> scientific/engineering notation, and so on - and those
>> also collide (or almost so) with for example the
>> Latin 'o' and probably other letters as well.
>
> Of course — Richard does use the phrase "FULL set of Unicode characters"
>
> Currently we see programming languages ALREADY SUPPORTING large swathes of the
> 1 million chars for identifier-chars -- mostly the 'Letter' and perhaps
> the 'number/digit' categories.

Quick, without looking it up, is: ➒ a digit? a letter? something else?
What about Ⅸ or ๙? Are they digits or letters?
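
For anyone who does want to look it up afterwards, here is a minimal
Python sketch using only the standard unicodedata module. The helper
name `describe` is made up for illustration; it simply reports how the
Unicode database classifies each of the three characters.

```python
import unicodedata

def describe(ch):
    """Return (code point, general category, name) for one character."""
    return (f"U+{ord(ch):04X}",
            unicodedata.category(ch),
            unicodedata.name(ch, "<unnamed>"))

for ch in "➒Ⅸ๙":
    # isdecimal/isdigit/isnumeric are progressively looser tests
    print(ch, *describe(ch),
          "isdecimal:", ch.isdecimal(),
          "isdigit:", ch.isdigit(),
          "isnumeric:", ch.isnumeric())
```

The general categories come out as No (other number), Nl (letter
number) and Nd (decimal digit) respectively, so only the Thai character
is a decimal digit in the strict sense.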

> So there are two somewhat opposite points:
> 1. Supporting the Babel of human languages in programming identifiers is
> probably a mistake.  In any case if a language must go that way, the choice of
> html seems more sane: active opt-in with (something like) a charset declaration
> rather than have the whole truckload thrown at someone unsuspecting.
> So if a А (cyrillic) and the usual A got mixed up, at the least you asked for it!!

Yes, a mandatory declaration could solve some problems.
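
As a rough sketch of what such an opt-in check might look like (this is
not how any existing compiler does it): the function names and the
`declared` parameter below are hypothetical, and the first word of the
Unicode character name is used as a stand-in for the script, since
Python's standard library has no script property lookup.

```python
import unicodedata

def scripts_in(identifier):
    """Approximate the scripts used by the letters of an identifier.

    Assumption: the first word of the Unicode character name
    ("LATIN", "CYRILLIC", "GREEK", ...) stands in for the script.
    """
    return {unicodedata.name(ch, "UNKNOWN").split()[0]
            for ch in identifier if ch.isalpha()}

def allowed(identifier, declared=frozenset({"LATIN"})):
    """True if every letter belongs to a script the file opted into."""
    return scripts_in(identifier) <= set(declared)

print(allowed("Apple"))        # all Latin letters
print(allowed("\u0410pple"))   # Cyrillic А (U+0410) followed by Latin letters
print(allowed("\u0410pple", declared={"LATIN", "CYRILLIC"}))
```

With only LATIN declared, the second identifier is rejected even though
it looks the same as the first on screen. Once both scripts are
declared the mix-up becomes possible again, but then you did ask for it.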


> 2. The basic 'infrastructure' of a language (in C, think "; {}()", operators, '#',
> the quotes themselves, etc.) is drawn exclusively from ASCII for historical
> reasons that are 2015-irrelevant.

And have alternatives too: 

> Now python (for example) has half a dozen 'quoteds'
> - strings "..."
> - unicode strings u"..."
> - triple quoted strings (can contain newlines) """..."""
> - raw strings r"..." special chars like backslash are not special
> etc 
>
> And the chars like « ‹ seem to be just calling for use

In German, they quote as:     »Hallo«
In French, they quote as:     « Salut ! »
In old books, they quote as:  « One line,
                              « another
                              « final line »

The real problem introduced by Unicode is that not only does it have a
lot of complicated rules of its own, but the use of foreign-language
characters would also have to come with the corresponding localized
rules!
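
Case mapping is one small illustration. The sketch below uses only
Python's built-in string methods, which apply the locale-independent
Unicode mappings; it shows behaviour, not a fix, and the variable name
`dotted` is just for illustration.

```python
# Case mapping with Python's built-in, locale-independent str methods.
print("ß".upper())           # German sharp s: the uppercase form is two letters
print("straße".casefold())   # the form used for caseless comparison

dotted = "\u0130"            # LATIN CAPITAL LETTER I WITH DOT ABOVE (Turkish İ)
# Lowercasing yields two code points: i plus a combining dot above.
print([f"U+{ord(c):04X}" for c in dotted.lower()])

# Turkish convention would uppercase "i" to İ, but str.upper() has no
# notion of the text's language and always returns plain "I".
print("i".upper())
```

So even a one-character string can change length under case
conversion, and the right answer for "i" depends on a language that the
string itself does not carry.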

There's no (contemporary) way any sane program can implement them
correctly, much less a program as unrelated to this (international human
language) domain as a programming language compiler.

I'm not saying that once AI runs on your smartphone (instead of on
Apple, Google or IBM supercomputers) it won't be possible to have it
deal with that, even in compiler sources.  But not now.  It's too early.

-- 
__Pascal Bourguignon__                 http://www.informatimago.com/
“The factory of the future will have only two employees, a man and a
dog. The man will be there to feed the dog. The dog will be there to
keep the man from touching the equipment.” -- Carl Bass CEO Autodesk

