bug-apl

Re: Confused ⎕UCS


From: Dr . Jürgen Sauermann
Subject: Re: Confused ⎕UCS
Date: Sun, 22 Nov 2020 15:19:19 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0

Hi Adam,

thanks for sharing your observations. What you observed is that ⎕UCS in GNU APL is an
extension of the original ⎕UCS in IBM APL2.

Formally speaking, an extension of a function F1 is a function F2 which returns
the same values as F1 for all arguments in the definition domain of F1, but
which also returns results for arguments that lie only in the definition domain
of F2 (i.e. where F1 would raise a DOMAIN ERROR).
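
The definition above can be illustrated with a small Python sketch. The functions
f1 and f2, the DomainError class and the placeholder string are toy names invented
for this illustration; they are not the IBM or GNU APL implementations of ⎕UCS.

```python
class DomainError(Exception):
    """Stand-in for APL's DOMAIN ERROR (hypothetical, for illustration only)."""


def f1(n):
    # Restricted toy function: defined only for 0 <= n <= 0x10FFFF.
    if not 0 <= n <= 0x10FFFF:
        raise DomainError(n)
    return chr(n)


def f2(n):
    # Toy extension: identical to f1 wherever f1 is defined,
    # and additionally defined for larger integers.
    if n < 0:
        raise DomainError(n)
    if n <= 0x10FFFF:
        return chr(n)
    return "<U+%X>" % n


def is_extension(small, big, samples):
    """Check on the given samples that big agrees with small on small's domain."""
    for x in samples:
        try:
            expected = small(x)
        except DomainError:
            continue  # outside small's domain: big may do anything here
        try:
            if big(x) != expected:
                return False
        except DomainError:
            return False  # big is undefined where small is defined
    return True


samples = [0, 65, 0x10FFFF, 0x110000, -1]
print(is_extension(f1, f2, samples))
print(is_extension(f2, f1, samples))
```

The check only constrains f2 on the domain of f1; what f2 returns elsewhere is
free, which is exactly the room an extension uses.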

The IBM ⎕UCS had some rather arbitrary limitations, which were removed in
GNU APL's ⎕UCS:

* In IBM ⎕UCS the argument had to be a simple array of integers or of characters.
Arrays of this type were common in good old APL1, but they are no longer adequate
for APL2. GNU APL relaxed this limitation to allow mixed arrays (of characters
and integers) as well.

* In IBM APL2 the integers had to be positive. However, many character codes
originate from some ⎕FIO function which could return negative integers (primarily
in the range ¯128 ... ¯1). Requiring them to be converted to positive integers
would be useless overhead, which is simply avoided by allowing negative integers
as well.
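
As a rough illustration of where values in the range ¯128 ... ¯1 come from:
bytes 128 ... 255 turn negative when they are read as signed 8-bit values. The
Python sketch below shows that effect and the conversion step that a
positive-only ⎕UCS would force on the caller. It assumes plain signed bytes and
does not model ⎕FIO or the way GNU APL treats negative arguments internally.

```python
import struct

raw = "é".encode("utf-8")                      # the two bytes 0xC3 0xA9

# Read the same bytes as signed 8-bit values (struct format code 'b').
signed = list(struct.unpack("%db" % len(raw), raw))

# The extra step a positive-only function would require: map back to 0..255.
unsigned = [b & 0xFF for b in signed]

print(signed)
print(unsigned)
print(bytes(unsigned).decode("utf-8"))
```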

Floating point and complex numbers are not allowed, in order to avoid interference
with ⎕CT (i.e. how should rounding be performed?).

IBM APL2 ⎕UCS is based on ISO 10646 and not on Unicode. Therefore
the "last Unicode" should not make a difference (and is subject to change over
time anyway). In the past the set of Unicode code points has grown
(as new characters were added in different Unicode releases) while the range
covered by the encodings of these code points (e.g. UTF-8) has decreased (from
0...0x7FFFFFFF in RFC 2279 down to 0...0x10FFFF in RFC 3629). I therefore came to
the conclusion that the range supported by GNU APL's ⎕UCS should be so large that
it becomes immune to future changes in the various character sets around.
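
To make the two ranges concrete, here is a minimal Python sketch of the original
UTF-8 byte layout (1 to 6 bytes, up to 0x7FFFFFFF) applied to the value 1114112
(0x110000), which is just above the RFC 3629 limit. The function name
utf8_encode_2279 is made up for this illustration, and the sketch is not taken
from the GNU APL sources.

```python
def utf8_encode_2279(cp):
    """Encode cp with the original UTF-8 layout (1..6 bytes, max 0x7FFFFFFF)."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("outside 0...0x7FFFFFFF")
    if cp < 0x80:
        return bytes([cp])
    # (exclusive upper limit, total number of bytes, lead-byte marker)
    layouts = [(0x800, 2, 0xC0), (0x10000, 3, 0xE0), (0x200000, 4, 0xF0),
               (0x4000000, 5, 0xF8), (0x80000000, 6, 0xFC)]
    for limit, nbytes, lead in layouts:
        if cp < limit:
            break
    tail = []
    for _ in range(nbytes - 1):
        tail.append(0x80 | (cp & 0x3F))   # continuation byte, low 6 bits
        cp >>= 6
    return bytes([lead | cp] + tail[::-1])


# Inside the RFC 3629 range both schemes agree.
print(utf8_encode_2279(0x10FFFF) == chr(0x10FFFF).encode("utf-8"))

# One past the limit: still encodable with the old layout ...
beyond = utf8_encode_2279(1114112)
print(beyond.hex())

# ... but a strict RFC 3629 decoder rejects the resulting bytes.
try:
    beyond.decode("utf-8")
except UnicodeDecodeError as err:
    print("rejected:", err.reason)
```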

Therefore the full picture of your last example is this:

      ⎕ucs 1114112
����

      ⎕ucs ⎕ucs 1114112
1114112

Best Regards,
Jürgen


On 11/22/20 10:00 AM, Adám Brudzewsky wrote:
I don't know if this is one bug or several, but here you go:

      ⎕UCS 100
d
      ⎕UCS 1E2
DOMAIN ERROR+
      ⎕UCS 100
      ^

      ⎕ucs ¯1
      ⎕ucs ¯100032+⍳3
ABC
      'ABC'=⎕ucs ¯100032+⍳3
0 0 0

      ⎕ucs 1114112 ⍝ last Unicode char is 1114111
����



