Re: [Bug-apl] Boolean Packing


From: Juergen Sauermann
Subject: Re: [Bug-apl] Boolean Packing
Date: Mon, 24 Aug 2015 17:32:00 +0200
User-agent: Mozilla/5.0 (X11; Linux i686; rv:31.0) Gecko/20100101 Thunderbird/31.4.0

Hi Mike,

my example was just to demonstrate how the approach of homogeneous data
can backfire and turn a constant-time operation into an expensive one.

I would also prefer not to make any assumptions about the nature of the data on which
APL operates. The extra cost of homogeneous arrays arises not only when they have to be
converted to non-homogeneous arrays, but also in the tests for whether an array is
homogeneous or not (virtually all over the place). These costs occur even if no
homogeneous array is ever used.
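
To illustrate the point: every primitive would have to branch on the storage kind before touching the ravel, roughly like the following C++ sketch (the Storage and Array names are hypothetical, not GNU APL's actual classes):

    // Hypothetical sketch of the test-everywhere cost described above.
    enum class Storage { CELLS, PACKED_BOOL, PACKED_DOUBLE };

    struct Array                // hypothetical, not GNU APL's Value
    {
        Storage storage;
        // ... cell pointer or packed buffer ...
    };

    void negate(Array & z)
    {
        switch (z.storage)      // runs on every primitive call, even in
        {                       // a workspace that never packs anything
            case Storage::PACKED_BOOL:   /* flip bits word-wise   */ break;
            case Storage::PACKED_DOUBLE: /* negate the doubles    */ break;
            case Storage::CELLS:         /* generic per-cell loop */ break;
        }
    }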

Apart from that, GNU APL can call functions written in other languages (via the native
function interface) as well as be called from other languages (via libapl). Therefore
everything you need to call Julia code, or to be called from Julia code, is already in place.
So you can easily do what you want in Julia rather than in APL.
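
For reference, embedding GNU APL from C++ looks roughly like the minimal sketch below. The init_libapl() and apl_exec() entry points are assumed from libapl.h, so check the installed header for the exact signatures; the source file must be saved as UTF-8, and the program presumably links against libapl:

    // Minimal libapl embedding sketch; signatures assumed from libapl.h.
    extern "C" void init_libapl(const char * progname, int log_startup);
    extern "C" int  apl_exec(const char * line_utf8);

    int main()
    {
        init_libapl("embed-demo", /* log_startup */ 0);
        apl_exec("A←100 100⍴1");    // build a 100×100 array in APL
        apl_exec("⎕←+/,A");         // print 10000
        return 0;
    }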

/// Jürgen



On 08/21/2015 07:46 PM, Mike Duvos wrote:
Hi Jürgen,

Heterogeneous arrays are generally small: things like tables of numbers whose rows and columns are labeled with text, and argument lists to functions.

Big data, like large bitmaps and large matrices of real or complex numbers, is generally homogeneous. Storing such ravels compactly, especially in the boolean case, can make the difference between an APL being usable or unusable for a given application.

While the performance hit from storing the data inefficiently is minor, the memory hit isn't. In the example I cited, the 274×65536 bitmap, it was 2 meg versus almost a gig.
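
The arithmetic: 274×65536 = 17,956,864 elements. At one bit per element that is about 2.2 MB; assuming roughly 48 bytes of per-cell overhead (an assumption, not a measured GNU APL figure) it is on the order of 860 MB, which matches the "2 meg versus almost a gig" comparison. A bit-packed boolean ravel is only a few lines of C++; the class below is an illustrative sketch, not GNU APL code:

    // Illustrative bit-packed boolean storage: one bit per element
    // instead of one cell object per element.
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    class PackedBool
    {
    public:
        explicit PackedBool(std::size_t n) : bits((n + 63) / 64, 0) {}

        void set(std::size_t i, bool v)
        {
            if (v) bits[i >> 6] |=  (std::uint64_t(1) << (i & 63));
            else   bits[i >> 6] &= ~(std::uint64_t(1) << (i & 63));
        }

        bool get(std::size_t i) const
        {
            return (bits[i >> 6] >> (i & 63)) & 1;
        }

    private:
        std::vector<std::uint64_t> bits;  // 274×65536 bits ≈ 2.2 MB
    };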

In your example, of someone sticking a single character into a million-word numeric array, I would much rather absorb the overhead of converting the whole thing for that one assignment than incur it on creation for every million-word array, the vast majority of which will remain entirely numeric.
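
The policy Mike prefers (stay packed until the first non-numeric store, then convert the whole ravel once) can be sketched in a few lines of C++; all names here are hypothetical and the cell type is greatly simplified:

    // Hypothetical convert-on-assignment: creation and numeric updates
    // stay cheap; the one-time promotion cost is paid only by arrays
    // that actually receive a non-numeric element.
    #include <cstddef>
    #include <variant>
    #include <vector>

    using Cell = std::variant<double, char32_t>;   // simplified cell

    struct Array
    {
        explicit Array(std::size_t n) : packed(n, 0.0) {}

        void store(std::size_t i, Cell v)
        {
            if (!promoted && std::holds_alternative<double>(v))
            {
                packed[i] = std::get<double>(v);   // fast numeric path
                return;
            }
            if (!promoted)                 // first non-numeric store:
            {                              // promote the ravel once
                cells.assign(packed.begin(), packed.end());
                promoted = true;
            }
            cells[i] = v;
        }

        std::vector<double> packed;   // used while homogeneous
        std::vector<Cell>   cells;    // used after promotion
        bool promoted = false;
    };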

The technology for implementing dynamic languages has evolved quite a bit since APL2 and even GNU APL. Julia, a high-level dynamic language, generates machine code for the functions and expressions you type in and approaches the speed of compiled C. This is made possible by high-level building blocks such as LLVM. You could easily do the same thing for APL.

Regards,

Mike

On Fri, Aug 21, 2015 at 9:40 AM, Juergen Sauermann <address@hidden> wrote:
Hi,

in the good old days (= long before APL2), the data type was a property of the value, and memory was
always too small. At that time all values were homogeneous.

Then APL2 came, and most vendors of the older APLs simply added a new thing called a mixed value to
take care of the non-homogeneous cases (hoping there would not be too many of them).

GNU APL came long after APL2, and when thinking about using homogeneous arrays, I came to the
conclusion that they have no place in a new, object-oriented design. In other words, GNU APL considers
mixed arrays the normal case rather than the exception.

I would also argue that the speed advantage is very minor:
- it only applies to ∧ ∨ ⍲ ⍱ and monadic ∼, and only if the arrays are homogeneous (for ∧ and ∨ they do
 not have to be).
- most of the time spent in APL execution goes into checking various things, as opposed to the pure
 computation of results (read: new ravel elements).
- homogeneous arrays often trigger re-computation of the entire value. For example:
  Z←1000000⍴0 ◊ Z[2]←2 ◊ Z[4]←'a'

And even though you may gain a few microseconds when computing the above functions, you pay them
back in checking for the homogeneous case all over the place. In other words, adding homogeneous
arrays will most likely backfire in terms of performance.

/// Jürgen



On 08/21/2015 07:06 AM, Elias Mårtenson wrote:
If you want to look at the code: all arrays are implemented as instances of Value. You will also see references to Value_P, which is just a refcounted pointer to a Value. A Value has a Shape, which defines its dimensions.

Code that processes array content does so using the method .get_ravel() and its various overloads.

I'm thinking that perhaps it would be possible to create an optimised primitive array by subclassing Value (or rather, extracting a common superclass) into something that can handle certain arrays much faster. Whether that is actually faster depends on where the time is actually spent: if you still have to waste time boxing the values returned from .get_ravel(), then there is little point.
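
A rough C++ sketch of that access pattern, for concreteness. Value_P and .get_ravel() come from the description above; Value.hh, element_count() and Cell::get_real_value() are assumptions about the GNU APL sources that should be checked against the actual headers:

    // Sums a numeric ravel through the generic Cell interface. The
    // per-element boxed access here is exactly the overhead a packed
    // Value subclass would have to bypass to be worthwhile.
    #include "Value.hh"   // GNU APL source tree (assumed header name)

    double sum_ravel(Value_P value)
    {
        double sum = 0.0;
        const ShapeItem count = value->element_count();  // assumed accessor
        for (ShapeItem i = 0; i < count; ++i)
            sum += value->get_ravel(i).get_real_value(); // assumed accessor
        return sum;
    }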

The first step is to run your tests through cachegrind to determine exactly where the bottlenecks lie.

I have done a cachegrind analysis before, and I determined that most of the time was actually spent copying the arrays. The problem is that in an expression such as A←A+1, where A is a large array, the entire content is copied and then 1 is added to each element. However, since A is overwritten by the assignment, a lot of time can be saved if the addition is done in place. With an expression such as A←⍉1+2×A you end up with three unnecessary copies and initialisations of the entire array. The time spent there is much larger than the time taken to actually perform the mathematical operations. I even implemented an optimisation for this, but it was rolled back because it was severely broken. While it worked, however, it provided an order-of-magnitude performance improvement or more.
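
The reference-count test that makes such an in-place update safe can be sketched like this (a simplified model, not the rolled-back GNU APL patch):

    // If no one else owns the ravel, A←A+1 may reuse A's storage;
    // otherwise fall back to copying once.
    #include <memory>
    #include <vector>

    using Ravel = std::vector<double>;

    std::shared_ptr<Ravel> add_scalar(std::shared_ptr<Ravel> a, double s)
    {
        if (a.use_count() == 1)                // sole owner: in place
        {
            for (double & x : *a) x += s;
            return a;
        }
        auto z = std::make_shared<Ravel>(*a);  // shared: copy first
        for (double & x : *z) x += s;
        return z;
    }

    // Usage corresponding to A←A+1 (ownership moved in and back out):
    //   A = add_scalar(std::move(A), 1.0);    // no copy when unshared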

You might want to read up in the mailing list archives from a year or so ago where all of this was discussed in depth.

Regards,
Elias

On 21 August 2015 at 12:49, Mike Duvos <address@hidden> wrote:
Hi Elias,

It seems that there would be a substantial performance hit from using C++ object management to construct your workspace. I haven't run any benchmarks yet, but perhaps a quick comparison with APL2 would be useful at this point.

Let's make a little function that does something 100 times.

      ∇DO100[⎕]∇
    ∇
[0]   DO100 X;I
[1]   I←¯1
[2]  L1:→((I←I+1)≥100)/0
[3]   ⍎X
[4]   →L1

    ∇ 2015-08-20 21.00.29.850 (GMT-7)

      

And another little function that prints out the number of seconds something takes.

      ∇TIME[⎕]∇
    ∇
[0]   TIME X;TS
[1]   TS←⎕TS
[2]   ⍎X
[3]   (⍕(24 60 60 1000⊥¯4↑⎕TS-TS)÷1000),' Seconds.'

    ∇ 2015-08-20 21.01.09.470 (GMT-7)

      

And an array of a million floats.

      A←1000 1000⍴⍳1E6

And a 100×100 boolean identity matrix.

      B←100 100⍴101↑1=1

[IBM APL2]

      TIME  'DO100 ''A←⍉A'''
2.391 Seconds.

      TIME 'DO100 ''B←B∨.∧B'''
7.591 Seconds.

      TIME 'DO100 ''C←≠\B'''
0.197 Seconds.
     
[GNU APL]

      TIME    'DO100 ''A←⍉A'''
54.987 Seconds.

      TIME 'DO100 ''B←B∨.∧B'''
13.325 Seconds.

      TIME 'DO100 ''C←≠\B'''
59.361 Seconds.
     
That's actually not too horrible, considering what it's having to do. 

Regards,

Mike
