bug-apl
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Bug-apl] performance in of ¨


From: Christian Robert
Subject: Re: [Bug-apl] performance in of ¨
Date: Sat, 18 Jun 2016 02:47:21 -0400
User-agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1

Yes quite strange results, (svn is 745)

Xtian.

      time 'MEASURE 10000'
 A + B           .22
 A + B           .18
 A +¨ B         1.00
 A +¨ B         1.01
 A {⍺+⍵}¨ B   714.73
 A {⍺+⍵}¨ B   764.94
5.995263909

      time 'MEASURE2 10000'
 A + B            .00
 A + B            .00
 A +FE B         1.00
 A +FE B          .93
 A {⍺+⍵}FE B      .90
 A {⍺+⍵}FE B      .99
11.48487473

      ∇MEASURE[⎕]∇
    ∇
[0]   MEASURE B;A;D;R;T
[1]   A←⍳B ◊ T←7⍴0
[2]   T[1]←⎕FIO ¯1 ◊ ⊣ A + A              ◊ T[2]←⎕FIO ¯1 ◊ ⊣ A + A
[3]   T[3]←⎕FIO ¯1 ◊ ⊣ A +¨ A             ◊ T[4]←⎕FIO ¯1 ◊ ⊣ A +¨ A
[4]   T[5]←⎕FIO ¯1 ◊ ⊣ A {⍺+⍵}¨ A         ◊ T[6]←⎕FIO ¯1 ◊ ⊣ A {⍺+⍵}¨ A
[5]   T[7]←⎕FIO ¯1 ◊ D←T-1⌽T ◊ R←¯1↓D÷D[3]
[6]   ⍉2 6⍴(,2/'A + B' 'A +¨ B' 'A {⍺+⍵}¨ B') , (⊂8 2)⍕¨R
    ∇

      ∇MEASURE2[⎕]∇
    ∇
[0]   MEASURE2 B;A;D;R;T
[1]   A←⍳B ◊ T←7⍴0
[2]   T[1]←⎕FIO ¯1 ◊ ⊣ A + A              ◊ T[2]←⎕FIO ¯1 ◊ ⊣ A + A
[3]   T[3]←⎕FIO ¯1 ◊ ⊣ A +FE A             ◊ T[4]←⎕FIO ¯1 ◊ ⊣ A +FE A
[4]   T[5]←⎕FIO ¯1 ◊ ⊣ A {⍺+⍵}FE A         ◊ T[6]←⎕FIO ¯1 ◊ ⊣ A {⍺+⍵}FE A
[5]   T[7]←⎕FIO ¯1 ◊ D←T-1⌽T ◊ R←¯1↓D÷D[3]
[6]   ⍉2 6⍴(,2/'A + B' 'A +FE B' 'A {⍺+⍵}FE B') , (⊂8 2)⍕¨R
    ∇

      ∇FE[⎕]∇
    ∇
[0]   Z←A (LO FE) B;rho_Z;N;N_max
[1]   rho_Z←⍴B ◊ N←⎕IO ◊ A←,A ◊ N_max←N+⍴B←,B ◊ Z←N_max⍴0
[2]  LOOP: Z[N]←⊂(⊃A[N]) LO ⊃B[N] ◊ →(N_max>N←N+1)⍴LOOP
[3]   Z←rho_Z⍴Z
    ∇


On 2016-06-17 13:41, Juergen Sauermann wrote:
Hi,

I can confirm that something is strange with the macro based ¨ but I can't yet 
see what.
I wrote a simple measurement function to test *A+B* in different ways:

*A+B        (direct scalar)**
**A+¨B       (uses built-in each)
**A {⍺+⍵} B  (uses macro each**)**
*
The measurement program is this:

*      )clear**
**∇MEASURE B;A;D;R;T**
** A←⍳B ◊ T←7⍴0**
** T[1]←⎕FIO ¯1 ◊ ⊣ A + A              ◊ T[2]←⎕FIO ¯1 ◊ ⊣ A + A**
** T[3]←⎕FIO ¯1 ◊ ⊣ A +¨ A             ◊ T[4]←⎕FIO ¯1 ◊ ⊣ A +¨ A**
** T[5]←⎕FIO ¯1 ◊ ⊣ A {⍺+⍵}¨ A         ◊ T[6]←⎕FIO ¯1 ◊ ⊣ A {⍺+⍵}¨ A**
** T[7]←⎕FIO ¯1 ◊ D←T-1⌽T ◊ R←¯1↓D÷D[3]**
** ⍉2 6⍴(,2/'A + B' 'A +¨ B' 'A {⍺+⍵}¨ B') , (⊂8 2)⍕¨R**
**∇**
**∇Z←A (LO EACH_1) B;rho_Z;N;N_max**
**rho_Z←⍴B ◊ N←⎕IO ◊ A←,A ◊ N_max←N+⍴B←,B ◊ Z←N_max⍴0**
**LOOP: Z[N]←⊂(⊃A[N]) LO ⊃B[N] ◊ →(N_max>N←N+1)⍴LOOP**
**Z←rho_Z⍴Z**
**∇**
**∇Z←A (LO EACH_2) B;rho_Z;N;N_max**
**rho_Z←⍴B ◊ N←⎕IO ◊ A←,A ◊ N_max←N+⍴B←,B ◊ Z←N_max⍴0**
**LOOP: (N⌷Z)←⊂(N⌷A) LO N⌷B ◊ →(N_max>N←N+1)⍴LOOP**
**∇*

Every case is measured twice to see fluctuations, caching and the like.
The numbers are CPU cycles relative to the third line
The results are quite puzzling:


*            MEASURE 100**
** A + B           .18 **
** A + B           .14 **
** A +¨ B         1.00 **
** A +¨ B          .90 **
** A {⍺+⍵}¨ B    17.81 **
** A {⍺+⍵}¨ B    11.81 **
**
**
**            MEASURE 1000**
** A + B           .13 **
** A + B           .12 **
** A +¨ B         1.00 **
** A +¨ B         1.01 **
** A {⍺+⍵}¨ B    18.03 **
** A {⍺+⍵}¨ B    11.67 **
**
**            MEASURE 10000**
** A + B           .18 **
** A + B           .17 **
** A +¨ B         1.00 **
** A +¨ B          .29 **
** A {⍺+⍵}¨ B   111.45 **
** A {⍺+⍵}¨ B   111.85 **
*
So the macro based EACH becomes slower when the values grow. But why?
The macro code for the EACH macro is quite simple (and ⎕FXing it gives the same
behavior as the built-in macro.

*      ⎕CR 'μ-Z__vA_LO_EACH_vB'**
**Z←A (LO Z__vA_LO_EACH_vB) B;rho_Z;N;N_max**
**rho_Z←⍴B ◊ N←⎕IO ◊ A←,A ◊ N_max←N+⍴B←,B ◊ Z←N_max⍴0**
**LOOP: Z[N]←⊂(⊃A[N]) LO ⊃B[N] ◊ →(N_max>N←N+1)⍴LOOP**
**Z←rho_Z⍴Z*

I can't see anything in that code which would explain why the macro performance
drops when the values get larger. The difference between *A +¨ B *and*A {⍺+⍵}¨ 
B* is caused
by two independent factors:

one uses the built-in + while the other uses the user defined lambda {⍺+⍵}, and
one uses the built-in ¨ while the other uses the user defined macro 
Z__vA_LO_EACH_vB*
*
The impact of the second factor can be determined by running the same 
measurement with an
older version of GNU APL:

*            MEASURE 100  **
** A + B           .20 **
** A + B           .15 **
** A +¨ B         1.00 **
** A +¨ B          .94 **
** A {⍺+⍵}¨ B     5.53 **
** A {⍺+⍵}¨ B     5.36 **
**
**            MEASURE 1000**
** A + B           .14 **
** A + B           .10 **
** A +¨ B         1.00 **
** A +¨ B          .98 **
** A {⍺+⍵}¨ B     4.87 **
** A {⍺+⍵}¨ B     4.49 **
**
**            MEASURE 10000**
** A + B           .26 **
** A + B           .25 **
** A +¨ B         1.00 **
** A +¨ B          .42 **
** A {⍺+⍵}¨ B     2.42 **
** A {⍺+⍵}¨ B     2.41 *

Here the time decreases rather than increases with the vector lengths.
Thus macros are about 3 times slower at vectors of length 100 and 30 times
slower at vectors of length 10000.

And the LOOP line in the macro does the same as the corresponding C code in the
built-in EACH case.

I have no idea yet what is going on here. The memory consumption is probably a 
hint,
but I can't see where the memory goes. The A, B and Z variables are allocated 
only once
per macro call and the variables in the loop are small (which means that GNU 
APL is
recycling them internally rather than malloc()ing them).

Any ideas welcome!

/// Jürgen



On 06/17/2016 12:46 AM, Xiao-Yong Jin wrote:
Hi,

In my script I was using ⍎¨ on a vector of character vector from each line of a 
file.  After some testing, I guess it is not really the ⍎, that is slow.  The 
time scaling is strange.

In the following, the first row and the third row (using ¨ and ⍣ respectively) 
scales like n*3, while the problem size is only n*2.  My script ends up taking 
more than a minute to read a 1000×1000 numeric matrix from a text file.  On the 
other hand, the naive {⍵+1⊣⍎v}⍣n 1 indeed scales like n*2.

In addition, gnu apl is taking nearly a gigabytes of memory to run the 
following code, while dyalog is on the order of megabytes.

      te¨200×⍳3
  ⍎¨u 536     ⍎¨u 4129     ⍎¨u 13962
  ⍎ v   0     ⍎ v    0     ⍎ v     0
  ⍎u⍣ 518     ⍎u⍣ 4114     ⍎u⍣ 13889
  ⍎v⍣  15     ⍎v⍣   58     ⍎v⍣   131
      ∇te[⎕]∇
    ∇
[0]   r←te n;s;u;v;t
[1]   s←1E¯18×?n n⍴1E18
[2]   u←⍕¨⊂[2]s
[3]   v←↑u
[4]   t←⎕AI◊⊣⍎¨u◊r←(⊂'⍎¨u'),2⌷⎕AI-t
[5]   t←⎕AI◊⊣⍎v◊r←r,[.5](⊂'⍎ v'),2⌷⎕AI-t
[6]   t←⎕AI◊⊣{⍵+1⊣⍎⊃u[⍵]}⍣n 1◊r←r,[1](⊂'⍎u⍣'),2⌷⎕AI-t
[7]   t←⎕AI◊⊣{⍵+1⊣⍎v}⍣n 1◊r←r,[1](⊂'⍎v⍣'),2⌷⎕AI-t
    ∇

Best,
Xiao-Yong





reply via email to

[Prev in Thread] Current Thread [Next in Thread]