|From:||Dr. Jürgen Sauermann|
|Subject:||Re: Parallel APL Questions|
|Date:||Fri, 7 Feb 2020 20:25:51 +0100|
|User-agent:||Mozilla/5.0 (X11; Linux i686; rv:60.0) Gecko/20100101 Thunderbird/60.6.1|
Let me try to answer some of your questions inline below...
On 2/7/20 6:35 PM, Andrew wrote:
Good evening
No problem, you found the right list.
⎕AI is rather imprecise, even worse than ⎕TS. For performance measurements on Intel
CPUs you should use ⎕FIO ¯1 (returns the CPU cycle counter) and maybe ⎕FIO ¯2 (returns the CPU frequency).
⎕FIO ¯1 is the most precise timing source that you can get in GNU APL.
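To turn two cycle-counter readings into a time, divide the difference by the CPU frequency. A minimal sketch in Python (not APL), assuming the counter ticks at a constant rate (an invariant TSC) and that the frequency is given in Hz; the readings and the 3.6 GHz value are hypothetical, and `cycles_to_seconds` is an illustrative helper, not part of GNU APL:

```python
def cycles_to_seconds(start_cycles: int, end_cycles: int, cpu_hz: float) -> float:
    """Convert the difference of two cycle-counter readings into seconds.

    Assumes the counter runs at the constant rate cpu_hz (Hz). If frequency
    scaling changes the tick rate, the result is only approximate.
    """
    if cpu_hz <= 0:
        raise ValueError("cpu_hz must be positive")
    if end_cycles < start_cycles:
        raise ValueError("counter went backwards (wrap-around or another core?)")
    return (end_cycles - start_cycles) / cpu_hz


# Hypothetical readings taken before and after an expression, e.g. with
# ⎕FIO ¯1, and a hypothetical frequency as could be obtained from ⎕FIO ¯2.
start, end = 1_000_000, 4_600_000
freq_hz = 3.6e9
print(f"{cycles_to_seconds(start, end, freq_hz) * 1e6:.1f} microseconds")
```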
The expression above that you benchmarked is a mix of parallelized and non-parallelized APL
primitives. Each of them is subject to varying execution times, so it is difficult to tell whether the increased
execution time is caused by the parallel execution or by execution times that vary anyway.
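One way to separate a real effect from run-to-run variation is to repeat each measurement many times and compare a robust center against the spread. A minimal Python sketch; the rule "medians must differ by more than the sum of both interquartile ranges" is an assumed heuristic chosen for illustration, not a method used by GNU APL or its benchmark workspace:

```python
import statistics


def summarize(samples):
    """Return (median, interquartile range) of repeated timings."""
    q1, _, q3 = statistics.quantiles(samples, n=4)
    return statistics.median(samples), q3 - q1


def clearly_different(a, b):
    """Heuristic: the medians differ by more than the combined IQRs.

    If this is False, the difference may just be ordinary variation.
    """
    med_a, iqr_a = summarize(a)
    med_b, iqr_b = summarize(b)
    return abs(med_a - med_b) > iqr_a + iqr_b


# Hypothetical timings (ms) of the same expression, 5 runs each.
serial = [10.0, 10.2, 9.9, 10.1, 10.0]
parallel = [9.0, 12.0, 10.5, 8.5, 11.5]
print(clearly_different(serial, parallel))
```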
In my experience, using all cores of a CPU is not optimal because external events from the OS (interrupts
etc.) slow down one of the cores used for APL. Since a parallel primitive is only finished when all of its cores
are finished, the core(s) hit by external events increase the execution time of each primitive. If you leave
one core unused (and if you are lucky), then the scheduler of the OS will see which cores are busy (executing
APL) and will direct those events to the unused core.
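The effect of a single disturbed core can be sketched with a simple fork-join model (an assumption for illustration): the primitive's time is the maximum of the per-core times, so one interrupted core delays the whole primitive. The helper names below are hypothetical, and the "leave one core free" rule just restates the advice above:

```python
import os


def parallel_primitive_time(per_core_seconds):
    """Fork-join model: the primitive ends when its slowest core ends."""
    return max(per_core_seconds)


def cores_for_apl(total=None):
    """Leave one core free for OS events (interrupts etc.), use at least one."""
    if total is None:
        total = os.cpu_count() or 1
    return max(1, total - 1)


# Hypothetical: 4 cores each need 1.0 ms of work, but one of them also
# services interrupts for 0.3 ms, so the whole primitive takes 1.3 ms.
print(parallel_primitive_time([1.0, 1.0, 1.3, 1.0]))  # 1.3
print(cores_for_apl(4))  # 3
```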
I also rather doubt that a virtual or emulated environment can tell you much about the performance of parallelized APL.
By the way, there is a benchmarking workspace Scalar3.apl shipped with GNU APL that makes benchmarking of parallel GNU APL easier. An Intel I9 is a good platform for running that workspace, but
avoid any virtualization and ./configure GNU APL properly.
The speedups that can be achieved are generally disappointing. I have also compared an Intel I7 with an Intel I9.
It seems that, at the same CPU frequency and with the same core count, the I9 is substantially faster
than the I7, but at the same time the I7 benefits more from parallelization than the I9. Most likely the
CPU optimizations in the I9 (compared to the I7) aim at the same kind of parallelism, so that improvements
in one aspect (CPU architecture) are made at the expense of the other aspect (APL parallelization).
Could very well be. The expression has a rather small amount of parallelization since the majority of
its primitives are not parallelized.
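How much a partly parallelized expression can gain is commonly estimated with Amdahl's law, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the run time spent in parallelized primitives and n is the core count. A minimal Python sketch; the values of p below are hypothetical, not measurements of the expression discussed here:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


# Hypothetical fractions of run time spent in parallelized primitives.
for p in (0.25, 0.5, 0.9):
    print(f"p={p}: {amdahl_speedup(p, 4):.2f}x on 4 cores")
```

Even with perfectly scaling scalar functions, an expression dominated by non-parallelized primitives stays close to 1x.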
Currently all scalar functions, and inner and outer products of them. These are the ones for which one can
prove that, in theory and given the GNU APL implementation, they must have a linear speedup (linear in the
number of cores). That is, on an I9 a scalar function on 4 cores must be 4 times faster than on one
core. In real life it is only about 1.5 times faster. This points to a hardware bottleneck between the cores
and the memory. The scalar functions are so lightweight that the memory accesses (fetching the operands
and storing the results) dominate the entire execution time.
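The memory bottleneck can be illustrated with a deliberately simplified two-bound model (an assumption, ignoring caches and per-core bandwidth limits): the compute part divides across cores, while the memory traffic for operands and results goes through one shared bus and does not. All parameter values below are hypothetical, chosen only to show that such a model saturates around the 1.5x mentioned above:

```python
def scalar_fn_time_ns(elements, cores, compute_ns_per_elem,
                      bytes_per_elem, bandwidth_gb_s):
    """Estimated time (ns) of a scalar function under a two-bound model.

    compute part: splits evenly across cores
    memory part:  bytes / (GB/s) gives ns; shared, so it does not split
    """
    compute_ns = elements * compute_ns_per_elem / cores
    memory_ns = elements * bytes_per_elem / bandwidth_gb_s
    return max(compute_ns, memory_ns)


def speedup(cores, **params):
    return scalar_fn_time_ns(cores=1, **params) / scalar_fn_time_ns(cores=cores, **params)


# Hypothetical: 1e6 elements, 1.5 ns compute each, 24 bytes moved per
# element (two operands + one result), 24 GB/s shared bandwidth.
params = dict(elements=1_000_000, compute_ns_per_elem=1.5,
              bytes_per_elem=24, bandwidth_gb_s=24.0)
for n in (1, 2, 4):
    print(n, speedup(n, **params))
```

Once the memory term dominates, adding cores no longer helps, which matches the observation that lightweight scalar functions scale far below linear.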
If by configurations you mean ./configure options, then no. However, some ./configure options have
performance impacts on both parallel and non-parallel execution. These should be switched off.
See README-2-configure for details.
Yes. In the early days of GNU APL I updated the apl-1.X.tar.gz files after every bug fix. I was then told
by the GNU project that this would mess up their mirrors, so I stopped doing that. Therefore, problems in
1.8 will only be fixed in 1.9, typically 1-2 years later.