
Re: [Bug-apl] segfault when using 'CORE_COUNT_WANTED' configure flag


From: Dr. Jürgen Sauermann
Subject: Re: [Bug-apl] segfault when using 'CORE_COUNT_WANTED' configure flag
Date: Thu, 17 Oct 2019 16:48:30 +0200
User-agent: Mozilla/5.0 (X11; Linux i686; rv:60.0) Gecko/20100101 Thunderbird/60.6.1

Hi Blake,

as a matter of fact, the loops in my benchmarks are small, but
the data on which these small loops operate is not.  Practically this
means that all instructions run from the instruction cache (with an
instruction cache hit rate of 100%) but at the same time the data cache
hit rate is low.

A parallel APL program that suits the boundary conditions of both caches
would have a small code footprint (a short APL loop to suit the instruction
cache) and at the same time operate on a few APL variables of small size
(to suit the data cache). Although one could probably construct such a
program for the sole purpose of benchmarking, its benefit would be limited
to marketing the interpreter; it would not help speed up real-life programs.
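
To make the cache argument concrete, here is a minimal C++ sketch (hypothetical,
not taken from the GNU APL sources): the same short summation loop is timed once
on a small, cache-resident array and once on an array far larger than the
last-level cache. The loop code, and hence the instruction cache behaviour, is
identical in both runs; only the data cache hit rate changes, and with it the
achievable throughput.

// Hypothetical micro-benchmark, not part of GNU APL.  The summation loop is
// tiny and always fits in the instruction cache; what differs between the two
// runs is only whether the data fits in the data caches.
#include <chrono>
#include <cstdio>
#include <vector>

static double sum(const std::vector<double>& v)
{
    double s = 0.0;
    for (double x : v) s += x;          // short loop: easily I-cache resident
    return s;
}

static double throughput_gbps(size_t elements, int repeats)
{
    std::vector<double> v(elements, 1.0);
    volatile double sink = 0.0;
    auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) sink = sink + sum(v);
    auto t1 = std::chrono::steady_clock::now();
    double seconds = std::chrono::duration<double>(t1 - t0).count();
    return elements * sizeof(double) * repeats / seconds / 1e9;
}

int main()
{
    // small: ~80 KiB, stays in L1/L2;  large: ~800 MB, streams from DRAM
    std::printf("cache-resident: %.1f GB/s\n", throughput_gbps(10'000, 100'000));
    std::printf("memory-bound:   %.1f GB/s\n", throughput_gbps(100'000'000, 10));
}

Both runs process the same total amount of data, so any difference in the
reported throughput comes from the memory hierarchy rather than from the code.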

I am still waiting for the point in time when memory (not only caches) comes
with the CPU (like the numeric co-processors of the 1990s); then it will be
time to reconsider parallel APL.

Best Regards,
Jürgen Sauermann



On 10/17/19 12:57 PM, Blake McBride wrote:


On Wed, Oct 16, 2019 at 7:06 AM Dr. Jürgen Sauermann <mail@jürgen-sauermann.de> wrote:
...

My current interpretation of various benchmarks that Elias Mårtenson and
I did some years ago is that the bandwidth of the memory interface
between the CPUs (or cores) and the memory is the limiting factor, and no
matter how efficient the APL interpreter is, this bottleneck will dictate the
speedup that can be achieved.

Makes sense.  It is my understanding that CPUs are so much faster than any memory that memory can't even keep up with a single CPU.  The only reason we see speed improvements is in small loops that can fit in cache.  With long sequences, such as a large array, memory can't keep up with even a single CPU.  I guess machine architecture will have to catch up.
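
A rough, hypothetical C++ sketch of that effect (not GNU APL code): a streaming
sum over an array much larger than the caches, run with increasing thread
counts. Every thread streams its own slice from main memory, so all threads
compete for the same memory interface, and on most machines the measured
speedup flattens out well below the core count.

// Hypothetical illustration only.  Reported GB/s typically stops improving
// once the shared memory interface is saturated, regardless of thread count.
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

int main()
{
    const size_t N = 200'000'000;        // ~1.6 GB, far larger than any cache
    std::vector<double> data(N, 1.0);

    for (unsigned threads = 1; threads <= std::thread::hardware_concurrency(); threads *= 2)
    {
        std::vector<double> partial(threads, 0.0);
        auto t0 = std::chrono::steady_clock::now();

        std::vector<std::thread> pool;
        for (unsigned t = 0; t < threads; ++t)
            pool.emplace_back([&, t] {
                size_t lo = N * t / threads, hi = N * (t + 1) / threads;
                double s = 0.0;
                for (size_t i = lo; i < hi; ++i) s += data[i];
                partial[t] = s;
            });
        for (auto& th : pool) th.join();

        double seconds = std::chrono::duration<double>(
            std::chrono::steady_clock::now() - t0).count();
        std::printf("%u thread(s): %.2f GB/s\n",
                    threads, N * sizeof(double) / seconds / 1e9);
    }
}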

Thanks.

Blake
 

