Re: [Bug-apl] Some thoughts regarding optimization of memory allocation


From: Juergen Sauermann
Subject: Re: [Bug-apl] Some thoughts regarding optimization of memory allocation
Date: Tue, 01 Jul 2014 15:24:55 +0200
User-agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130330 Thunderbird/17.0.5

Hi David,

I agree with your analysis below. The problem with "in place" modification like A←A,x is a different one:
the ownership of A may be unknown.

Some time ago we had an optimization where A⍴B modified B in place when A was ≤⍴B. It looked good initially but backfired when static APL values (like ⍬) were modified, so we had to revert the optimization.

The general question when judging an optimization is its overall effect on execution time. That includes not only the time saved when the optimization can be applied, but also the time spent checking for it when it can't.

For example, detecting a pattern like A←A,x costs (not very much, but) a little. Unfortunately that little extra cost is incurred very often. So if we add such an optimization, the code may get faster (if the time saved by the optimization is larger than the extra cost), but it could also get slower (if A←A,x is rarely optimized).

And often different APL code does the trick, for example A[IDX←IDX+1]←x if A is expected to be large. This is used, by the way, in 10 ⎕CR. The "double the size" trick is used internally in GNU APL (see Simple_string.hh). It can be further improved by putting a lower bound on the size so that repeated doubling of small vectors is avoided.
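For illustration, a minimal sketch of that idiom (BUILD, BUF and IDX are made-up names, a counter stands in for real input, and the final size N is assumed to be known in advance; if it were only an estimate, a growth step such as the doubling just mentioned would be needed when BUF fills up):

∇BUF←BUILD N;IDX;X
 BUF←N⍴0              ⍝ reserve the expected size up front
 IDX←0
LOOP: →(IDX≥N)/DONE
 X←IDX                ⍝ stand-in for reading the next data element
 BUF[IDX←IDX+1]←X     ⍝ the A[IDX←IDX+1]←x idiom: no reallocation, no copy
 →LOOP
DONE: BUF←IDX⍴BUF     ⍝ trim to the number of elements actually used
∇

Each append touches one element instead of copying the whole vector.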

So the final question is: shall we pay a price for the optimization of sub-optimal APL code? I tend to say "no".

/// Jürgen


On 07/01/2014 06:34 AM, David Lamkins wrote:
This is a long post. Part one is background to motivate the implementation outline in part two. If you already understand order analysis (i.e. the asymptotic analysis of program runtime), then you already understand everything I'll say in Part 1; you can skip to Part 2.

Part 1 - Motivation

So long as you work with data of definite size, APL can allocate the correct amount of storage in time proportional to the size of that storage (assuming that allocation must be accompanied by initialization). That's pretty good. (There might be OS-dependent tricks to finesse the initialization time, but that's not the subject of this note...)

If you have to work with data of indeterminate size and store that data in an APL vector or array, that's where things get tricky.

The classic example of this is accumulating data elements from an input stream of indeterminate length:

0. Initialize an empty vector; this will, while the program is running, accumulate data elements.
1. If there's no more data to read, we're done. The vector contains the result.
2. Read a data element from the input stream and append the element to the vector.
3. Go to step 1.
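In APL, that loop might look like this (illustrative only; the function name is made up and a counter stands in for the input stream):

∇Z←ACCUM1 N;I;X
 Z←⍬                  ⍝ step 0: start with an empty vector
 I←0
LOOP: →(I≥N)/0        ⍝ step 1: no more data? then Z is the result
 X←I                  ⍝ step 2: stand-in for reading the next element
 Z←Z,X                ⍝         append it, which copies all of Z
 I←I+1
 →LOOP                ⍝ step 3
∇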

As this program runs, the vector will take on lengths in the series 0, 1, 2, 3, 4, 5, ... Each time the vector expands, all of the existing data must be copied to the new vector before appending the next element. If you look at copying the data as the "cost" of this algorithm, you'll see that this is a really inefficient way to do things: the total number of element copies is 0+1+2+...+(n-1) = n(n-1)/2, so the runtime is O(n^2). In other words, the runtime grows as the square of the input size.

One way to optimize this program is to extend the vector by a fixed amount every time you need more space for the next element. In other words, rather than making space for one more element, you'd make space for a hundred or a thousand elements; by doing so, you reduce the number of times you must copy all the existing data to an expanded vector. This seems like a smart approach, and it definitely gives you better performance for a data set of a given size. However, looking at the asymptotic limit, this is still an O(n^2) algorithm: with a fixed increment of k elements you still make about n/k full copies of an ever-growing vector, roughly n^2/(2k) element copies in total.

So how do you improve the runtime of this algorithm...? A common approach is to double the size of the vector each time you run out of space. The size of the vector increases pretty rapidly; intuitively, each doubling buys you twice as many cheap appends before the next copy. You'll still have to stop and copy at increasingly infrequent intervals, but those copies form a geometric series totalling fewer than 2n element copies, so building the vector costs O(n) overall: constant amortized cost per append.

What's the downside? Your program will eat more memory than it really needs. In fact, depending upon where you run out of input, the program may have allocated up to twice as much memory as it really needs.

Here's where I argue that trading memory for time is a good thing... Memory is cheap. If you're going to be solving a problem that taxes physical memory on a modern computer, you (or your employer) can almost certainly afford to double your machine's memory. Also, memory is slow. A modern processor can execute lots and lots of cycles in the time it takes to read or write one word of data in memory. This is why copies dominate the runtime of many algorithms.

The trick in making this "double the size before copying" algorithm work is to somehow manage that overallocated space. That's the subject of Part 2.



Part 2 - Double, Copy, Trim

I haven't yet looked at how all this might fit into GNU APL. Some of my assumptions may be wrong. This section is intended to provoke discussion, not to serve as a design sketch.

The "double the size before copying" part of the algorithm seems simple enough. (Keep in mind that I'll use this phrase repeatedly without also mentioning the necessary test for having run out of allocated-but-unused space.) The only tricky parts are (a) knowing when to do it, (b) using a new bit of header information that keeps track of allocated size as opposed to actual size, and (c) handling the new case of mutating an existing variable (specifically, doing an update-in-place over a previously allocated-but-unused element) rather than allocating a fresh copy of a precise size and copying the entire result.

Why is "knowing when to do it" tricky? I'm assuming that GNU APL evaluates an expression of the form a←a,x by computing the size of a,x before allocating a new a and copying all the data. We could blindly double the size of a each time we need to make more room; at the worst, our programs would use about twice as much memory. But we might be able to do better...

What we need is an inexpensive way of predicting that a vector may be subject to repeated expansion.

We could use static analysis of the program text: Every time we see an expression of the form a←a,... we could choose the "double the size before copying" technique. We might even strengthen the test by confirming that the assignment appears inside a loop.

Alternatively, we might somehow look at access patterns. If the variable is infrequently updated (this assumes that we can have a timestamp associated with the variable's header), then we might take a chance and do minimal (i.e. just enough space for the new element) reallocation. On the other hand, if we're updating that variable frequently (as we'd do in a tight loop), then it'd make sense to apply the "double the size before copying" technique.

That's about all I can say for now about the double and copy parts. What about trim?

Let's assume that we'd like to do better than potentially doubling the runtime memory use of our GNU APL programs. What techniques might we use to mitigate that memory growth?

We might set an upper bound on "overhead" memory allocation. (In this case, the overhead is the allocated-but-unused space consumed by reserving extra room in variables that might be subject to growth.) That seems like an attractive approach until you consider that whatever threshold you select for when to stop reserving extra space, you could reach it before hitting the one part of your program that would really benefit from the optimization. We really need to make this work without having to tweak interpreter parameters on a per-program and per-run basis.

One possible approach might be to "garbage-collect" reserved space upon some trigger condition. (Maybe memory pressure, maybe time, maybe something else...) The downside is that this can fragment the heap. On the other hand, we might get away with it thanks to VM and locality of access; it may take some experiments to work out the best approach.

Another possible approach might be to dynamically prioritize which variables are subject to reservation of extra space upon expansion, based upon frequency of updates or how recently the variable was updated. Again, experimentation may be necessary.

Finally, the same "double the size before copying" technique should be applied to array assignments (e.g. a[m]←a[m],x).



Appendix

The attached test.apl file contains two functions: test1 and test2. (I'm tempted to break into Dr. Seuss rhyme, but I'll refrain...)

The test1 function expands a vector by one element each time an element is added.

The test2 function doubles the size of the vector each time the current allocation fills up.
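The attachment itself isn't reproduced here. As a rough sketch only (made-up names, not the actual test.apl), test1 behaves like the ACCUM1 loop sketched in Part 1, and a test2-style function might look like this:

∇Z←TESTD N;I;USED
 Z←0⍴0                ⍝ the current allocation; grows by doubling
 USED←0               ⍝ number of elements actually in use
 I←0
LOOP: →(I≥N)/DONE
 →(USED<⍴Z)/PUT       ⍝ still room in the current allocation?
 Z←Z,(1⌈⍴Z)⍴0         ⍝ no: double it (at least one element)
PUT: Z[USED←USED+1]←I ⍝ write into a reserved slot, no copy of Z
 I←I+1
 →LOOP
DONE: Z←USED⍴Z        ⍝ trim the unused tail before returning
∇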

For small N, you'd be hard pressed to notice a difference between the runtime of test1 and test2. For example, try

      test1 5
      test2 5
      test1 50
      test2 50
      ... etc

At some point, you'll see a dramatic difference. On my machine, there's a noticeable difference at 5000. At 50000, the difference is stunning.

--
"The secret to creativity is knowing how to hide your sources."
   Albert Einstein


http://soundcloud.com/davidlamkins
http://reverbnation.com/lamkins
http://reverbnation.com/lcw
http://lamkins-guitar.com/
http://lamkins.net/
http://successful-lisp.com/



