Re: [Bug-apl] Some thoughts regarding optimization of memory allocation


From: Juergen Sauermann
Subject: Re: [Bug-apl] Some thoughts regarding optimization of memory allocation
Date: Tue, 01 Jul 2014 15:24:55 +0200
User-agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130330 Thunderbird/17.0.5

Hi David,

I agree with your analysis below. The problem with "in place" modification like A←A,x is a different one:
the ownership of A may be unknown.

Some time ago we had an optimization where A⍴B modified B in place when A was ≤⍴B. It looked good initially but backfired when static APL values (like ⍬) were modified, so we had to revert the optimization.

The general question when judging an optimization is its overall effect on execution time. That includes not only the time saved when the optimization can be applied, but also the time spent checking for it when it can't.

For example, detecting a pattern like A←A,x costs (not very much, but) a little. Unfortunately that little extra cost is incurred very often. So if we add such an optimization, the code may get faster (if the time saved by the optimization is larger than the extra cost), but it could also get slower (if A←A,x is rarely optimized).

And often different APL code does the trick, for example A[IDX←IDX+1]←x if A is expected to be large. This is used, by the way, in 10 ⎕CR. The "double the size" trick is used internally in GNU APL (see Simple_string.hh). It can be further improved by putting a lower bound on the size so that repeated doubling of small vectors is avoided.
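For illustration, a minimal sketch of that idiom (BUILD, BUF and IDX are made-up names, a counter stands in for real input, and the final size N is assumed to be known in advance; if it were only an estimate, a growth step such as the doubling just mentioned would be needed when BUF fills up):

∇BUF←BUILD N;IDX;X
 BUF←N⍴0              ⍝ reserve the expected size up front
 IDX←0
LOOP: →(IDX≥N)/DONE
 X←IDX                ⍝ stand-in for reading the next data element
 BUF[IDX←IDX+1]←X     ⍝ the A[IDX←IDX+1]←x idiom: no reallocation, no copy
 →LOOP
DONE: BUF←IDX⍴BUF     ⍝ trim to the number of elements actually used
∇

Each append touches one element instead of copying the whole vector.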

So the final question is: shall we pay a price for the optimization of sub-optimal APL code? I tend to say "no".

/// Jürgen


On 07/01/2014 06:34 AM, David Lamkins wrote:
This is a long post. Part one is background to motivate the implementation outline in part two. If you already understand order analysis (i.e. the asymptotic analysis of program runtime), then you already understand everything I'll say in Part 1; you can skip to Part 2.

Part 1 - Motivation

So long as you work with data of definite size, APL can allocate the correct amount of storage in time proportional to the size of that storage (assuming that allocation must be accompanied by initialization). That's pretty good. (There might be OS-dependent tricks to finesse the initialization time, but that's not the subject of this note...)

If you have to work with data of indeterminate size and store that data in an APL vector or array, that's where things get tricky.

The classic example of this is accumulating data elements from an input stream of indeterminate length:

0. Initialize an empty vector; this will, while the program is running, accumulate data elements.
1. If there's no more data to read, we're done. The vector contains the result.
2. Read a data element from the input stream and append the element to the vector.
3. Go to step 1.
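In APL, that loop might look like this (illustrative only; the function name is made up and a counter stands in for the input stream):

∇Z←ACCUM1 N;I;X
 Z←⍬                  ⍝ step 0: start with an empty vector
 I←0
LOOP: →(I≥N)/0        ⍝ step 1: no more data? then Z is the result
 X←I                  ⍝ step 2: stand-in for reading the next element
 Z←Z,X                ⍝         append it, which copies all of Z
 I←I+1
 →LOOP                ⍝ step 3
∇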

As this program runs, the vector will take on lengths in the series 0, 1, 2, 3, 4, 5, ... Each time the vector expands, all of the existing data must be copied to the new vector before appending the next element. If you look at copying the data as the "cost" of this algorithm, you'll see that this is a really inefficient way to do things: the total number of element copies is 0+1+2+...+(n-1) = n(n-1)/2, so the runtime is O(n^2). In other words, the runtime grows as the square of the input size.

One way to optimize this program is to extend the vector by a fixed amount every time you need more space for the next element. In other words, rather than making space for one more element, you'd make space for a hundred or a thousand elements; by doing so, you reduce the number of times you must copy all the existing data to an expanded vector. This seems like a smart approach, and it definitely gives you better performance for a data set of a given size. However, looking at the asymptotic limit, this is still an O(n^2) algorithm: with a fixed increment of k elements you still make about n/k full copies of an ever-growing vector, roughly n^2/(2k) element copies in total.

So how do you improve the runtime of this algorithm...? A common approach is to double the size of the vector each time you run out of space. The size of the vector increases pretty rapidly; intuitively, each doubling buys you twice as many cheap appends before the next copy. You'll still have to stop and copy at increasingly infrequent intervals, but those copies form a geometric series totalling fewer than 2n element copies, so building the vector costs O(n) overall: constant amortized cost per append.

What's the downside? Your program will eat more memory than it really needs. In fact, depending upon where you run out of input, the program may have allocated up to twice as much memory as it really needs.

Here's where I argue that trading memory for time is a good thing... Memory is cheap. If you're going to be solving a problem that taxes physical memory on a modern computer, you (or your employer) can almost certainly afford to double your machine's memory. Also, memory is slow. A modern processor can execute lots and lots of cycles in the time it takes to read or write one word of data in memory. This is why copies dominate the runtime of many algorithms.

The trick in making this "double the size before copying" algorithm work is to somehow manage that overallocated space. That's the subject of Part 2.



Part 2 - Double, Copy, Trim

I haven't yet looked at how all this might fit into GNU APL. Some of my assumptions may be wrong. This section is intended to provoke discussion, not to serve as a design sketch.

The "double the size before copying" part of the algorithm seems simple enough. (Keep in mind that I'll use this phrase repeatedly without also mentioning the necessary test for having run out of allocated-but-unused space.) The only tricky parts are (a) knowing when to do it, (b) using a new bit of header information that keeps track of allocated size as opposed to actual size, and (c) handling the new case of mutating an existing variable (specifically, doing an update-in-place over a previously allocated-but-unused element) rather than allocating a fresh copy of a precise size and copying the entire result.

Why is "knowing when to do it" tricky? I'm assuming that GNU APL evaluates an expression of the form a←a,x by computing the size of a,x before allocating a new a and copying all the data. We could blindly double the size of a each time we need to make more room; at the worst, our programs would use about twice as much memory. But we might be able to do better...

What we need is an inexpensive way of predicting that a vector may be subject to repeated expansion.

We could use static analysis of the program text: Every time we see an expression of the form a←a,... we could choose the "double the size before copying" technique. We might even strengthen the test by confirming that the assignment appears inside a loop.

Alternatively, we might somehow look at access patterns. If the variable is infrequently updated (this assumes that we can have a timestamp associated with the variable's header), then we might take a chance and do minimal (i.e. just enough space for the new element) reallocation. On the other hand, if we're updating that variable frequently (as we'd do in a tight loop), then it'd make sense to apply the "double the size before copying" technique.

That's about all I can say for now about the double and copy parts. What about trim?

Let's assume that we'd like to do better than potentially doubling the runtime memory use of our GNU APL programs. What techniques might we use to mitigate that memory growth?

We might set an upper bound on "overhead" memory allocation. (In this case, the overhead is the allocated-but-unused space consumed by reserving extra room in variables that might be subject to growth.) That seems like an attractive approach until you consider that whatever threshold you select for when to stop reserving extra space, you could reach it before hitting the one part of your program that would really benefit from the optimization. We really need to make this work without having to tweak interpreter parameters on a per-program and per-run basis.

One possible approach might be to "garbage-collect" reserved space upon some trigger condition. (Maybe memory pressure, maybe time, maybe something else...) The downside is that this can fragment the heap. On the other hand, we might get away with it thanks to VM and locality of access; it may take some experiments to work out the best approach.

Another possible approach might be to dynamically prioritize which variables are subject to reservation of extra space upon expansion, based upon frequency of updates or how recently the variable was updated. Again, experimentation may be necessary.

Finally, the same "double the size before copying" technique should be applied to array assignments (e.g. a[m]←a[m],x).



Appendix

The attached test.apl file contains two functions: test1 and test2. (I'm tempted to break into Dr. Seuss rhyme, but I'll refrain...)

The test1 function expands a vector by one element each time an element is added.

The test2 function doubles the size of the vector each time the current allocation fills up.
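The attachment itself isn't reproduced here. As a rough sketch only (made-up names, not the actual test.apl), test1 behaves like the ACCUM1 loop sketched in Part 1, and a test2-style function might look like this:

∇Z←TESTD N;I;USED
 Z←0⍴0                ⍝ the current allocation; grows by doubling
 USED←0               ⍝ number of elements actually in use
 I←0
LOOP: →(I≥N)/DONE
 →(USED<⍴Z)/PUT       ⍝ still room in the current allocation?
 Z←Z,(1⌈⍴Z)⍴0         ⍝ no: double it (at least one element)
PUT: Z[USED←USED+1]←I ⍝ write into a reserved slot, no copy of Z
 I←I+1
 →LOOP
DONE: Z←USED⍴Z        ⍝ trim the unused tail before returning
∇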

For small N, you'd be hard pressed to notice a difference between the runtime of test1 and test2. For example, try

      test1 5
      test2 5
      test1 50
      test2 50
      ... etc

At some point, you'll see a dramatic difference. On my machine, there's a noticeable difference at 5000. At 50000, the difference is stunning.

--
"The secret to creativity is knowing how to hide your sources."
   Albert Einstein


http://soundcloud.com/davidlamkins
http://reverbnation.com/lamkins
http://reverbnation.com/lcw
http://lamkins-guitar.com/
http://lamkins.net/
http://successful-lisp.com/



