Re: [Bug-apl] Some thoughts regarding optimization of memory allocation


From: Elias Mårtenson
Subject: Re: [Bug-apl] Some thoughts regarding optimization of memory allocation
Date: Tue, 1 Jul 2014 14:22:01 +0800

This is an interesting approach: having the interpreter automatically apply the typical memory-allocation optimisations rather than making you do it yourself.

That said, I think it needs to be a bit more clever. There are many cases where one appends a single element to a large array only once; if every such append caused a doubling of memory, the effect could be quite serious.

That said, if the algorithm only kicks in for arrays larger than 𝓁 elements, and the increment is made by a factor 𝒇, where 𝓁 is on the order of, say, 30 and 𝒇 is perhaps 1.5 (both configurable through a quad-parameter, of course), then it will be easy to experiment to come up with good numbers for a given workload.
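A minimal sketch of this policy in Python (illustrative only; `threshold` and `factor` stand in for the hypothetical quad-parameters 𝓁 and 𝒇):

```python
def grow_capacity(current, threshold=30, factor=1.5):
    """Return the new capacity when a full array of `current` cells
    needs room for one more element.

    Below `threshold` cells, grow minimally (one extra cell);
    at or above it, grow geometrically by `factor`."""
    if current < threshold:
        return current + 1
    # Round up so the capacity always increases by at least one cell.
    return max(current + 1, int(current * factor))

print(grow_capacity(10))   # → 11  (small array: minimal growth)
print(grow_capacity(100))  # → 150 (large array: geometric growth)
```

Exposing `threshold` and `factor` as quad-parameters would then make the experiment suggested above a one-line change per run.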

Regards,
Elias


On 1 July 2014 13:44, David B. Lamkins <address@hidden> wrote:
Again: proofread, *then* post... Sorry. :(

 - To clarify: Part 2 is not really an implementation outline, but
rather a discussion of techniques.

 - The test.apl file test2 function doesn't implement "double and copy",
but rather "preallocate all". This is a much simpler case. Here's the
"double and copy" function:

⍝ test3 starts with an empty vector and doubles its size every time we
⍝ need more space. I think that this'll be somewhat worse than O(n)
⍝ but better than O(n^2).

∇test3 n
  x←''
  i←0
  a←0
 loop:
  →(i≥n)/0
  i←i+1
  →(i≤a)/mutate
  x←x,(1⌈⍴x)⍴' '
  a←⍴x
 mutate:
  x[i]←'x'
  →loop
∇


Informally (i.e. I didn't use a stopwatch or timing function) test3 is
about 2/3 as fast as test2 with N=5000000. Keep in mind that there are a
few more instructions in the test3 loop.



On Mon, 2014-06-30 at 21:34 -0700, David Lamkins wrote:
> This is a long post. Part one is background to motivate the
> implementation outline in part two. If you already understand order
> analysis (i.e. the asymptotic analysis of program runtime), then you
> already understand everything I'll say in Part 1; you can skip to Part
> 2.
>
>
> Part 1 - Motivation
>
> So long as you can work with data of definite size, APL is able to
> allocate the correct amount of storage in time proportional to the
> size of the storage (assuming that allocation must be accompanied by
> initialization). That's pretty good. (There might be OS-dependent
> tricks to finesse the initialization time, but that's not the subject
> of this note...)
>
>
> If you have to work with data of indeterminate size and store that
> data in an APL vector or array, that's where things get tricky.
>
>
> The classic example of this is accumulating data elements from an
> input stream of indeterminate length:
>
>
> 0. Initialize an empty vector; this will accumulate data elements
> while the program runs.
>
> 1. If there's no more data to read, we're done. The vector contains
> the result.
>
> 2. Read a data element from the input stream and append the element to
> the vector.
>
> 3. Go to step 1.
>
>
> As this program runs, the vector will take on lengths in the series 0,
> 1, 2, 3, 4, 5, ... Each time the vector expands, all of the data must
> be copied to the new vector before appending the next element. If you
> look at copying the data as the "cost" of this algorithm, you'll see
> that this is a really inefficient way to do things. The runtime is
> O(n^2). In other words, the runtime grows as the square of the input
> size.
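Counting copied elements makes the quadratic cost concrete. A small Python sketch (illustrative only, not interpreter code):

```python
def copies_grow_by_one(n):
    """Total elements copied when appending n elements to a vector
    that is reallocated (and fully copied) on every append."""
    total = 0
    for size in range(n):
        total += size  # the `size` existing elements are copied over
    return total

# 0+1+2+...+(n-1) = n(n-1)/2 copies, i.e. O(n^2).
print(copies_grow_by_one(1000))  # → 499500
```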
>
>
> One way to optimize this program is to extend the vector by a fixed
> size every time you need more space for the next element. IOW, rather
> than making space for one more element, you'd make space for a hundred
> or a thousand elements; by doing so, you reduce the number of times
> you must copy all the existing data to an expanded vector. This seems
> like a smart approach. It definitely gives you better performance for
> a data set of a given size. However, looking at the asymptotic limit,
> this is still an O(n^2) algorithm.
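Counting copied elements for a fixed increment k shows that only the constant factor shrinks, not the growth rate (again, an illustrative Python sketch):

```python
def copies_fixed_increment(n, k=100):
    """Total elements copied when the vector grows by a fixed k cells
    each time it fills up."""
    total, capacity = 0, 0
    for size in range(n):
        if size == capacity:  # out of room: reallocate and copy
            total += size
            capacity += k
    return total

# Doubling the input roughly quadruples the copies: still O(n^2).
print(copies_fixed_increment(10000))  # → 495000
print(copies_fixed_increment(20000))  # → 1990000
```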
>
>
> So how do you improve the runtime of this algorithm...? A common
> approach is to double the size of the vector each time you run out of
> space. The size of the vector will increase pretty rapidly.
> Intuitively, you're giving yourself twice as much time to run a linear
> algorithm each time you run out of space for a new element. The
> asymptotic runtime won't be linear (you'll still have to stop and copy
> at increasingly infrequent intervals), but it'll be better than
> O(n^2).
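Counting copied elements under doubling shows the improvement: the total stays below 2n, so the amortized cost per append is constant (illustrative Python sketch):

```python
def copies_doubling(n):
    """Total elements copied when the vector's capacity doubles
    each time it fills up."""
    total, capacity = 0, 1
    for size in range(n):
        if size == capacity:  # out of room: double and copy
            total += size
            capacity *= 2
    return total

# Total copies stay below 2n even though each copy gets bigger.
print(copies_doubling(10000))  # → 16383
print(copies_doubling(20000))  # → 32767
```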
>
>
> What's the downside? Your program will eat more memory than it really
> needs. In fact, depending upon where you run out of input, the program
> may have allocated up to twice as much memory as it really needs.
>
>
> Here's where I argue that trading memory for time is a good thing...
> Memory is cheap. If you're going to be solving a problem that taxes
> physical memory on a modern computer, you (or your employer) can
> almost certainly afford to double your machine's memory. Also, memory
> is slow. A modern processor can execute lots and lots of cycles in the
> time it takes to read or write one word of data in memory. This is why
> copies dominate the runtime of many algorithms.
>
>
> The trick in making this "double the size before copying" algorithm
> work is to somehow manage that overallocated space. That's the subject
> of Part 2.
>
>
>
>
> Part 2 - Double, Copy, Trim
>
>
> I haven't yet looked at how all this might fit into GNU APL. Some of
> my assumptions may be wrong. This section is intended to provoke
> discussion, not to serve as a design sketch.
>
>
> The "double the size before copying" part of the algorithm seems
> simple enough. (Keep in mind that I'll use this phrase repeatedly
> without also mentioning the necessary test for having run out of
> allocated-but-unused space.) The only tricky parts are (a) knowing
> when to do it, (b) using a new bit of header information that keeps
> track of allocated size as opposed to actual size, and (c) handling
> the new case of mutating an existing variable (specifically, doing an
> update-in-place over a previously allocated-but-unused element) rather
> than allocating a fresh copy of a precise size and copying the entire
> result.
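Points (b) and (c) can be sketched in Python; the names here (`length`, `capacity`, `cells`) are illustrative, not GNU APL's actual value header:

```python
class GrowableVector:
    """A value header that tracks allocated capacity separately from
    the visible length, so appends can mutate in place."""
    def __init__(self):
        self.cells = []    # allocated storage
        self.length = 0    # elements actually in use

    @property
    def capacity(self):
        return len(self.cells)

    def append(self, x):
        if self.length < self.capacity:
            # (c) update-in-place over a previously allocated-but-unused cell
            self.cells[self.length] = x
        else:
            # (b) out of room: double the allocation, copying the old cells
            new_cap = max(1, 2 * self.capacity)
            self.cells = self.cells[:] + [None] * (new_cap - self.capacity)
            self.cells[self.length] = x
        self.length += 1

v = GrowableVector()
for ch in 'hello':
    v.append(ch)
print(v.length, v.capacity)  # → 5 8
```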
>
>
> Why is "knowing when to do it" tricky? I'm assuming that GNU APL
> evaluates an expression of the form a←a,x by computing the size of a,x
> before allocating a new a and copying all the data. We could blindly
> double the size of a each time we need to make more room; at the
> worst, our programs would use about twice as much memory. But we might
> be able to do better...
>
>
> What we need is an inexpensive way of predicting that a vector may be
> subject to repeated expansion.
>
> We could use static analysis of the program text: Every time we see an
> expression of the form a←a,... we could choose the "double the size
> before copying" technique. We might even strengthen the test by
> confirming that the assignment appears inside a loop.
>
>
> Alternatively, we might somehow look at access patterns. If the
> variable is infrequently updated (this assumes that we can have a
> timestamp associated with the variable's header), then we might take a
> chance and do minimal (i.e. just enough space for the new element)
> reallocation. On the other hand, if we're updating that variable
> frequently (as we'd do in a tight loop), then it'd make sense to apply
> the "double the size before copying" technique.
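One way to sketch that heuristic: keep a timestamp of the last expansion in the header and double only when expansions arrive close together. Every name and threshold below is hypothetical (illustrative Python, not interpreter code):

```python
import time

class AdaptiveVector:
    """Pick minimal vs. doubling reallocation based on how recently
    the variable last had to grow (illustrative sketch only)."""
    HOT_INTERVAL = 0.010  # growths closer together than 10 ms are "hot"

    def __init__(self):
        self.data = []
        self.capacity = 0
        self.last_grow = None

    def append(self, x):
        if len(self.data) == self.capacity:
            now = time.monotonic()
            hot = (self.last_grow is not None
                   and now - self.last_grow < self.HOT_INTERVAL)
            # Frequently-updated variables double; cold ones get one cell.
            self.capacity = 2 * self.capacity if hot else self.capacity + 1
            self.capacity = max(self.capacity, len(self.data) + 1)
            self.last_grow = now
        self.data.append(x)

v = AdaptiveVector()
for i in range(1000):  # a tight loop quickly turns the variable "hot"
    v.append(i)
print(v.capacity >= len(v.data))  # → True
```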
>
>
> That's about all I can say for now about the double and copy parts.
> What about trim?
>
>
> Let's assume that we'd like to do better than to potentially double
> the runtime size of our GNU APL programs. What techniques might we use
> to mitigate that memory growth?
>
>
> We might set an upper bound on "overhead" memory allocation. (In this
> case, the overhead is the allocated-but-unused space consumed by
> reserving unused space in variables that might be subject to growth.)
> That seems like an attractive approach until you consider that,
> whatever threshold you select for cutting off the extra reservation,
> you could reach it before hitting the one part of your
> program that's really going to benefit from the optimization. We
> really need to make this work without having to tweak interpreter
> parameters on a per-program and per-run basis.
>
>
> One possible approach might be to "garbage-collect" reserved space
> upon some trigger condition. (Maybe memory pressure, maybe time, maybe
> something else...) The downside is that this can fragment the heap. On
> the other hand, we might get away with it thanks to VM and locality of
> access; it may take some experiments to work out the best approach.
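A sketch of such a trim pass in Python, releasing slack from the least recently updated variables first when a (hypothetical) global limit is exceeded:

```python
def trim_reserved(vectors, limit):
    """Shrink-to-fit the coldest vectors until total reserved-but-unused
    cells drop to `limit`.  Each vector is modeled as a dict with
    'length', 'capacity' and 'last_update'; returns remaining slack."""
    slack = sum(v['capacity'] - v['length'] for v in vectors)
    if slack <= limit:
        return slack
    # Trim least recently updated vectors first.
    for v in sorted(vectors, key=lambda v: v['last_update']):
        freed = v['capacity'] - v['length']
        v['capacity'] = v['length']  # give the slack back to the heap
        slack -= freed
        if slack <= limit:
            break
    return slack

vs = [
    {'length': 10, 'capacity': 40, 'last_update': 1},  # cold: 30 slack
    {'length': 50, 'capacity': 80, 'last_update': 9},  # hot:  30 slack
]
print(trim_reserved(vs, 40))  # → 30
print(vs[1]['capacity'])      # → 80 (the hot vector keeps its reserve)
```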
>
>
> Another possible approach might be to dynamically prioritize which
> variables are subject to reservation of extra space upon expansion,
> based upon frequency of updates or how recently the variable was
> updated. Again, experimentation may be necessary.
>
>
> Finally, the same "double the size before copying" technique should be
> applied to array assignments (e.g. a[m]←a[m],x).
>
>
>
>
> Appendix
>
>
> The attached test.apl file contains two functions: test1 and test2.
> (I'm tempted to break into Dr. Seuss rhyme, but I'll refrain...)
>
>
> The test1 function expands a vector by one element each time an
> element is added.
>
> The test2 function doubles the size of the vector each time the
> current allocation fills up.
>
>
> For small N, you'd be hard pressed to notice a difference between the
> runtime of test1 and test2. For example, try
>
>
>       test1 5
>
>       test2 5
>
>       test1 50
>
>       test2 50
>
>       ... etc
>
>
>
> At some point, you'll see a dramatic difference. On my machine,
> there's a noticeable difference at 5000. At 50000, the difference is
> stunning.
>
>
> --
> "The secret to creativity is knowing how to hide your sources."
>    Albert Einstein
>
>
> http://soundcloud.com/davidlamkins
> http://reverbnation.com/lamkins
> http://reverbnation.com/lcw
> http://lamkins-guitar.com/
> http://lamkins.net/
> http://successful-lisp.com/




