help-octave
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: dynamic allocation of cell arrays is sloooow


From: Keith Keith
Subject: Re: dynamic allocation of cell arrays is sloooow
Date: Wed, 16 Mar 2016 10:59:04 -0400



On Wed, Mar 16, 2016 at 9:35 AM, Francesco Potortì <address@hidden> wrote:
>On Wed, Mar 16, 2016 at 12:17 PM, Francesco Potortì <address@hidden> wrote:
>> Dynamic allocation of array-like data structures is inherently
>> inefficient, but sometimes it vastly simplifies coding, especially when
>> the total size of the data structure is unknown in advance.
>>
>> Octave does a good job at doing dynamic allocation of arrays, but is
>> apparently very inefficient when cell arrays or struct arrays are
>> involved.  Here is a simple benchmark:
>>
>>>> version
>> ans = 4.0.0
>>
>>>> a=[]; for esp=1:6; tic; for i=1:10^esp; a(i)=i; end; toc; end
>> Elapsed time is 0.000110149 seconds.
>> Elapsed time is 0.000464916 seconds.
>> Elapsed time is 0.00417209 seconds.
>> Elapsed time is 0.040483 seconds.
>> Elapsed time is 0.417366 seconds.
>> Elapsed time is 5.72097 seconds.
>>
>>>> a={}; for esp=1:6; tic; for i=1:10^esp; a{i}=i; end; toc; end
>> Elapsed time is 0.000128031 seconds.
>> Elapsed time is 0.000672817 seconds.
>> Elapsed time is 0.00595999 seconds.
>> Elapsed time is 0.063189 seconds.
>> Elapsed time is 1.49592 seconds.
>> Elapsed time is 145.996 seconds.
>>
>>>> a=struct("a",{}); for esp=1:6; tic; for i=1:10^esp; a(i).a=i; end; toc; end
>> Elapsed time is 0.00019598 seconds.
>> Elapsed time is 0.140211 seconds.
>> Elapsed time is 0.0078249 seconds.
>> Elapsed time is 0.0820489 seconds.
>> Elapsed time is 1.82587 seconds.
>> Elapsed time is 158.164 seconds.
>>
>> I find myself in the latter case: a struct array with around a million
>> entries.  The only reasonable way I found is to do excess initial
>> allocation and then delete the excess entries at the end.  Is this the
>> recommended way?
>
>Assuming you are forced to use cells, that is, shape/type of element
>sis inhomogeneous. I do not think there is a solution. Check the
>functions with names starting with "cell", maybe some of those help
>you.
>Your next chance would be to code some C++, but I am unsure what would
>be the benefit. it would be nice to see a benchmark.
>I am currently thinking to write a sort of acumcell function, but wont
>have time till end of April to actually do serious work there. Maybe
>that is also of your interest.

Thank you Pablo.

I don't know what is the internal representation of cells, but I assumed
it is an array of pointers to objects that can be dynamically expanded
essentially in the same way as an array of double is expanded, for
example by doubling the size each time the previous allocation does not
fit.  Clearly I am wrong here, and maybe there are good reasons why this
cannot be done as efficiently as with vectors.

My specific use case involves a jagged array, where the main dimension
is around some million entries (or possibly more) and the second
dimension varies from 1 to, say, 15 (unknown in advance), with an
average length of about 3.

--
Francesco Potortì (ricercatore)        Voice:  +39.050.621.3058
ISTI - Area della ricerca CNR          Mobile: +39.348.8283.107
via G. Moruzzi 1, I-56124 Pisa         Skype:  wnlabisti
(entrance 20, 1st floor, room C71)     Web:    http://fly.isti.cnr.it


_______________________________________________
Help-octave mailing list
address@hidden
https://lists.gnu.org/mailman/listinfo/help-octave

It's probably slower because you don't know how much memory to allocate.  If you allocate an array of 1M int32s you know you need 32MB.  You allocate the memory and you're done. If you allocate 1M cells you're probably going to need to allocate a heap that can vary in size.  Then you're going to have to allocate an array of pointers of size 32MB and point each one to a piece fo the heap which requires you to keep up with which parts of the heap you've allocated.  Then resize the heap allocations every time you make an assignment. Then if you allocated 4 bytes to a certain element and it becomes 8 bytes and you already allocated the adjacent memory you either have to move the data in its way or reallocate a 64bit spot in the head and delete the previous allocation.  You could allocate 64bits to each one by default but this doubles the size of your heap and a string longer than 8 characters makes you reallocate anyway.  This is just a guess.  I haven't looked at the code but it does show why an array of cells is harder to handle.  Any possible algorithm has to deal with these problems one way or another.  Dealing with this adds overhead. 

To answer your earlier question excess allocation is the suggested solution.  I remember reading about vector implementation in C++.  They said usually if you don't set a vector size then when you assign a certain number of elements to it, it allocates 2^n elements where n is an integer and 2^n is the first 2^n greater than the number of assignments you made.  If you assign more as you go then once you reach (2^n) + 1 it allocates 2^(n+1).  The book I was reading suggested setting the vector size to a number you know you won't exceed if it was at all possible. 

It's a memory foot print vs speed optimization.  If you have the memory to overallocate, you should definitely overallocate.  If you don't have the memory to overallocate then you have to make a judgement call on whether it is faster to overallocate to swap space or dynamically allocate.

reply via email to

[Prev in Thread] Current Thread [Next in Thread]