octave-bug-tracker
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Octave-bug-tracker] [bug #39834] vertcat() very slow compared to horzca


From: anonymous
Subject: [Octave-bug-tracker] [bug #39834] vertcat() very slow compared to horzcat()
Date: Fri, 23 Aug 2013 17:38:35 +0000
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/536.30.1 (KHTML, like Gecko) Version/6.0.5 Safari/536.30.1

Follow-up Comment #1, bug #39834 (project octave):

On the one hand, I would point out that this is an inherent property of how
horzcat and vertcat work at a physical level.  Matrices are stored "column at
a time" in memory.  Thus, the horzcat operation is completed quite easily by
concatenating the N chunks of data, and setting the right array dimensions. 
vertcat is inherently more complicated because it requires splicing columns
together in order to achieve the desired result.  As such, this performance
time difference is also exhibited in MATLAB.

That said, I found this interesting result when run in matlab:

    >> N=1e4;
    >> x = repmat({[1:N]'},N,1);   
    >> tic; y2 = horzcat(x{:}); toc
    Elapsed time is 0.213414 seconds.
    >> x = repmat({[1:N]},N,1);
    >> tic; y1 = vertcat(x{:}); toc
    Elapsed time is 1.277119 seconds.
    >> tic; x=cellfun(@(xx)xx',x,'UniformOutput',0); y1 = horzcat(x{:})'; toc
    Elapsed time is 0.424479 seconds.

In other words, not only do we get much better time performance using horzcat,
but it is sufficiently better that in this particular case, it is more
performant to transpose, use horzcat, and transpose again, rather than a
simple vertcat.  However, the following example demonstrates that this is not
true in cases where the first transpose operation is slower:

    >> x = repmat({repmat([1:N],10,1)},N/10,1);                              
    >> clear y1;
    >> tic; y1 = vertcat(x{:}); toc
    Elapsed time is 0.318404 seconds.
    >> clear y1
    >> tic; x=cellfun(@(xx)xx',x,'UniformOutput',0); y1 = horzcat(x{:})'; toc
    Elapsed time is 0.642182 seconds.

Interestingly, the analogous case with just horzcat by itself (i.e. to produce
y2) is not appreciably faster in this case than it was before.  The conclusion
we reach is that the time required for horzcat is essentially the time
required to allocate and copy the necessary memory, whereas vertcat may
require some extra time for the control operations that get the memory in the
right places.

It might be possible to optimize these operations further, but as you can see,
there are lots of cases and lots of trade-offs.  For example, even adding a
heuristic to check the sizes of the matrices to try to optimize the examples
above would significantly penalize a case in which vertcat needed to be called
many times.  Seems to me not to be worth the development time, but I'm not the
one who gets to make the call.


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?39834>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/




reply via email to

[Prev in Thread] Current Thread [Next in Thread]