octave-bug-tracker
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Octave-bug-tracker] [bug #60928] Performance of sort unexpectedly slow


From: Michael Leitner
Subject: [Octave-bug-tracker] [bug #60928] Performance of sort unexpectedly slow for DIM=2
Date: Sun, 15 Aug 2021 16:52:00 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Linux i686; rv:60.0) Gecko/20100101 Firefox/60.0

Follow-up Comment #9, bug #60928 (project octave):

First: Interestingly, in the email Rik's comment was not truncated. I give
here the part after "then" that belongs to comment #7:


dim = 1
dv = [3, 4, 5, 6];


When sorting over the first dimension the stride is 1 which means the
subsequent code takes a shortcut and can just grab the data from the source
matrix one value after another and then place the sorted data into the output
matrix one after another.  Cache efficiency makes two appearances here.

When not sorting over the first dimension, the correct data has to be gathered
from the source matrix, sorted, and then scattered to the destination matrix.
The code is complicated by the addition of NaN processing, but you can see how
the gather/scatter is working and how it could potentially be tremendously
inefficient.


         // gather and partition out NaNs.
          // FIXME: impact on integer types noticeable?
          octave_idx_type kl = 0;
          octave_idx_type ku = ns;
          for (octave_idx_type i = 0; i < ns; i++)
            {
              T tmp = ov[i*stride + offset];
              if (sort_isnan<T> (tmp))
                buf[--ku] = tmp;
              else
                buf[kl++] = tmp;
            }

          // sort.
          lsort.sort (buf, kl);

          if (ku < ns)
            {
              // NaNs are in reverse order
              std::reverse (buf + ku, buf + ns);
              if (mode == DESCENDING)
                std::rotate (buf, buf + ku, buf + ns);
            }

          // scatter.
          for (octave_idx_type i = 0; i < ns; i++)
            v[i*stride + offset] = buf[i];


Seems like this can be avoided by using equivalent of permute/ipermute since
those functions are also templated code in Array.cc.

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?60928>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/




reply via email to

[Prev in Thread] Current Thread [Next in Thread]