On Wed, Dec 19, 2018 at 11:09:51AM -0800, Rik wrote:
Recently, sliding windows of statistics functions (movXXX) were added to
Octave.  All of these functions depend on a base function movfun.  As part
of implementing movfun I used the profiler to examine the performance, and
I found a hot spot responsible for 70% of the run time.  Unfortunately, I
couldn't see any way to improve things without jumping out of the Octave
language into C++.  I'm hoping someone else might look at this and have an
idea.  The code is below.
-- Code Start --
## Apply "shrink" boundary conditions.
## Function is not applied to any window elements outside the original data.
function y = shrink_bc (fcn, x, idxp, win, wlen, odim)
  N = length (x);
  idx = idxp + win;
  tf = (idx > 0) & (idx <= N);  # idx inside boundaries
  n = length (idxp);
  y = zeros (n, odim);
  ## FIXME: This nested for loop accounts for 70% of running time.
  ##        Given that "shrink" is the default Endpoint value, this
  ##        code needs to be reworked.
  for i = 1:n
    k = idx(tf(:,i),i);
    y(i,:) = fcn (x(k));
  endfor
endfunction
-- Code End --
In my case, the following conditions are a suitable example:

+verbatim+
fcn = @mean;
x = randi (1e4, [1000, 1]);
idxp = 1:25;
win = (-25:25).';
wlen = [51, 51];
odim = 1;
-verbatim-
Indexing in Octave is insanely fast, but this is a weird case because it is
not simply linear indexing [x(5)], nor is it logical indexing [x(x < 1)].
It is a hybrid where the indices are linear (as would be returned from
find()) but only some of them are valid, so they need to be masked with a
logical value.
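To make the pattern concrete, here is a minimal standalone sketch of that
hybrid indexing (the variable names mirror shrink_bc, but the values are
illustrative):

```octave
## One window's worth of candidate linear indices, some out of range.
N   = 10;
x   = (1:N).';
idx = (-2:7).';                  # candidate linear indices near the boundary
tf  = (idx > 0) & (idx <= N);    # logical mask: which indices are valid
k   = idx(tf);                   # masked linear indices: 1, 2, ..., 7
y   = mean (x(k));               # y = 4, the mean over valid elements only
```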
When I run this code repeatedly in a benchmark it requires ~3.0
milliseconds, but in the overall code it is called 2000 times, so the
function is taking ~6 seconds, which feels long.

In this example, the value of k for i = 1 is 1:26, for i = 2 it is 1:27,
etc.
I tried recoding using cellfun:

+verbatim+
idx = cellfun (@(n) (1:n), ...
               num2cell ((wlen(1) - idxp(end)):(wlen(end) - 1)), ...
               'uniformoutput', false);
y = cellfun (@(i) fcn (x(i)), idx);
-verbatim-

but this takes ~4.5 milliseconds to run.
Anybody have any good tricks to try?
If not, it makes me wonder whether a C++ function that did this mixed
indexing could be useful.  Another feature which is not implemented for
movfun is "includenan"/"omitnan".  The core need is similar to that above
in that I have a function handle, some linear indices, and some logical
indices (based on isnan()).

Rik
Two ideas: first, if this movfun is not intended to be called with
arbitrary user-supplied functions, but only with mean, median, std, sum,
var, max, and min to provide the mov* functions introduced in Matlab
R2016a (the first two comments in
https://blogs.mathworks.com/loren/2016/04/15/moving-along/
seem to imply that Matlab does not provide movfun), then I would change
the architecture of movfun to do a common preprocessing, but then a
switch statement, which in the case of shrink boundary conditions expands
the data by

+verbatim+
data_ = [fill*ones(wlen,1); data; fill*ones(wlen,1)];
dat   = data_(wlen + (1:N) + (-wlen:wlen)');  # one window per column
-verbatim-
where fill is -inf for max, +inf for min, and zero for all else (apart
from median), and then use the fact that you can feed dat to all the
functions above and specify that you want to do your operation along the
first dimension (which is the default anyway).  For mean you have to
renormalize the first and last wlen entries to undo the fact that zeros
were averaged into them, and for std and var you have to do this in two
rounds, first computing the mean of the values as indicated above, and
then summing the deviations.  mad and prod, which are not provided by
Matlab, obviously can also be done in this way.  For median, it seems
that you will need a for loop for the first and last wlen entries
(because there is no appropriate fill apart from NaN), but probably N
will typically be much larger than wlen in any case.  I am quite sure
that for not too large wlen (say 10) this will be orders of magnitude
faster than the present implementation.  Note that Matlab has for all
these functions the argument nanflag (i.e., sum([1 NaN],'omitnan')
should give 1), which seems to constitute as of now an unreported(?)
Matlab incompatibility bug.  If Octave provided this argument also, the
logic would become much easier, because you could always use NaN as the
filler.
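As a minimal sketch of the zero-fill-plus-renormalization idea for mean
(the variable names and the centered odd-length window are my assumptions,
not existing movfun code):

```octave
## Moving mean with "shrink" endpoints via zero padding; window = 2*w+1.
x  = (1:6).';
w  = 1;                                 # half-width; window length is 3
N  = numel (x);
xp = [zeros(w,1); x; zeros(w,1)];       # zero-filled data
dat = xp((1:N) + (0:2*w)');             # (2*w+1) x N, one window per column
s   = sum (dat, 1).';                   # padded window sums
cnt = min ((1:N).' + w, N) - max ((1:N).' - w, 1) + 1;  # true window sizes
y   = s ./ cnt;                         # renormalize the shrunken endpoints
## y = [1.5; 2; 3; 4; 5; 5.5]
```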
Further, for NaN handling in mov* you also have to replace every NaN by
fill, and normalize it out in the case of mean.
If you really want to be fast, there are of course more efficient
methods: computing a cumsum, shifting it by 2*wlen+1 and subtracting it
from itself will give you movsum, from which you can derive mean, var,
and std.  This has complexity O(N) for K<N, compared to O(N*K) for the
case above.  But there can be issues of overflow for integer types and
lost precision for doubles.  min, max, and median can only be done in
O(N*log(K)), e.g., by a heap-based approach, but this is of course only
possible in compiled code.
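The cumsum trick can be sketched like this (a full-window moving sum over
interior windows only; variable names are illustrative):

```octave
## O(N) moving sum over windows of length K via a shifted cumulative sum.
x = (1:8).';
K = 3;
c = cumsum ([0; x]);          # c(j+1) = sum (x(1:j))
s = c(K+1:end) - c(1:end-K);  # s(j) = sum (x(j:j+K-1)), full windows only
## s = [6; 9; 12; 15; 18; 21]
```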
Thanks for looking at this so carefully.  movfun is an Octave-specific
enhancement over Matlab intended to apply to any user function.  As such,
I prefer not to introduce a large switch statement decoding which of many
possible functions called it (movmax, movmin, movmad, ...).  However, I
don't see any problem with preprocessing the options or data in the
calling m-file.  The movmax function, in its entirety, is
function y = movmax (x, wlen, varargin)
  if (nargin < 2)
    print_usage ();
  endif
  y = movfun (@max, x, wlen, __parse_movargs__ ("movmax", varargin{:}){:});
endfunction
There is already an Endpoints option to fill with a value.
Benchmarking with a 1000x1000 matrix shows a 4X improvement.
octave:4> tic; y = movmin (x, 51, 'Endpoints', "shrink"); toc
Elapsed time is 0.989651 seconds.
octave:5> tic; y = movmin (x, 51, 'Endpoints', Inf); toc
Elapsed time is 0.262563 seconds.
So there seems a very simple path forward for movmax (-Inf), movmin
(Inf), movprod (1), and movsum (0), where the m-file checks for the
"Endpoints" argument and, if it is not found or is found to be "shrink",
substitutes a specific fill value.
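A sketch of that check inside movmax (the function name and the option
handling here are my assumption of how the substitution could look, not
existing code; the numeric 'Endpoints' fill is the movfun feature used in
the benchmark above):

```octave
## Hypothetical preprocessing in movmax: map the default "shrink"
## endpoint handling to a -Inf fill value, which max() ignores.
function y = movmax_sketch (x, wlen, varargin)
  args = varargin;
  ep = find (strcmpi (args, "Endpoints"));
  if (isempty (ep))
    args(end+1:end+2) = {"Endpoints", -Inf};   # no option given: use fill
  elseif (ischar (args{ep+1}) && strcmpi (args{ep+1}, "shrink"))
    args{ep+1} = -Inf;                         # "shrink" is equivalent here
  endif
  y = movfun (@max, x, wlen, args{:});
endfunction
```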
Because movfun operates on any general function, it can't rely on
the underlying function having an 'omitnan' option. This suggests
that a C++ implementation would still be useful because it could
handle movmad, movmean, movstd, movvar, and movmedian as well as
'omitnan'.
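For 'omitnan' with an arbitrary function handle, the per-window masking
could look something like this sketch, combining the linear window indices
with an isnan() mask (names and values are illustrative):

```octave
## Apply fcn to one window, dropping out-of-range indices and NaNs.
fcn = @mean;
x   = [2; NaN; 4; 6];
idx = (0:3).';                        # candidate linear indices for a window
tf  = (idx > 0) & (idx <= numel (x)); # in-range mask
k   = idx(tf);                        # k = [1; 2; 3]
v   = x(k);
y   = fcn (v(! isnan (v)));           # omitnan: y = mean ([2; 4]) = 3
```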
I'll file a bug report for the improvement of the first four movXXX
functions.
Rik
