[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Octave-bug-tracker] [bug #36372] Improved ranks.m included:
From: |
anonymous |
Subject: |
[Octave-bug-tracker] [bug #36372] Improved ranks.m included: |
Date: |
Wed, 02 May 2012 16:50:37 +0000 |
User-agent: |
Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.16) Gecko/20120421 Firefox/3.5.16 |
URL:
<http://savannah.gnu.org/bugs/?36372>
Summary: Improved ranks.m included:
Project: GNU Octave
Submitted by: None
Submitted on: Wed 02 May 2012 04:50:36 PM UTC
Category: None
Severity: 3 - Normal
Priority: 5 - Normal
Item Group: Performance
Status: None
Assigned to: None
Originator Name: Dave Goel
Originator Email: address@hidden
Open/Closed: Open
Discussion Lock: Any
Release: dev
Operating System: GNU/Linux
_______________________________________________________
Details:
Hi.
I believe that ranks.m (Though I originally tested from version 3.2.4,
the dev. version's algo is basically unchanged) is very inefficient
except when the elements of the input matrix are distinct.
For example, for this input, a=round(10*rand(100123,100));
my alternative finished computation in 1.4s while the current function
hasn't yet returned an answer in the last 15-20 minutes.
For smaller inputs, I checked carefully, and the outputs do agree.
---
Additionally, the alternative also implements multiple ranking schemes:
fractional (default), standard competition ranking, modified competition
ranking, and finally, ordinal ranking; examples of all of which are seen
here: http://en.wikipedia.org/wiki/Ranking
----
Though I used a completely new algorithm, I managed to pen the
alternative as a derivative of the current function. Thus, attaching
(a) diff; (b) as the complete file.
The "diff -u" is included below. Below that follows the entire file. Thanks
very much.
----
1a2
> ## Copyright (C) 2012 Dave Goel
6,7c7,8
< ## under the terms of the GNU General Public License as published by
< ## the Free Software Foundation; either version 3 of the License, or (at
---
> ## under the terms of the GNU General Public License as published by the
> ## Free Software Foundation; either version 3 of the License, or (at
10,13c11,14
< ## Octave is distributed in the hope that it will be useful, but
< ## WITHOUT ANY WARRANTY; without even the implied warranty of
< ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
< ## General Public License for more details.
---
> ## Octave is distributed in the hope that it will be useful, but WITHOUT
> ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
> ## for more details.
18c19
<
---
> ##
20,27c21,40
< ## @deftypefn {Function File} {} ranks (@var{x}, @var{dim})
< ## Return the ranks of @var{x} along the first non-singleton dimension
< ## adjusted for ties. If the optional argument @var{dim} is
< ## given, operate along this dimension.
< ## @seealso{spearman, kendall}
< ## @end deftypefn
<
< ## Author: KH <address@hidden>
---
> ##
> ## @deftypefn {Function File} {} ranks (@var{x}, @var{dim}) Return the
> ## ranks of @var{x} along the first non-singleton dimension adjust for
> ## ties. If the optional argument @var{dim} is given, operate along this
> ## dimension. Third argument @var{rtype} donates the type of ranking: 0
> ## or "fractional" for fractional, which is the default, 1 or
> ## "competition" for competiton ranking, 2 or "modified" for modified
> ## competition ranking, 3 or "ordinal" for ordinal ranking and 4 or
> ## "dense" for dense ranking. When using the third argument, you can
> ## supply us '' for @var{dim} to have it auto-computed. @end deftypefn
> ##
> ## This function returns a result identical to GNU Octave's built-in
> ## ranks.m but is much faster (For example, 1.4s vs. ''no result in
> ## 33mins'' when the input was a=round(10*rand(100123,100));
> ## Additionally, we also handle several ranking schemes. This function
> ## has been submitted to octave's mailing list and may (or may not)
> ## become part of GNU Octave.
> ##
> ##
> ## Authors: KH <address@hidden>, Dave Goel <address@hidden>
30,32d42
< ## This code was rather ugly, since it didn't use sort due to the
< ## fact of how to deal with ties. Now it does use sort and its
< ## even uglier!!! At least it handles NDArrays..
34c44
< function y = ranks (x, dim)
---
> function y = ranks (x, dim, rtype)
36c46
< if (nargin != 1 && nargin != 2)
---
> if (nargin < 1);
40,41c50,51
< if (! (isnumeric (x) || islogical (x)))
< error ("ranks: X must be a numeric vector or matrix");
---
> if nargin<3||isempty(rtype);
> rtype=0;
43c53
<
---
>
46c56
< if (nargin != 2)
---
> if (nargin<2)||isempty(dim);
48c58,64
< (dim = find (sz > 1, 1)) || (dim = 1);
---
> dim = 1;
> while (dim < nd + 1 && sz(dim) == 1)
> dim = dim + 1;
> endwhile
> if (dim > nd)
> dim = 1;
> endif
50,52c66,69
< if (!(isscalar (dim) && dim == fix (dim))
< || !(1 <= dim && dim <= nd))
< error ("ranks: DIM must be an integer and a valid dimension");
---
> if (! (isscalar (dim) && dim == round (dim))
> && dim > 0
> && dim < (nd + 1))
> error ("ranks: dim must be an integer and valid dimension");
66,80c83,105
< sz = size (x);
< infvec = -Inf ([1, sz(2 : end)]);
< [xs, xi] = sort (x);
< eq_el = find (diff ([xs; infvec]) == 0);
< if (isempty (eq_el))
< [eq_el, y] = sort (xi);
< else
< runs = setdiff (eq_el, eq_el+1);
< len = diff (find (diff ([Inf; eq_el; -Inf]) != 1)) + 1;
< [eq_el, y] = sort (xi);
< for i = 1 : length(runs)
< y (xi (runs (i) + [0:(len(i)-1)]) + floor (runs (i) ./ sz(1))
< * sz(1)) = eq_el(runs(i)) + (len(i) - 1) / 2;
< endfor
< endif
---
> sz=size(x);
> [sx ids]=sort(x); ## sx is sorted x.
>
> lin=cumsum(ones(size(x)),1); ## A linearly increasing array.
>
> switch rtype;
> case {0,"fractional"};
> lin=(_competition(lin,sx,sz)+_modified(lin,sx,sz))/2;
> case {1,"competition"};
> lin=_competition(lin,sx,sz);
> case {2,"modified"};
> lin=_modified(lin,sx,sz);
> case {3,"ordinal"};
> ## lin=lin;
> otherwise
> rtype
> error("Illegal value of rtype specified.");
> endswitch
> y=NaN(size(lin));
> ## If input was a vector, we could have simply done this:
> ## y(ids)=lin;
> y(_ids(ids,sz))=lin;
>
84a110
> endfunction
85a112,123
> function idf=_ids(ids,sz);
> oo=ones(sz);
> allids=[{ids-1}];
> nn=numel(sz);
> for ii=2:nn;
> allids=[allids,{cumsum(oo,ii)-1}];
> endfor
> idf=allids{end};
> for jj=(nn-1):-1:1;
> idf=idf*sz(jj)+allids{jj};
> endfor
> idf+=1;
88a127,148
> function linnew=_competition (lin,sx,sz)
> ## Stop increasing lin when sx does not increase. Same as before
> ## otherwise.
> infvec = -Inf * ones ([1, sz(2 : end)]);
> fnewp= find(diff([infvec;sx]));
> linnew=zeros(size(lin));
> linnew(fnewp)=lin(fnewp);
> linnew=cummax(linnew);
> endfunction
>
>
> function linnew=_modified (lin,sx,sz)
> ## Traverse lin backwards. Stop decreasing it when sx doesn't
> ## decrease.
> infvec = Inf * ones ([1, sz(2 : end)]);
> fnewp= find(diff([sx;infvec]));
> linnew=Inf(size(lin));
> linnew(fnewp)=lin(fnewp);
> linnew=flipdim(cummin(flipdim(linnew,1)),1);
> endfunction
>
>
104d163
<
====================================================
## Copyright (C) 1995-2012 Kurt Hornik
## Copyright (C) 2012 Dave Goel
##
## This file is part of Octave.
##
## Octave is free software; you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by the
## Free Software Foundation; either version 3 of the License, or (at
## your option) any later version.
##
## Octave is distributed in the hope that it will be useful, but WITHOUT
## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
## for more details.
##
## You should have received a copy of the GNU General Public License
## along with Octave; see the file COPYING. If not, see
## <http://www.gnu.org/licenses/>.
##
## -*- texinfo -*-
##
## @deftypefn {Function File} {} ranks (@var{x}, @var{dim}) Return the
## ranks of @var{x} along the first non-singleton dimension adjust for
## ties. If the optional argument @var{dim} is given, operate along this
## dimension. Third argument @var{rtype} donates the type of ranking: 0
## or "fractional" for fractional, which is the default, 1 or
## "competition" for competiton ranking, 2 or "modified" for modified
## competition ranking, 3 or "ordinal" for ordinal ranking and 4 or
## "dense" for dense ranking. When using the third argument, you can
## supply us '' for @var{dim} to have it auto-computed. @end deftypefn
##
## This function returns a result identical to GNU Octave's built-in
## ranks.m but is much faster (For example, 1.4s vs. ''no result in
## 33mins'' when the input was a=round(10*rand(100123,100));
## Additionally, we also handle several ranking schemes. This function
## has been submitted to octave's mailing list and may (or may not)
## become part of GNU Octave.
##
##
## Authors: KH <address@hidden>, Dave Goel <address@hidden>
## Description: Compute ranks
function y = ranks (x, dim, rtype)
if (nargin < 1);
print_usage ();
endif
if nargin<3||isempty(rtype);
rtype=0;
endif
nd = ndims (x);
sz = size (x);
if (nargin<2)||isempty(dim);
## Find the first non-singleton dimension.
dim = 1;
while (dim < nd + 1 && sz(dim) == 1)
dim = dim + 1;
endwhile
if (dim > nd)
dim = 1;
endif
else
if (! (isscalar (dim) && dim == round (dim))
&& dim > 0
&& dim < (nd + 1))
error ("ranks: dim must be an integer and valid dimension");
endif
endif
if (sz(dim) == 1)
y = ones(sz);
else
## The algorithm works only on dim = 1, so permute if necesary.
if (dim != 1)
perm = [1 : nd];
perm(1) = dim;
perm(dim) = 1;
x = permute (x, perm);
endif
sz=size(x);
[sx ids]=sort(x); ## sx is sorted x.
lin=cumsum(ones(size(x)),1); ## A linearly increasing array.
switch rtype;
case {0,"fractional"};
lin=(_competition(lin,sx,sz)+_modified(lin,sx,sz))/2;
case {1,"competition"};
lin=_competition(lin,sx,sz);
case {2,"modified"};
lin=_modified(lin,sx,sz);
case {3,"ordinal"};
## lin=lin;
otherwise
rtype
error("Illegal value of rtype specified.");
endswitch
y=NaN(size(lin));
## If input was a vector, we could have simply done this:
## y(ids)=lin;
y(_ids(ids,sz))=lin;
if (dim != 1)
y = permute (y, perm);
endif
endif
endfunction
function idf=_ids(ids,sz);
oo=ones(sz);
allids=[{ids-1}];
nn=numel(sz);
for ii=2:nn;
allids=[allids,{cumsum(oo,ii)-1}];
endfor
idf=allids{end};
for jj=(nn-1):-1:1;
idf=idf*sz(jj)+allids{jj};
endfor
idf+=1;
endfunction
function linnew=_competition (lin,sx,sz)
## Stop increasing lin when sx does not increase. Same as before
## otherwise.
infvec = -Inf * ones ([1, sz(2 : end)]);
fnewp= find(diff([infvec;sx]));
linnew=zeros(size(lin));
linnew(fnewp)=lin(fnewp);
linnew=cummax(linnew);
endfunction
function linnew=_modified (lin,sx,sz)
## Traverse lin backwards. Stop decreasing it when sx doesn't
## decrease.
infvec = Inf * ones ([1, sz(2 : end)]);
fnewp= find(diff([sx;infvec]));
linnew=Inf(size(lin));
linnew(fnewp)=lin(fnewp);
linnew=flipdim(cummin(flipdim(linnew,1)),1);
endfunction
%!assert (ranks (1:2:10), 1:5)
%!assert (ranks (10:-2:1), 5:-1:1)
%!assert (ranks ([2, 1, 2, 4]), [2.5, 1, 2.5, 4])
%!assert (ranks (ones (1, 5)), 3*ones (1, 5))
%!assert (ranks (1e6*ones (1, 5)), 3*ones (1, 5))
%!assert (ranks (rand (1, 5), 1), ones (1, 5))
%% Test input validation
%!error ranks ()
%!error ranks (1, 2, 3)
%!error ranks ({1, 2})
%!error ranks (['A'; 'B'])
%!error ranks (1, 1.5)
%!error ranks (1, 0)
%!error ranks (1, 3)
_______________________________________________________
Reply to this item at:
<http://savannah.gnu.org/bugs/?36372>
_______________________________________________
Message sent via/by Savannah
http://savannah.gnu.org/
- [Octave-bug-tracker] [bug #36372] Improved ranks.m included:,
anonymous <=
- [Octave-bug-tracker] [bug #36372] Improved ranks.m included:, Jordi Gutiérrez Hermoso, 2012/05/02
- [Octave-bug-tracker] [bug #36372] Improved ranks.m included:, Jordi Gutiérrez Hermoso, 2012/05/08
- [Octave-bug-tracker] [bug #36372] Improved ranks.m included:, anonymous, 2012/05/08
- [Octave-bug-tracker] [bug #36372] Improved ranks.m included:, Jordi Gutiérrez Hermoso, 2012/05/08
- [Octave-bug-tracker] [bug #36372] Improved ranks.m included:, anonymous, 2012/05/08