Hi,
On 2 Dec 2014, at 15:49, Carlo de Falco <address@hidden> wrote:
In any case, to go into core the function needs to adhere better to the coding
standards. Do I have write access to your Bitbucket repo to apply the changes?
Otherwise I'll just send you a diff.
Attached is my modified version of levenshtein.cc.
Apart from some formatting, the main changes are:
- add a copyright notice suitable for including in Octave core
- better check for consistency of input
- use "octave_idx_type" instead of "unsigned int" (less efficient but more
portable)
- faster access to array elements
- add texinfo docstring
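To illustrate the points about the index type and element access (this is not the attached file itself), here is a rough sketch of a two-row Levenshtein distance in plain C++. The names `idx_t` and `levenshtein_sketch` are made up for this example; `idx_t` only stands in for octave_idx_type so that the snippet compiles without the Octave headers, and the real levenshtein.cc may well be organized differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for octave_idx_type (a signed integer type).
typedef std::ptrdiff_t idx_t;

// Levenshtein distance between a and b, keeping only two rows of the table.
idx_t levenshtein_sketch (const std::string& a, const std::string& b)
{
  const idx_t m = static_cast<idx_t> (a.size ());
  const idx_t n = static_cast<idx_t> (b.size ());

  // Raw pointers give plain, unchecked reads inside the loops.
  const char *pa = a.data ();
  const char *pb = b.data ();

  std::vector<idx_t> prev (n + 1), curr (n + 1);
  for (idx_t j = 0; j <= n; j++)
    prev[j] = j;                       // distance from "" to the first j chars of b

  for (idx_t i = 1; i <= m; i++)
    {
      curr[0] = i;                     // distance from the first i chars of a to ""
      for (idx_t j = 1; j <= n; j++)
        {
          const idx_t cost = (pa[i-1] == pb[j-1]) ? 0 : 1;
          curr[j] = std::min (std::min (prev[j] + 1,       // deletion
                                        curr[j-1] + 1),    // insertion
                              prev[j-1] + cost);           // substitution
        }
      std::swap (prev, curr);          // the finished row becomes "previous"
    }
  return prev[n];
}
```

Called as `levenshtein_sketch ("kitten", "sitting")`, this gives 3.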
With these changes the function seems to run a bit faster on my system:
%% with levenshtein.cc
tic, opts = odeset ('AbsTol', 1e-5, ...
'NormControl', 'on', ...
'MaxNewtonIterations', 1e4, ...
'MaxOrder', 4, ...
'NewtonTol', 1e-2, ...
'NonNegative', [1 3 5], ...
'RelTol', 1e-5, ...
'InitialStep', .1, ...
'Refine', 1, ...
'Vectorized', 'on'); toc
Elapsed time is 0.241296 seconds.
%% with levenshtein_new.cc
tic, opts = odeset ('AbsTol', 1e-5, ...
'NormControl', 'on', ...
'MaxNewtonIterations', 1e4, ...
'MaxOrder', 4, ...
'NewtonTol', 1e-2, ...
'NonNegative', [1 3 5], ...
'RelTol', 1e-5, ...
'InitialStep', .1, ...
'Refine', 1, ...
'Vectorized', 'on'); toc
Elapsed time is 0.0917881 seconds.
I'm not sure these numbers are reliable, though,
as the test is quite fast anyway.
As I wrote off-list, optimizing levenshtein.cc
does not make much sense, as the main bottleneck is
elsewhere, probably in this loop in fuzzy_logic.m:
%# loop on every field of the list
for i=1:1:fields_nb
%# if the list is a cell_array of strings
if(iscellstr(string_set))
string2 = deblank(string_set{i}); % removing spaces at the end
else
%# if the list is a vector of strings
string2 = deblank(string_set(i,:)); % removing spaces at the end
end
%# determining the distance by a call to levenshtein function
values(i) = levenshtein_new (lower(string1),lower(string2),minimus); % not case sensitive
minimus = min(minimus,values(i)); % updating the upper_bound to speedup the computation
end
Modifying levenshtein.cc to work with cell arrays of strings would probably
improve this by moving the loop into compiled code.
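As a rough idea of what that could look like, the sketch below moves the whole loop into C++ and takes the list of strings in one call. It is plain C++ without the Octave API, so a `std::vector<std::string>` stands in for the cell array, and all names (`idx_t`, `normalize`, `bounded_distance`, `distances_to_all`) are made up for this example. It assumes the third argument acts as an upper bound that allows an early exit, and it returns bound + 1 when the bound is exceeded; both are choices made for this sketch and may not match what levenshtein.cc actually does. Trimming trailing whitespace only approximates deblank.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <limits>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for octave_idx_type (a signed integer type).
typedef std::ptrdiff_t idx_t;

// Strip trailing whitespace and lower-case the string.
std::string normalize (std::string s)
{
  const std::size_t last = s.find_last_not_of (" \t\n\v\f\r");
  s.erase (last == std::string::npos ? 0 : last + 1);
  for (std::size_t k = 0; k < s.size (); k++)
    s[k] = static_cast<char> (std::tolower (static_cast<unsigned char> (s[k])));
  return s;
}

// Distance between a and b, giving up once it must exceed `bound`.
// Returns bound + 1 in that case (a convention chosen for this sketch).
idx_t bounded_distance (const std::string& a, const std::string& b, idx_t bound)
{
  const idx_t m = static_cast<idx_t> (a.size ());
  const idx_t n = static_cast<idx_t> (b.size ());
  std::vector<idx_t> prev (n + 1), curr (n + 1);
  for (idx_t j = 0; j <= n; j++)
    prev[j] = j;

  for (idx_t i = 1; i <= m; i++)
    {
      curr[0] = i;
      idx_t row_min = curr[0];
      for (idx_t j = 1; j <= n; j++)
        {
          const idx_t cost = (a[i-1] == b[j-1]) ? 0 : 1;
          curr[j] = std::min (std::min (prev[j] + 1, curr[j-1] + 1),
                              prev[j-1] + cost);
          row_min = std::min (row_min, curr[j]);
        }
      // Row minima never decrease, so the final distance is >= row_min.
      if (row_min > bound)
        return bound + 1;
      std::swap (prev, curr);
    }
  return (prev[n] > bound) ? bound + 1 : prev[n];
}

// One call for the whole list; the bound tightens as better matches appear.
std::vector<idx_t> distances_to_all (const std::string& s,
                                     const std::vector<std::string>& candidates)
{
  const std::string key = normalize (s);   // normalized once, not per iteration
  idx_t bound = std::numeric_limits<idx_t>::max ();
  std::vector<idx_t> values;
  values.reserve (candidates.size ());
  for (std::size_t i = 0; i < candidates.size (); i++)
    {
      const idx_t d = bounded_distance (key, normalize (candidates[i]), bound);
      values.push_back (d);
      bound = std::min (bound, d);
    }
  return values;
}
```

Note that values above the current bound are only reported as bound + 1, which is enough for picking the closest match but not for ranking the rest.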
I would suggest re-running the comparison with my version of levenshtein.cc
to see whether any further speedup is actually needed before spending any
effort there, though.