Re: Why is sscanf() so slow?
From: Dirk Eddelbuettel
Subject: Re: Why is sscanf() so slow?
Date: Sat, 25 Jan 2003 15:24:14 -0600
User-agent: Mutt/1.3.28i
On Sat, Jan 25, 2003 at 02:45:39PM -0600, stefan wrote:
> Dear Octavers,
>
> first of all: Thanks for great software. I have been using it for about two
> years now, especially for displaying and adjusting measured data (it comes
> from some measurement bus system to the computer). The software which drives
> these devices almost always produces tab- or comma-separated data.
>
> For this I do:
[...]
> or very likely...
>
> For more than 1000 lines this takes *ages*. Is there a better way to do
> so or is it a slow implementation in octave? I would like to see this
> improved some day.
Please see below for the function aload.m from the octave-ci collection by
Kurt Hornik et al; I used to use this a lot. It essentially pre-processes
the data first with the usual shell tools (egrep, sed, awk), and then uses a
normal load (in ascii mode). I never quite figured out why JWE didn't include
it in Octave itself when he chose to include other octave-ci functions.
Anyway, there is a Debian package of octave-ci, and a tarball in Vienna,
Austria. Paul Kienzle also has something similar in octave-forge.

Hope this helps, Dirk

## Copyright (C) 1996, 1997 Kurt Hornik
##
## This program is free software; you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 2, or (at your option)
## any later version.
##
## This program is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with this file. If not, write to the Free Software Foundation,
## 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
## x = aload (filename [, cw [, rw [, FS [, NA [, ignore_regexp]]]]])
## loads the flat ASCII data file `filename' into x.
##
## With the optional parameters cw and rw one can select the data
## columns (variables) and rows (observations) to load. Both cw and rw
## may be index vectors or Inf (default), meaning to load everything.
##
## With FS, one can specify the field separator in the data file as one
## would do in AWK. Default is " ".
##
## With NA, one can specify how unavailable data are represented in the
## data file, and how they should be loaded into Octave. The default is
## "NA/NaN", meaning that NA's should be converted to NaN's. (Note that
## this does not work yet.)
##
## Finally, ignore_regexp is an egrep regular expression specifying
## which lines in the data file should be ignored. The default is
## "^[ \t]*(#|%|$)", meaning that empty lines and lines where # or % are
## the first non-whitespace characters are ignored.
##
## Note that rw selects the data line (observation) numbers and NOT the
## line numbers in the file!
##
## Note also that currently, only real numbers can be loaded.
## Author: KH <address@hidden>
## Description: Load from a flat ASCII data file
function x = aload (filename, cw, rw, FS, NA, ignore_regexp)

  if ((nargin < 1) || (nargin > 6))
    usage ("aload (filename, cw, rw, FS, NA, ignore_regexp)");
  endif

  if (nargin < 6)
    ignore_regexp = "^[ \t]*(#|%|$)";
  endif
  if (nargin < 5)
    NA = "NA/NaN";
  endif
  if (nargin < 4)
    FS = " ";
  endif
  if (nargin < 3)
    rw = Inf;
  endif
  if (nargin < 2)
    cw = Inf;
  endif

  ## maybe_do_more_sanity_checks ();

  if !is_struct (stat (filename))
    error (sprintf ("aload: File '%s' not found", filename));
  endif

  tmpfile = octave_tmp_file_name ();
  system (["cat ", filename, " | ", ...
           "egrep -ve \'", ignore_regexp, "\' | ", ...
           "sed -e 's/", NA, "/g' > ", tmpfile]);

  eval (system (["cat ", tmpfile, " | ", ...
                 "awk 'BEGIN { FS = \"", FS, "\" }; ", ...
                 "END { printf \"rf = %g; cf = %g;\", NR, NF }'"]));

  if (cw == Inf)
    cw = 1 : cf;
  elseif (min (size (cw)) == 1)
    cw = cw (find (cw <= cf));
  else
    error ("aload: cw must be a scalar or a vector");
  endif

  if (rw == Inf)
    rw = 1 : rf;
  elseif (min (size (rw)) == 1)
    rw = rw (find (rw <= rf));
  else
    error ("aload: rw must be a scalar or a vector");
  endif

  loadfile = octave_tmp_file_name ();
  fd = fopen (loadfile, "w");
  fprintf (fd, "# name x\n# type: matrix\n");
  fprintf (fd, "# rows: %g\n# columns: %g\n", length (rw), length (cw));
  fclose (fd);

  s = sprintf ("$%d", cw(1));
  for i = 2 : length (cw);
    s = sprintf ("%s, $%d", s, cw(i));
  endfor

  system (["cat ", tmpfile, " | ", ...
           "awk 'BEGIN { FS = \"", FS, "\" }; { print ", s, " };' ", ...
           " >> ", loadfile]);

  eval (["load -force -ascii ", loadfile]);
  x = x(rw, :);

  system (sprintf ("rm -f %s %s", tmpfile, loadfile));

endfunction
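
The shell side of the function can also be tried on its own, outside Octave. What follows is only a rough sketch of the three pipeline stages (filter, count, select columns) and not part of aload.m: the sample data, the variable names, the comma separator and the choice of columns 1 and 3 are all made up for illustration, and grep -E stands in for egrep.

```shell
#!/bin/sh
# Hypothetical sample file: two data rows, two comment lines, one empty line.
data=$(mktemp)
clean=$(mktemp)
printf '# measured values\n1,2,NA\n\n%% another comment\n4,5,6\n' > "$data"

# Octave turns "\t" into a real tab before the shell sees it; do the same here.
tab=$(printf '\t')

# Stage 1: drop empty and comment lines, then rewrite NA as NaN
# (this mirrors the egrep | sed stage with the default arguments).
grep -Ev "^[ ${tab}]*(#|%|\$)" "$data" | sed -e 's/NA/NaN/g' > "$clean"

# Stage 2: count data rows and fields, in the form aload passes to eval.
counts=$(awk -F',' 'END { printf "rf = %d; cf = %d;", NR, NF }' "$clean")
echo "$counts"    # prints: rf = 2; cf = 3;

# Stage 3: keep only the wanted columns (here 1 and 3), space-separated,
# which is what the plain ASCII load expects after the header lines.
cols=$(awk -F',' '{ print $1, $3 }' "$clean")
echo "$cols"      # prints "1 NaN" and "4 6" on two lines

rm -f "$data" "$clean"
```

Everything after stage 1 works on the cleaned temporary file, which is why rw counts data lines rather than lines in the original file.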
> For now I do 'save -mat-binary %s values' to keep loading times short next
> time. Some cache algo around the above code.
>
> Any help is appreciated,
> address@hidden
>
>
>
> -------------------------------------------------------------
> Octave is freely available under the terms of the GNU GPL.
>
> Octave's home on the web: http://www.octave.org
> How to fund new projects: http://www.octave.org/funding.html
> Subscription information: http://www.octave.org/archive.html
> -------------------------------------------------------------
>
--
Prediction is very difficult, especially about the future.
-- Niels Bohr