


From: Dan Sebald
Subject: [Octave-patch-tracker] [patch #8140] Speed up importdata() ASCII CSV processing using dlmread() as core
Date: Sun, 25 Aug 2013 17:06:00 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:18.0) Gecko/20100101 Firefox/18.0 SeaMonkey/2.15

Follow-up Comment #7, patch #8140 (project octave):

A lot of small changes from the previous patch:

* The double ## appears more prevalent in the script files, so I went with
that.  (My preference may be for single quotes.)

* I added some more tests to confirm that automatic detection of delimiter is
working.
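
A delimiter-detection check in the same spirit as the existing tests might look like the sketch below. It assumes importdata() keeps the [A, DELIMITER, HEADER_ROWS] return values used elsewhere in the test suite; the sample file contents are made up for illustration and are not taken from the patch.

```octave
## Sketch: write a small comma-separated file, then call importdata()
## WITHOUT a delimiter argument so that it has to guess the delimiter.
fn = tempname ();
fid = fopen (fn, "w");
fputs (fid, "3.1,-7.2,0\n0.012,6.5,128");
fclose (fid);
[a, d, h] = importdata (fn);
unlink (fn);

## These should hold if automatic detection works as intended.
assert (a, [3.1 -7.2 0; 0.012 6.5 128]);
assert (d, ",");
assert (h, 0);
```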

* I changed one of the tests (Header text) to replicate the text data as
column headers:


%!test
%! ## Header text
%! A.data = [3.1 -7.2 0; 0.012 6.5 128];
%! A.textdata = {"This is a header row."; ...
%!               "this row does not contain any data, but the next one does."};
%! A.colheaders = A.textdata (2);


I think that is right, but not sure.

* I changed one of the tests (Column headers, only last row is returned in
colheaders) to have tabs between the ASCII characters of the textdata:


%!test
%! ## Column headers, only last row is returned in colheaders
%! A.data = [3.1 -7.2 0; 0.012 6.5 128];
%! A.textdata = {"Label1\tLabel2\tLabel3";
%!               "col1\tcol2\tcol3"};


I think that is correct, at least what I would expect.
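
For anyone checking the tab handling by hand: the expected textdata relies on Octave expanding the \t escape inside double-quoted strings, which single-quoted strings do not do. A small sketch (the variable names are only for illustration):

```octave
s_double = "Label1\tLabel2";   # double quotes: \t becomes one tab character
s_single = 'Label1\tLabel2';   # single quotes: backslash and t stay literal

assert (length (s_double), 13);   # 6 + 1 tab + 6
assert (length (s_single), 14);   # 6 + 2 characters + 6
assert (strsplit (s_double, "\t"), {"Label1", "Label2"});
```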

* I changed one of the tests (Row headers) so that the text data matches the
row headers:


%!test
%! ## Row headers
%! A.data = [3.1 -7.2 0; 0.012 6.5 128];
%! A.textdata = {"row1"; "row2"};
%! A.rowheaders = A.textdata;


because that seems consistent to me with behavior for the colheaders scenario.
 Also note above that I changed the orientation of cell entries to be a column
rather than a row because that makes more sense to me.  I also made the
alteration that the value of h, i.e., number of HEADER_ROWS, is expected to be
0


%! assert (h, 0);


because there are no header rows in the sample file.  If that is supposed to
be more generally just HEADERS, then we'll have to change the documentation. 
One further note about the row headers approach is that it uses the slower
file access method after dlmread() is called, but in all likelihood there
aren't going to be big data files having thousands or tens of thousands of row
headers.  (What use would they be?)

* I programmed importdata to agree with the (Missing values) test.  It's an
easy cleanup at the end of the function:


  # Final cleanup to satisfy output configuration  
  if (all (cellfun ("isempty", output.textdata)))
    output = output.data;
  elseif (! isempty (output.rowheaders) && ! isempty (output.colheaders))
    output = struct ('data', {output.data}, 'textdata', {output.textdata});
  endif


but somehow I'd hope/wish that isn't the desired result.  It's enough that
someone using the routine has to test for the existence of the field names
"textdata", "colheaders", "rowheaders", to create a general data importing
scheme, and with this configuration it means the user also must test whether
the returned output is a matrix or a structure.  (Probably one of those cases
where things were added over time and backward compatibility was preserved.)
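
To illustrate the burden on the caller, a general importing scheme would need something like the following sketch after every importdata() call. The variable "out" is a stand-in for the returned value, and the choice to fill missing fields with empty cells is my assumption, not something the patch does.

```octave
## Stand-in for a value returned by importdata(); depending on the file it
## may be a plain matrix or a struct with a varying set of fields.
out = [3.1 -7.2 0; 0.012 6.5 128];

## Step 1: matrix or structure?
if (isstruct (out))
  s = out;
else
  s = struct ("data", {out});
endif

## Step 2: which of the optional fields exist?  Fill in the missing ones.
for name = {"textdata", "colheaders", "rowheaders"}
  if (! isfield (s, name{1}))
    s.(name{1}) = {};
  endif
endfor
```

After this, s always has the same four fields, which is roughly what I'd expect importdata() to return in the first place.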

With those changes, only the exponential test and the last test (CR for
line breaks) fail, and we'll leave that one active as a reminder to
change dlmread().


(file #28935)
    _______________________________________________________

Additional Item Attachment:

File name: octave-importdata_rework-2013aug25.patch Size:13 KB


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/patch/?8140>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



