


From: Rik
Subject: [Octave-bug-tracker] [bug #58345] Emit error and don't parse scripts > 1GB in size
Date: Tue, 12 May 2020 23:11:35 -0400 (EDT)

Follow-up Comment #5, bug #58345 (project octave):

In previous versions of Octave textscan was an m-file, and hence slow.  In
newer versions textscan is written in C++ and might be fast enough for your
needs.  Similarly, fgetl is written in C++, but it is generally called inside
a loop in an m-file, and loops in interpreted languages are slow (unless
there is a JIT compiler).
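
To make the pattern concrete, here is a minimal sketch of the kind of
line-by-line loop I mean.  The file name 'small.txt' and the tiny matrix are
made up purely for illustration; each call to fgetl is fast, but the loop
around it runs once per line in the interpreter.

```octave
% Hypothetical example file; not part of the bug report.
fname = 'small.txt';
dlmwrite (fname, [1 2 3; 4 5 6], ' ');

fid = fopen (fname, 'r');
rows = {};
while (true)
  txt = fgetl (fid);                    % compiled code, returns one line
  if (! ischar (txt))                   % fgetl returns -1 at end of file
    break;
  endif
  rows{end+1} = sscanf (txt, '%f').';   % one interpreted iteration per line
endwhile
fclose (fid);

data = vertcat (rows{:});               % stack the parsed rows into a matrix
delete (fname);
```

For a file with millions of lines, the per-iteration overhead of the loop is
what dominates, not fgetl itself.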

I would look at storing the data in a uniform format so that you could take
advantage of some of the routines written in C++.

As an example, take 2-D matrices.  If you just write a text file that contains
numbers separated by spaces, you can use 'load' to have the interpreter read in
all the values.

For example, I made this file and called it A.var (attached to the bug report
as well).


1 3.1415 2
-1e2 10 2.718


I can then do


octave:1> load A.var
octave:2> A
A =

     1.0000     3.1415     2.0000
  -100.0000    10.0000     2.7180



However, it turns out that even load() is ~3X slower than dlmread().

To test, I first created an array 1GB in size.


octave:13> sz = ceil (sqrt (1e9/8));
octave:14> x = rand (sz, sz);
octave:15> whos x
Variables visible from the current scope:

variables in scope: top scope

   Attr Name        Size                     Bytes  Class
   ==== ====        ====                     =====  ===== 
        x       11181x11181             1000118088  double


Then I wrote it out to a space-separated file.


octave:18> dlmwrite ('x.var', x, ' ');


The resulting file is 2.3GB in size.  Next, I tried reading it back in with
dlmread.


octave:23> tic; x = dlmread ('x.var', ' '); toc
Elapsed time is 107.864 seconds.


That's okay, I guess.  The result was worse for a straight load.


octave:26> tic; load ('x.var'); toc
Elapsed time is 342.51 seconds.


Even with textscan now written in C++, it is far too slow: nearly 18X slower
than dlmread on the same file.


octave:41> tic; c = textscan (fid, fmt); toc
Elapsed time is 1911.16 seconds.
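
The setup of fid and fmt is not shown above.  As a rough sketch only, and
assuming one %f conversion per column, a small-scale version could look like
the following.  The file name 'small.var' and the 3x3 test matrix are
hypothetical and are not the ones used for the timing.

```octave
% Hypothetical small-scale setup; the original fid and fmt are not shown.
ncols = 3;
dlmwrite ('small.var', magic (ncols), ' ');

fid = fopen ('small.var', 'r');
fmt = repmat ('%f', 1, ncols);   % '%f%f%f', one conversion per column
c = textscan (fid, fmt);         % 1 x ncols cell array, one column per cell
fclose (fid);

x = [c{:}];                      % concatenate the columns back into a matrix
delete ('small.var');
```

textscan returns a cell array of columns rather than a matrix, so the
concatenation at the end is an extra step that dlmread and load do not need.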


Clearly, however, the winner is to use a binary format rather than a text
one.


octave:29> tic; save -binary x.bin x; toc
Elapsed time is 0.811615 seconds.
octave:30> clear x
octave:32> tic; load x.bin; toc
Elapsed time is 1.38077 seconds.


For this purpose fwrite()/fread() are just about as fast as well.
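
A minimal sketch of the fwrite/fread approach is below.  The file name
'x.raw' and the two-value size header are my own assumptions for the example.
A raw file carries no metadata, so the reader has to know the type and the
dimensions some other way.

```octave
% Hypothetical example; file name and header layout are assumptions.
x = rand (4, 3);

fid = fopen ('x.raw', 'w');
fwrite (fid, size (x), 'uint64');      % header: number of rows and columns
fwrite (fid, x, 'double');             % data, written in column-major order
fclose (fid);

fid = fopen ('x.raw', 'r');
dims = fread (fid, [1 2], 'uint64');   % read the header back (as doubles)
y = fread (fid, dims, 'double');       % read the data into a dims-sized matrix
fclose (fid);

same = isequal (x, y);                 % binary round trip loses no precision
delete ('x.raw');
```

Unlike 'save -binary', this writes in the machine's native byte order by
default, so the file is only portable if both sides agree on the layout.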


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?58345>
