Re: accelerating sscanf ?
From: Daniel J Sebald
Subject: Re: accelerating sscanf ?
Date: Thu, 22 Mar 2012 11:50:18 -0500
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.24) Gecko/20111108 Fedora/3.1.16-1.fc14 Thunderbird/3.1.16
On 03/22/2012 10:28 AM, CdeMills wrote:
Hello,
when reading big files with my dataframe package, most of the time is spent
converting strings to doubles. The steps are:
1) the whole file is cut into lines
2) each line is then split using the field separator and stored in a cell
array
3) the conversion is performed as
the_line = cellfun (@(x) sscanf (x, "%f", locales), dummy, 'UniformOutput',
false);
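The three steps above can be sketched end to end on made-up data (a minimal
sketch; the sample strings are invented here and the locale argument is
omitted for simplicity):

```octave
charstr = "1.5,2.5\n3.5,4.5";         % stand-in for the file contents
lines = strsplit (charstr, "\n");      % step 1: cut into lines
dummy = strsplit (lines{1}, ",");      % step 2: split one line into fields
% step 3: one sscanf call per field -- this is where the time goes
the_line = cellfun (@(x) sscanf (x, "%f"), dummy, "UniformOutput", false);
```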
This is to say that sscanf is called once per field. During each call, a
stream is created, new locales are set, and so on. Some functions working on
strings accept cells of strings as input. Would it be OK for sscanf to also
accept a cell array as its first argument? The algorithm would then be:
1) create an istringstream and set the right locale on it. Create a cell
array for the output results.
2) for each entry in args(0)
- verify that it is a string
- put its value into the istringstream
- scan it, and store the result in the output cell array
The issue I have is that files with 7500 lines of 12 fields take more than
120 seconds to parse. If the number of calls were reduced by a factor of
10 at the interpreter level, wouldn't the speedup be worth a try? What do
you think about it?
Pascal,
What you describe sounds very similar to the issue we've been discussing
concerning bin2dec, i.e., better performance when working with strings
inside cells. 7500 lines with 12 fields is what I would consider a
"database" application. Once the data is in the cells, Octave can do a lot
of powerful things to analyze it. However, it appears that getting large
data sets in, or transforming them, is somewhat inefficient.
In any case, I'd like to point you to an alternative you might try.
There is a string function called strsplit which can be pretty handy.
Here's an example:
charstr = "these, are, fields\nseparated, by, commas";
lines = strsplit (charstr, "\n")
for i = 1:length (lines)
  C(i,:) = strsplit (lines{i}, ",");
end
C
If one brings in the whole file as a hunk of characters and then uses
strsplit, that can be efficient. The only problem--I don't know the
format of your data--is text fields that contain the delimiter
character.
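Concretely, that whole-file approach might look like the sketch below. The
file name "data.csv" and the comma delimiter are assumptions for the
example (the file is written here just so the sketch is self-contained),
and the final str2double step stands in for the per-field sscanf
conversion:

```octave
% Write a small hypothetical file so the example runs on its own.
fid = fopen ("data.csv", "w");
fputs (fid, "1.5,2.5\n3.5,4.5\n");
fclose (fid);

% Slurp the whole file as one hunk of characters, then split it up.
fid = fopen ("data.csv", "r");
charstr = fread (fid, Inf, "char=>char")';   % whole file as one row vector
fclose (fid);

lines = strsplit (strtrim (charstr), "\n");  % cut into lines
for i = 1:length (lines)
  C(i,:) = strsplit (lines{i}, ",");         % breaks if a field contains ","
end
vals = str2double (C);                       % cell of strings -> numeric matrix
```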
Dan