help-octave
Re: Reading large data text file


From: Martijn
Subject: Re: Reading large data text file
Date: Fri, 22 Jul 2011 01:01:31 +0200

Can you find out whether reading is limited by the CPU (parsing) or by disk
I/O (reading single lines)? In the latter case, reading larger chunks of
data into a string buffer and then splitting the string into lines might
reduce disk I/O overhead.
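
A rough way to check this is to time the two steps separately. The sketch below is only an illustration: the sample file and its contents are made up, and the date,time,value layout is assumed from the quoted code. It reads the whole file with a single fread call, then splits the buffer into lines. On a real file, comparing the two timings should show which step dominates.

```octave
% Write a tiny sample file (hypothetical data, layout assumed).
fname = tempname();
fid = fopen(fname, "w");
fputs(fid, "date,time,a,b\n");
fputs(fid, "2011/07/21,15:48:00:00,1.5,2.5\n");
fputs(fid, "2011/07/21,15:48:01:00,3.5,4.5\n");
fclose(fid);

% Step 1: disk I/O only. Read the whole file in one call.
tic();
fid = fopen(fname, "r");
buf = fread(fid, Inf, "*char")';   % whole file as one row string
fclose(fid);
t_read = toc();

% Step 2: CPU only. Split the in-memory buffer into lines.
tic();
text_lines = strsplit(buf, "\n");
if (isempty(text_lines{end}))
  text_lines(end) = [];            % drop the empty piece after the final newline
endif
t_split = toc();

printf("read: %g s, split: %g s, lines: %d\n", t_read, t_split, numel(text_lines));
unlink(fname);
```

If the read time is small compared with the total run time of the line-by-line version, the bottleneck is more likely the per-line parsing than the disk.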

Martijn

On Thu, 2011-07-21 at 15:48 -0700, J Stasko wrote:
> I have a fairly large data file which I can read using the
> following .m file.  However, it takes 40 times longer to read than by
> using csvread, which sometimes can take up to an hour.
> 
> 
> CSVREAD is nice, but it does not parse correctly the date and time
> information at the beginning of the line.  How do I do this a
> different way, or how can I rewrite this in C as a module?
> 
> 
> Thanks!
> 
> 
> function [t, x] = readhist(filename)
> 
> 
> tic()
> 
> 
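> fid = fopen(filename,"r");
> if (fid < 0)
>   error("readhist: cannot open %s", filename);
> endif
> 
> 
> % feof() only becomes true after a read hits end-of-file, so test the
> % result of the first fgetl() instead of calling feof() right after fopen().
> text_line = fgetl(fid);
> if (! ischar(text_line))
>   disp("File empty");
>   fclose(fid);
>   t = {};
>   x = [];
>   return
> endif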
> u = strsplit(text_line,",");
> u(1,1) = strcat(u(1,1),u(1,2));
> t = [u(1,1),u(1,3:end)];
> 
> 
> columns = size(t)(2);
> 
> 
> 
> 
> line_number = 1;
> x_line=[];
> x=[];
> first_time = gmtime(0);
> 
> 
> while (!feof(fid) )
>   text_line = fgetl(fid);
>   w = strsplit(text_line,",");
>   date_time = strcat(w(1,1),w(1,2));
>   [line_time_s, nchar] = strptime(date_time,"20%y/%m/%d%H:%M:%S:");
> %  line_time_s.usec = sscanf(x(nchar+1,nchar+2),"%d")*10000;
> 
>                                                        
>   if (first_time.year==70)
>     first_time = line_time_s;
>   endif
>   line_time_sec = mktime(line_time_s) - mktime(first_time);
> 
> 
>   x_line(1) = line_time_sec;
>   for i = 3:columns
>     x_line(i-1) = sscanf(cell2mat(w(1,i)),"%f");
>   endfor
> 
> 
>   x(line_number,:) = x_line;
>   line_number = line_number+1;
> 
> 
> endwhile
> 
> 
> timetaken = toc();
> disp(line_number);
> disp(timetaken);
> 
> 
> fclose(fid);
> return
> 
> 
> endfunction
> 
> 
> _______________________________________________
> Help-octave mailing list
> address@hidden
> https://mailman.cae.wisc.edu/listinfo/help-octave



