Re: Reading large data text file
From: Martijn
Subject: Re: Reading large data text file
Date: Fri, 22 Jul 2011 01:01:31 +0200
Can you find out whether reading is limited by the CPU (parsing) or by
disk I/O (reading single lines)? In the latter case, reading larger chunks
of data into a string buffer and then splitting the string into lines
might reduce the disk I/O overhead.
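
A minimal sketch of that idea, not a tested replacement for your function.
It reads the whole file in one call, parses it in memory, and times the two
phases separately so you can see which one dominates. The name
readhist_bulk and the outputs t_io and t_parse are made up for this
example. I am assuming one header line followed by rows that look like
"2011/07/21,15:48:00:25,1.5,2.5" (date, time with a trailing fraction
field, then numeric columns), which is only my guess from your strptime
format. As in your code, the fraction field is ignored.

```octave
function [x, t_io, t_parse] = readhist_bulk (filename)
  % Phase 1: disk I/O only. fileread pulls the whole file into one string.
  tic ();
  txt = fileread (filename);
  t_io = toc ();

  % Phase 2: parsing only, entirely in memory.
  tic ();
  nl = find (txt == "\n", 1);
  if (isempty (nl))
    error ("readhist_bulk: no data rows found in %s", filename);
  endif
  body = txt(nl+1:end);                 % drop the header line

  % Turn every field separator into a blank so that a single sscanf call
  % can read all numbers at once. Line breaks are kept for now.
  body(body == "/" | body == ":" | body == "," | body == "\r") = " ";

  % Count the numbers in the first data row to get the row width.
  first_row = strtok (body, "\n");
  ncols = numel (sscanf (first_row, "%f"));

  nums = sscanf (body, "%f");           % all numbers, as one column vector
  if (ncols < 7 || rem (numel (nums), ncols) != 0)
    error ("readhist_bulk: rows do not match the assumed layout");
  endif
  m = reshape (nums, ncols, [])';       % one row per text line

  % Assumed layout: year month day hour minute second fraction values...
  t = datenum (m(:,1), m(:,2), m(:,3), m(:,4), m(:,5), m(:,6));
  x = [(t - t(1)) * 86400, m(:,8:end)]; % seconds since first row, then data
  t_parse = toc ();
endfunction
```

Call it as [x, t_io, t_parse] = readhist_bulk ("data.csv") and compare
t_io with t_parse. Note that datenum does not apply daylight saving
shifts the way mktime does, so elapsed times can differ from your version
across a DST change.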
Martijn
On Thu, 2011-07-21 at 15:48 -0700, J Stasko wrote:
> I have a fairly large data file which I can read using the
> following .m file. However, it takes 40 times longer to read than by
> using csvread, which sometimes can take up to an hour.
>
>
> CSVREAD is nice, but it does not correctly parse the date and time
> information at the beginning of each line. How can I do this a
> different way, or how can I rewrite this in C as a module?
>
>
> Thanks!
>
>
> function [t, x] = readhist(filename)
>
>
> tic()
>
>
> fid = fopen(filename,"r");
> if (feof(fid))
> "File empty"
> fclose(fid)
> return
> endif
>
>
> text_line = fgetl(fid);
> u = strsplit(text_line,",");
> u(1,1) = strcat(u(1,1),u(1,2));
> t = [u(1,1),u(1,3:end)];
>
>
> columns = size(t)(2);
>
>
>
>
> line_number = 1;
> x_line=[];
> x=[];
> first_time = gmtime(0);
>
>
> while (!feof(fid) )
> text_line = fgetl(fid);
> w = strsplit(text_line,",");
> date_time = strcat(w(1,1),w(1,2));
> [line_time_s, nchar] = strptime(date_time,"20%y/%m/%d%H:%M:%S:");
> % line_time_s.usec = sscanf(x(nchar+1,nchar+2),"%d")*10000;
>
>
> if (first_time.year==70)
> first_time = line_time_s;
> endif
> line_time_sec = mktime(line_time_s) - mktime(first_time);
>
>
> x_line(1) = line_time_sec;
> for i = 3:columns
> x_line(i-1) = sscanf(cell2mat(w(1,i)),"%f");
> endfor
>
>
> x(line_number,:) = x_line;
> line_number = line_number+1;
>
>
> endwhile
>
>
> timetaken = toc();
> disp(line_number);
> disp(timetaken);
>
>
> fclose(fid);
> return
>
>
> endfunction
>
>