bug-gawk

Re: [bug-gawk] Performance observations while using getline: reading fro


From: Aharon Robbins
Subject: Re: [bug-gawk] Performance observations while using getline: reading from a pipe vs. using a coprocess
Date: Mon, 08 Apr 2013 20:44:59 +0300
User-agent: Heirloom mailx 12.5 6/20/10

Hi Hermann.

As noted by others, in the pipe case, you are closing and reopening
the pipe for *every* input record.  This means creating a new
process for every input record, whereas for the co-process case, you
are doing the close at the end, so you only create one process and you
continuously feed it data.

I suspect that if you change the pipe case to do the close at the
end you will see more reasonable performance.
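
To make the cost of the per-record close concrete, here is a minimal
sketch. It does not use Geod: the command "echo one; echo two; echo three"
is a stand-in chosen only for illustration, and the variable names
per_record and single are hypothetical. The first program closes the pipe
after every record, so the command is started again each time and getline
always reads its first output line. The second closes only in END, so one
process serves all three records.

```shell
# Stand-in command (an assumption for illustration, not Geod).
# Three input records are fed to each awk program.

# Variant 1: close(cmd) after every record. The next "cmd | getline"
# starts the command again, so each record reads the first line.
per_record=$(printf '1\n2\n3\n' | awk '
BEGIN { cmd = "echo one; echo two; echo three" }
{
    if ((cmd | getline line) > 0)
        print line
    close(cmd)
}')

# Variant 2: close(cmd) only in END. The command is started once and
# successive getline calls read successive lines of its output.
single=$(printf '1\n2\n3\n' | awk '
BEGIN { cmd = "echo one; echo two; echo three" }
{
    if ((cmd | getline line) > 0)
        print line
}
END { close(cmd) }')

echo "per-record close:"; echo "$per_record"
echo "close in END:";     echo "$single"
```

The repeated first line in the first variant shows that every close is
followed by a fresh process, which is where the time goes when this
happens once per input record.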

Thanks,

Arnold

> Date: Sun, 07 Apr 2013 21:05:16 -0300
> From: Hermann Peifer <address@hidden>
> To: address@hidden
> Subject: [bug-gawk] Performance observations while using getline: reading
>  from a pipe vs. using a coprocess
>
> Hi,
>
> I made the below performance observations which I thought would be worth 
> noting down and sending to you. However, I might be simply stating the 
> obvious.
>
> The context:
> I am processing some GPX data, where I want to make the Geod utility 
> from the GeographicLib library [0] calculate the distance between 
> coordinates lat1,lon1 and lat2,lon2
>
> The observation:
> When using getline to read from a pipe, as in [1], the processing of 
> 50000 records of sample data is more than 60 times slower compared to 
> doing basically the same distance calculation via a coprocess, see [2]. 
> I am using gawk from git on a MacBook. I also tested with gawk 3.1.5 and 
> 3.1.8 which show the same behaviour.
>
> As far as I can see: The close(cmd) slows the data processing down. 
> Maybe this behaviour is worth mentioning in the manual.
>
> Not sure if this is of any relevance, but when using valgrind, each 
> execution of close(cmd) triggers this message:
>
> UNKNOWN task message [id 3403, to mach_task_self(), reply 0x2903]
>
> Regards, Hermann
>
>
> [0] http://sourceforge.net/projects/geographiclib/
>
> [1]
>
> awk 'BEGIN{ while (++x <= 50000) print rand()*90,rand()*180,rand()*-90,rand()*-180}' > testdata
>
> ==> pipe.awk <==
> # Geod will be used for distance calculations
> BEGIN { str = "Geod -i --input-string " }
>
> {
>       cmd = str "'" $0 "'"
>
>       if ((cmd | getline) > 0)
>               print $0
>       close(cmd)
> }
>
> $ time awk -f pipe.awk testdata > out.pipe
>
> real  3m25.636s
> user  1m10.757s
> sys   1m40.159s
>
> [2]
>
> ==> coprocess.awk <==
> # Geod will be used for distance calculations
> BEGIN { cmd = "Geod -i" }
>
> {
>       print $0 |& cmd
>
>       if ((cmd |& getline) > 0)
>               print $0
> }
>
> END { close(cmd) }
>
> $ time awk -f coprocess.awk testdata > out.coprocess
>
> real  0m3.037s
> user  0m2.470s
> sys   0m0.459s
>
> $ diff out.pipe out.coprocess
> $


