
Re: [bug-gawk] Problem with printing 5000 lines to a coprocess


From: Andrew J. Schorr
Subject: Re: [bug-gawk] Problem with printing 5000 lines to a coprocess
Date: Sat, 20 Dec 2014 17:37:15 -0500
User-agent: Mutt/1.5.23 (2014-03-12)

On Sat, Dec 20, 2014 at 08:24:58PM -0200, Hermann Peifer wrote:
> On 2014-12-20 15:23, Andrew J. Schorr wrote:
> >What O/S are you using?  The following discussion is assuming Linux...
> 
> Sorry, I should have mentioned: gawk from git master on Mac OS X 10.10.1
> 
> Thanks for the explanations around pipe capacity, etc. I will re-read them
> carefully and see what I can do.

I'm sure Mac OS X will work much the same way.  Windows may be peculiar,
though.

> My initial 2-way communication design was to alternate write and read
> operations line by line, as suggested in the manual:
> 
>   print data |& "subprogram"
>   "subprogram" |& getline results

That is the way I usually use a coprocess.  As long as you know that the
subprogram will return precisely one line for each line of input (and
flush its output immediately), this works well.  It can be slow, though,
because there is no pipelining: each line has to make a full round trip
before the next one is sent.
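
For illustration, here is a minimal sketch of that lockstep pattern.  It
assumes "cat" as a stand-in subprogram, on the assumption that it writes each
line back as soon as it reads it; a real subprogram has to give the same
one-line-per-line, unbuffered guarantee.

```awk
# Line-by-line two-way I/O.  "cat" is only a placeholder coprocess; it is
# assumed to return exactly one line per input line without buffering.
BEGIN { cmd = "cat" }
{
    print $0 |& cmd                      # send one line
    if ((cmd |& getline result) > 0)     # wait for exactly one reply
        print "reply: " result
}
END { close(cmd) }
```

If the subprogram buffers its output, the getline blocks forever, so this
pattern depends entirely on the subprogram's flushing behavior.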

> So I changed the code to: print all, then read all, thereby running into the
> "limited pipe capacity" deadlock. Sigh.
> 
> Eventually, I went for the tempfile approach described in GAWK manual
> section 12.3. The manual states: "This works, but not elegantly." Hmm. The
> 2-way pipe might be elegant, but I don't seem to get it to work properly.

The other approach is to write out a batch of data, and then read a batch
of results.  In other words, instead of writing 1 line and then reading 1 line,
or writing the whole file and then reading all the output, try something
in between: write N lines of data at a time, and then read the output from that
batch.  I'm not sure what the best value of N is; you will have to experiment.
This should give much better performance than writing 1 line at a time.
You just need to make sure that N is small enough to avoid filling the
kernel buffer.
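
A rough sketch of the batched approach might look like the following.  The
"cat" coprocess, the value N = 100, and the drain() helper are all placeholders
for illustration, and the code assumes exactly one line of output per line of
input.

```awk
# Batched two-way I/O: write up to N lines, then read back N replies.
# "cat" and N = 100 are placeholders, not recommendations.
BEGIN { cmd = "cat"; N = 100 }
{
    print $0 |& cmd
    if (++pending >= N)
        drain()
}
END { drain(); close(cmd) }

# Read one reply for every line that is still outstanding.
function drain(    line)
{
    while (pending > 0 && (cmd |& getline line) > 0) {
        print line
        pending--
    }
}
```

As a starting point, N times the longest expected line should stay well below
the pipe capacity, which is commonly 64 KiB on Linux and may be smaller on
other systems.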

In the long run, we are working on adding to gawk the ability to run the
"select" system call.  Once select is available, there may be different ways to
solve this.  For example, you could use select to read returned data only when
it becomes available.  I am working on merging this code soon...

Of course, using a temporary file is also a good solution.  I often use that
approach.
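
Something along these lines, where "sort" and the temporary file name are
placeholders for the real subprogram and path:

```awk
# Temporary-file approach: write everything, close the pipe, then read
# the results back.  "sort" and the file name are placeholders.
BEGIN {
    tempfile = "/tmp/results." PROCINFO["pid"]
    out = "sort > " tempfile
}
{ print $0 | out }
END {
    close(out)                             # wait for the subprogram to finish
    while ((getline line < tempfile) > 0)
        print "result: " line
    close(tempfile)
    system("rm -f " tempfile)
}
```

Because the subprogram has exited before any result is read, there is no pipe
to fill up and therefore no deadlock, whatever the size of the input.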

Regards,
Andy


