bug-gawk

Re: [bug-gawk] Problem with printing 5000 lines to a coprocess


From: Hermann Peifer
Subject: Re: [bug-gawk] Problem with printing 5000 lines to a coprocess
Date: Sat, 20 Dec 2014 20:24:58 -0200
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:24.0) Gecko/20100101 Thunderbird/24.6.0

On 2014-12-20 15:23, Andrew J. Schorr wrote:
> Hi Hermann,
>
> On Sat, Dec 20, 2014 at 01:41:03PM -0200, Hermann Peifer wrote:
>> I am wondering if there is some buffer setting or similar to avoid the
>> problem. Maybe this is not a gawk issue in the first place, as printing
>> 100000 lines to sort works fine.
>
> What O/S are you using?  The following discussion is assuming Linux...

Sorry, I should have mentioned: gawk from git master on Mac OS X 10.10.1

Thanks for the explanations around pipe capacity, etc. I will re-read them carefully and see what I can do.


> Anyway, the problem here is that gawk is writing data to your coprocess.  Your
> coprocess is presumably reading the data, processing it, and writing data back
> to gawk.  But in your program design, gawk will not read any data back from the
> coprocess until it is finished writing all the outgoing data to the coprocess.
> As a result, the coprocess is blocking in a write call when the kernel buffers
> containing the data written by the coprocess to gawk fill up.  Then, since the
> coprocess has blocked, the buffers from gawk to the coprocess will also fill
> up, causing gawk to block.
>
> I can't remember if this is documented in the gawk manual.  I see some
> discussion of potential problems related to not flushing output from the
> coprocess.  Maybe a discussion of this type of deadlock should be added if it's
> not there already.


My initial 2-way communication design was to alternate write and read operations line by line, as suggested in the manual:

  print data |& "subprogram"
  "subprogram" |& getline results

However, sending only 1 line to the coprocess is not enough to get a result back, which looks like the "opposite deadlock": gawk waits for a reply that the coprocess has not yet produced. The line-by-line approach only works as in the sample code below, which is terribly slow because it opens and closes the pipes on every iteration:

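for (i = 0; i < num; i++) {
  # Send one record, then close the write end so the coprocess sees EOF
  print v1, v1, v2, v2 |& command
  close(command, "to")

  # Read the single result line, then close the read end;
  # the coprocess is started again on the next iteration
  if ((command |& getline) > 0)
    dist = $3
  close(command, "from")
}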

So I changed the code to: print all, then read all, thereby running into the "limited pipe capacity" deadlock. Sigh.
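
For subprograms that buffer their output when writing to a pipe, the gawk manual describes one way around the "opposite deadlock": setting PROCINFO[command, "pty"] = 1 before the coprocess is first started, so that gawk talks to it through a pseudo-terminal and the subprogram's standard I/O library usually switches to line buffering. The following is only a minimal sketch under that assumption, not a tested fix for the program discussed here. The roundtrip() helper and the "tr a-z A-Z" stand-in command are hypothetical and chosen purely for illustration; the manual also notes that ptys cost some performance and that gawk falls back to regular pipes when no pty is available.

```awk
# Send one line to the coprocess and read one line back.
# Returns the reply, or "" if nothing could be read.
function roundtrip(command, line,    reply) {
    print line |& command
    if ((command |& getline reply) > 0)
        return reply
    return ""
}

BEGIN {
    command = "tr a-z A-Z"          # hypothetical stand-in for the real subprogram
    PROCINFO[command, "pty"] = 1    # must be set before the coprocess is first used

    # Alternate write and read, one line at a time, without closing the pipes
    for (i = 1; i <= 5000; i++)
        if (roundtrip(command, "line " i) != "")
            n++

    close(command)
    print n " replies read"
}
```

Because each write is followed immediately by a read, neither pipe direction accumulates more than one line, so the "limited pipe capacity" deadlock cannot build up either. This only helps if the subprogram produces exactly one output line per input line.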

Eventually, I went for the tempfile approach described in GAWK manual section 12.3. The manual states: "This works, but not elegantly." Hmm. The 2-way pipe might be elegant, but I don't seem to get it to work properly.
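
The tempfile approach can be sketched roughly as follows, loosely modeled on the example in the manual. It is an illustration under stated assumptions, not the code actually used here: the via_tempfile() helper, the "/tmp/mydata." file name prefix, and the "sort -n" stand-in command are all hypothetical. The idea is to send everything through a one-way pipe whose output is redirected to a file, wait for the command to finish, and only then read the results back.

```awk
# Run "command" over lines[1..n] using a temporary file instead of a
# two-way pipe. Fills results[] and returns the number of lines read.
function via_tempfile(command, lines, n, results,    i, tempfile, pipeline, line, count) {
    tempfile = "/tmp/mydata." PROCINFO["pid"]   # assumed location; PROCINFO is gawk-specific
    pipeline = command " > " tempfile

    # Write phase: one-way pipe, so gawk never has to read while writing
    for (i = 1; i <= n; i++)
        print lines[i] | pipeline
    close(pipeline)                 # waits until the command has finished

    # Read phase: the complete output is now in the file
    count = 0
    while ((getline line < tempfile) > 0)
        results[++count] = line
    close(tempfile)
    system("rm -f " tempfile)
    return count
}

BEGIN {
    n = 5000
    for (i = 1; i <= n; i++)
        data[i] = n - i + 1         # 5000, 4999, ..., 1
    count = via_tempfile("sort -n", data, n, sorted)
    print count, sorted[1], sorted[count]
}
```

Since writing and reading never overlap, neither pipe-capacity limits nor output buffering in the subprogram can cause a deadlock; the price is the extra file I/O.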

Hermann


