Re: [bug-gawk] Problem with printing 5000 lines to a coprocess


From: Andrew J. Schorr
Subject: Re: [bug-gawk] Problem with printing 5000 lines to a coprocess
Date: Sat, 20 Dec 2014 12:23:27 -0500
User-agent: Mutt/1.5.23 (2014-03-12)

Hi Hermann,

On Sat, Dec 20, 2014 at 01:41:03PM -0200, Hermann Peifer wrote:
> I am wondering if there is some buffer setting or similar to avoid the
> problem. Maybe this is not a gawk issue in the first place, as printing
> 100000 lines to sort works fine.

What O/S are you using?  The following discussion assumes Linux.

The kernel buffers a certain amount of data for a socket or pipe file
descriptor.  From a quick inspection of gawk code, it appears that a pair of
pipes is used to communicate between gawk and the coprocess (as opposed to a
socketpair).  From the Linux "man 7 pipe" page:

   Pipe capacity
       A pipe has a limited capacity.  If the pipe is full, then a write(2)
       will block or fail, depending on whether the O_NONBLOCK flag is set
       (see below).  Different implementations have different limits for the
       pipe capacity.  Applications should not rely on a particular capacity:
       an application should be designed so that a reading process consumes
       data as soon as it is available, so that a writing process does not
       remain blocked.

       In Linux versions before 2.6.11, the capacity of a pipe was the same as
       the system page size (e.g., 4096 bytes on i386).  Since Linux 2.6.11,
       the pipe capacity is 65536 bytes.
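
The capacity described in the man page excerpt can be checked empirically.
The following is a minimal Python sketch, not anything taken from gawk: it
opens a pipe, sets O_NONBLOCK on the write end, and counts how many bytes are
accepted before a write would block.  The function name pipe_capacity is
hypothetical, and the result depends on the kernel and its configuration.

```python
import os


def pipe_capacity():
    """Return how many bytes a fresh pipe accepts before a write would block."""
    read_fd, write_fd = os.pipe()
    try:
        # With O_NONBLOCK set, a write to a full pipe fails instead of blocking.
        os.set_blocking(write_fd, False)
        chunk = b"\0" * 1024
        total = 0
        while True:
            try:
                total += os.write(write_fd, chunk)
            except BlockingIOError:
                # The pipe is full: nobody is reading from read_fd.
                return total
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print("pipe accepted", pipe_capacity(), "bytes before filling up")
```

Nothing ever reads from the pipe in this sketch, so the count is the amount of
data the kernel is willing to buffer on its own.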

With sockets, you can change the buffer size using a system call like this:
   setsockopt(fd, SOL_SOCKET, SO_(RCV|SND)BUF, &new_buf_size,
              sizeof(new_buf_size))
I'm not sure if this can be configured for a pipe.
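
For the socket case, here is a small Python sketch of the same setsockopt call
applied to one end of a socketpair.  The helper name resize_send_buffer and the
requested size are made up for illustration.  The kernel is free to round or
cap the requested value, so the size read back may differ from the size
requested.

```python
import socket


def resize_send_buffer(requested):
    """Request a new SO_SNDBUF size; return the (before, after) sizes."""
    a, b = socket.socketpair()
    try:
        before = a.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        a.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested)
        # Read the value back: the kernel may have adjusted the request.
        after = a.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        return before, after
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    before, after = resize_send_buffer(262144)
    print("send buffer before:", before, "after:", after)
```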

Anyway, the problem here is that gawk is writing data to your coprocess.  Your
coprocess is presumably reading the data, processing it, and writing data back
to gawk.  But in your program design, gawk will not read any data back from the
coprocess until it is finished writing all the outgoing data to the coprocess.
As a result, the coprocess is blocking in a write call when the kernel buffers
containing the data written by the coprocess to gawk fill up.  Then, since the
coprocess has blocked, the buffers from gawk to the coprocess will also fill
up, causing gawk to block.
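
One way out of this kind of deadlock is to read the coprocess output while
still writing to it.  The sketch below illustrates the idea in Python rather
than gawk, under some assumptions: the child is a hypothetical stand-in for
your coprocess that simply echoes each line back, and a separate thread drains
its output so that the pipe from the child never fills up.

```python
import subprocess
import sys
import threading

# Hypothetical stand-in for the coprocess: echoes every input line back.
ECHO_CHILD = (
    "import sys\n"
    "for line in sys.stdin.buffer:\n"
    "    sys.stdout.buffer.write(line)\n"
)


def round_trip(lines):
    """Send byte lines to an echoing child while a thread drains its output."""
    child = subprocess.Popen(
        [sys.executable, "-c", ECHO_CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    received = []

    def drain():
        # Reading concurrently keeps the child-to-parent pipe from filling up.
        for line in child.stdout:
            received.append(line)

    reader = threading.Thread(target=drain)
    reader.start()
    for line in lines:
        child.stdin.write(line)
    child.stdin.close()  # flush and signal end of input to the child
    reader.join()
    child.wait()
    return received


if __name__ == "__main__":
    data = [b"x" * 99 + b"\n" for _ in range(5000)]
    print(len(round_trip(data)), "lines came back")
```

Without the reader thread, writing all 5000 lines before reading anything back
would be exposed to the same problem as described above once the data exceeds
what the two pipes can buffer.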

I can't remember if this is documented in the gawk manual.  I see some
discussion of potential problems related to not flushing output from the
coprocess.  Maybe a discussion of this type of deadlock should be added if it's
not there already.

Regards,
Andy


