Re: [head] wished an option to continue consuming the input after the sp

coreutils

From:	Bob Proulx
Subject:	Re: [head] wished an option to continue consuming the input after the specified number of lines has been read
Date:	Tue, 16 Oct 2012 12:25:07 -0600
User-agent:	Mutt/1.5.21 (2010-09-15)

Thibault LE PAUL wrote:
> I wasn't clear enough.
> My goal was to do different things on the first lines and the last
> lines of same input, without using storage, thus using piped
> processes.

Depending upon what you want to do, I would do something like this,
using sed to do something different to each part.

  $ seq 1 10 | sed '1,3s/^/head /;7,10s/^/tail /'
  head 1
  head 2
  head 3
  4
  5
  6
  tail 7
  tail 8
  tail 9
  tail 10

Or without printing the skipped lines:

  $ seq 1 10 | sed -n '1,3s/^/head /p;7,10s/^/tail /p'
  head 1
  head 2
  head 3
  tail 7
  tail 8
  tail 9
  tail 10

Or awk:

  $ seq 1 10 | awk 'NR<=3{print "head",$0} NR>=7{print "tail",$0}'
  head 1
  head 2
  head 3
  tail 7
  tail 8
  tail 9
  tail 10
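The sed and awk examples hard-code the line numbers of the tail part,
which only works when the length of the input is known in advance.  A
minimal one-pass sketch that does not need to know the length keeps
the last T lines in a small awk ring buffer and prints them at end of
input.  The function name headtail and its arguments (head count, tail
count) are made up for illustration.  It assumes the tail count is at
least 1, and on input shorter than head+tail lines some lines get
tagged both ways.

```sh
# Hypothetical helper: tag the first $1 lines "head" and the last $2
# lines "tail" in a single pass, holding only $2 lines in memory.
# Assumes $2 >= 1 (it is used as a modulus).
headtail() {
  awk -v h="$1" -v t="$2" '
    NR <= h { print "head", $0 }   # head lines go out as they arrive
    { buf[NR % t] = $0 }           # ring buffer of the last t lines
    END {
      start = (NR > t) ? NR - t + 1 : 1
      for (i = start; i <= NR; i++)
        print "tail", buf[i % t]
    }'
}
```

For example 'seq 1 10 | headtail 3 3' prints head 1 through head 3
and then tail 8 through tail 10.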

Or Perl, Python, or Ruby for more complex tasks where I want to write
multi-line subroutines.  I would write all of the code out in a
subroutine and then call the subroutine on the appropriate lines.
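As a rough shell-only sketch of that structure (shell rather than
Perl, Python, or Ruby, to match the rest of this thread): each part's
work goes in its own function, and a single read loop calls the right
function by line number.  The names do_head, do_tail and dispatch are
hypothetical, the ranges 1-3 and 7-10 simply mirror the first sed
example, and a shell read loop is far slower than sed or awk on large
input.

```sh
# Hypothetical handlers; put the real multi-line work in these bodies.
do_head() { printf 'head %s\n' "$1"; }
do_tail() { printf 'tail %s\n' "$1"; }

# Read stdin once and dispatch each line by its line number.
dispatch() {
  n=0
  while IFS= read -r line; do      # IFS= and -r keep the line verbatim
    n=$((n + 1))
    if [ "$n" -le 3 ]; then
      do_head "$line"
    elif [ "$n" -ge 7 ] && [ "$n" -le 10 ]; then
      do_tail "$line"
    fi
  done
}
```

With this, 'seq 1 10 | dispatch' tags lines 1-3 with head and lines
7-10 with tail, the same as the first sed example.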

> Then I used tee, to fork the pipe into two processes.

That is okay.  But manually using multiple processes always brings
with it the problems of manually using multiple processes. :-)

> Opposite to that, my original way that fails :

I cannot recreate a failure.

> rm /tmp/fifo1
> mkfifo /tmp/fifo1
> cat /tmp/fifo1|head -n2|sed 's/^/head&/'&
> tee /tmp/fifo1|tail -n2|sed 's/^/tail&/'

I assume that any amount of input to the tee can be used?  Can I
simply 'echo foo > /tmp/fifo1' and trigger your test case?  Please
say what input must be used.  If you don't say then we won't know.

What input are you providing to tee?  For this test case I assume more
than 2+2=4 lines of input.  I will use the command 'seq 1 7' to
generate easy, repeatable input.  I also like a space between the tag
and the data, so I will add one.

  seq 1 7 | tee /tmp/fifo1|tail -n2|sed 's/^/tail &/'

> That way, the /tmp/fifo1 fifo propagates SIGPIPE ahead to tee as
> soon as head has finished,

What?  There is a misunderstanding at this point.  This statement does
not make sense.

SIGPIPE occurs when a process writes to a closed pipe.  It is sent
from the kernel to the writing process.  The default action of SIGPIPE
is to terminate the process.  See 'man 7 signal' for more details.

The 'tee' process is writing to the pipe.  When the last reader closes
the pipe (usually by exiting), any further write to it raises SIGPIPE
in the writer, which terminates it.  This is normal behavior.
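This is easy to see from the shell.  The sketch below (the function
name is made up) runs seq as the writer and lets head -n 1 quit after
the first line.  seq's exit status is reported on file descriptor 3,
a copy of the original stdout, so the report does not go into the
already broken pipe.  bash and dash report death by signal N as
128+N, so on Linux, where SIGPIPE is 13, expect 141.  If SIGPIPE is
ignored in the calling environment, seq gets a write error (EPIPE)
instead and exits with a nonzero status of its own.

```sh
# Hypothetical helper: print the exit status of a writer whose reader
# exits early.  fd 3 is a copy of the function's original stdout.
writer_status() {
  {
    # seq produces far more than a pipe buffer holds, so it is still
    # writing when head exits after reading the first line.
    { seq 1 1000000; echo "$?" >&3; } | head -n 1 > /dev/null
  } 3>&1
}
```

Running writer_status in bash or dash on Linux should print 141.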

So, yes, tee is not able to read all of its input and write it to all
of its outputs.  But that is expected given that one of the readers
has exited.

> cat /tmp/fifo1|head -n2|sed 's/^/head&/'&

That extra 'cat' process is going to confuse things.  It will buffer
input and write buffered output.  This will reblock the data in
confusing ways.  I prefer to remove it.  It isn't needed.

  head -n2 < /tmp/fifo1 | sed 's/^/head &/'&

But perhaps this is simply a smaller example from the larger problem
and the cat represents some other process?

> rm /tmp/fifo1
> mkfifo /tmp/fifo1
> cat /tmp/fifo1|head -n2|sed 's/^/head&/'&
> tee /tmp/fifo1|tail -n2|sed 's/^/tail&/'

It is easier to debug this by avoiding the backgrounding and running
this test in three terminal windows.  In the first, read from the fifo
with the "head" pipeline.  In the second, run the tee pipeline.  In
the third, send the input.  Doing so makes it more visible when the
processes are running and when they are exiting.  It will show that
the head pipeline reads two lines and emits them, followed by the tail
pipeline emitting its two lines.  Running them the way you have shown,
with the head pipeline in the background, produces this output:

  $ head -n2 < fifo | sed 's/^/head &/' &
  [1] 14641
  $ seq 1 9 | tee fifo | tail -n2 | sed 's/^/tail &/'
  head 1
  head 2
  tail 8
  tail 9
  [1]+  Done                    head -n2 < fifo | sed 's/^/head &/'
  $ 

> then tee stops, and tail doesn't read the expected last lines,
> instead just the lines before tee aborts and EOF is read on
> pipe. The effect is observable on large input, like
> /usr/share/mysql/errmsg-utf8.txt

Yes.  I did this:

  $ head -n2 < fifo | sed 's/^/head &/' &
  [1] 14641
  $ tee fifo < /usr/share/mysql/errmsg-utf8.txt | tail -n2 | sed 's/^/tail &/'

And I could see that the head reader on the fifo finished before tee
had written all of the input, so tee was terminated by SIGPIPE on its
next write to the fifo and the tail did not get all of the file.  I
consider that normal behavior.  Yes, reading all of the input and
discarding it in the head process will allow tee to write all of the
output.  But that is a lot of extra data written only to be thrown
away, so I would avoid doing it that way.  Using tee to feed two
asynchronous parallel processes that may exit independently, at
different times, invites exactly this sort of trouble.  Therefore I
immediately think of one of the earlier solutions I posted above that
processed the input on the fly.  It is so much simpler.
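For comparison, here is a minimal sketch of the read-everything-and-
discard-it approach described above (the approach being advised
against).  The reader takes its two lines and then keeps the fifo
open, draining the rest of the input into /dev/null with cat, so tee
never writes into a pipe without a reader.  The function name and the
temporary directory are invented for the example, and the relative
order of the head and tail output is still left to the scheduler.

```sh
# Hypothetical demo of the "read everything, discard the rest" workaround.
fifo_drain_demo() {
  dir=$(mktemp -d) || return 1
  mkfifo "$dir/fifo" || return 1

  # Reader: tag the first 2 lines, then drain the fifo.  The group's
  # stdin keeps the fifo open until cat reaches end of input.
  { head -n 2 | sed 's/^/head &/'; cat > /dev/null; } < "$dir/fifo" &

  # Writer: tee now runs to the end, so tail sees the real last lines.
  seq 1 100000 | tee "$dir/fifo" | tail -n 2 | sed 's/^/tail &/'

  wait            # let the background reader finish
  rm -r "$dir"
}
```

The tail part now reports tail 99999 and tail 100000, at the cost of
pushing every remaining line through the fifo only to throw it away.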

Also, since the background process is asynchronous, the order of the
emitted output isn't specified.  The kernel might schedule the
background process later, and then the output of the two processes
could appear in a different order.  This approach tickles a lot of
possible problems.  Best to avoid those entirely.

The term "abort" would usually mean 'man 3 abort', the abort() library
function, which raises SIGABRT and terminates the process abnormally.
Abnormal termination, with a core dump where enabled, is also the
default action of various other signals.  But I think you are using
the word casually, simply meaning that the program is exiting.  Read
'man 3 abort' and the default signal actions in 'man 7 signal', and
then please avoid using that word when we aren't talking about that
event so that it doesn't confuse us.  :-)

> Also tried tee -i to ignore interrupts, but it is not the purpose of
> this option I suppose. No effect in our case.

The tee -i option only ignores SIGINT, the signal sent by a Control-C
interrupt.  It does nothing about SIGPIPE.
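A quick way to confirm that -i does not help here, as a sketch with a
made-up function name: put tee -i in the middle of a pipeline whose
final reader exits early, and report tee's exit status on a spare file
descriptor (3) so the report does not go into the broken pipe.  With
the default SIGPIPE action, bash or dash on Linux should report 141
(128 + 13), just as for a plain tee.  If SIGPIPE is ignored in the
calling environment, tee gets a write error instead and still exits
nonzero.

```sh
# Hypothetical helper: exit status of "tee -i" after its reader quits.
# tee copies its input to /dev/null and to stdout (the pipe to head).
tee_i_status() {
  {
    seq 1 1000000 |
      { tee -i /dev/null; echo "$?" >&3; } |
      head -n 1 > /dev/null
  } 3>&1
}
```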

Whew!  So where are we?  I advise avoiding the background process
approach, for this particular case at least, as it isn't needed.  It
opens a large box of potential and real problems.

Bob
