Re: [Help-bash] When pipes fail (and when not)


From: Bob Proulx
Subject: Re: [Help-bash] When pipes fail (and when not)
Date: Sat, 24 Nov 2018 18:12:45 -0700
User-agent: Mutt/1.10.1 (2018-07-13)

Paul-Jürgen Wagner wrote:
> I have the following script:
> 
> for i in {0..9}; do echo $i; sleep 1; done | tee foo | dd bs=1 count=10

Looks good.  The sleep pushes the race condition such that dd almost
always exits first (on a heavily loaded system this might not be true,
as it is not guaranteed, but it is heavily biased that way).  Not all
operating systems behave that way.  IIRC the old MS-DOS implemented
pipes through temporary files, so the first command in a pipeline
would run to completion before the second, and then the third.  It
wasn't a multitasking system.  Just as an example.

This means that after dd exits, upon the next write that tee makes
tee will receive SIGPIPE and exit with a non-zero exit code, and
WIFSIGNALED(wstatus) (see man 2 wait) will indicate that it exited
due to a signal.

Once tee has exited, the next write by echo will fail.  Since echo is
a shell builtin, the shell running the for loop receives the SIGPIPE
and the loop exits.  If that were /bin/echo or another non-builtin,
then that external command would receive SIGPIPE upon its next write
and exit due to the signal, and WIFSIGNALED(wstatus) would indicate
this.

In general in an I/O filter pipeline all of the processes to the right
of the first command to exit will read an EOF from their input and
therefore exit normally.  All of the processes to the left will
receive a SIGPIPE upon their next write and exit WIFSIGNALED.

  A | B | C | D | E

If C exits then D and E receive EOF when they next read from their
inputs.  Upon their next writes B and then A will receive SIGPIPE and
exit.
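
As an aside, a minimal sketch using bash's PIPESTATUS array can show
this ordering (yes stands in for an upstream writer, head for the
first command to exit, cat for a downstream reader):

  yes | head -n 3 | cat
  echo "${PIPESTATUS[@]}"

On a typical GNU/Linux system that prints three y lines followed by
"141 0 0": yes was terminated by SIGPIPE (128+13=141) while head and
cat exited normally with status 0.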

Because I/O filter programs read and write continuously this is a good
design paradigm for them.  They will run and exit together as a set.
However if a process, say A in the above, is a long-running,
CPU-intensive program that only writes output infrequently, then it
may continue running and using resources for an indefinite time,
determined by whatever the process is doing, before it writes output,
receives SIGPIPE, and exits.

If A is doing something for two hours before writing its next output
then it will still continue running for the next two hours even though
its output reading process has already exited.  Let me emphasize that
it is not until the *next* write that I/O will trigger the SIGPIPE to
be delivered to terminate the writing process.

A somewhat involved example to show this:

  for i in {0..9}; do
      { /bin/echo $i || { echo "echo failed: rc=$?"; break; } 1>&2 ;}
      sleep 2
  done | dd bs=1 count=10 status=none
  0
  1
  2
  3
  4
  echo failed: rc=141

And remember the documentation for the shell is:

       The return value of a simple command is its exit status, or 128+n if
       the command is terminated by signal n.

Therefore since 141 is > 128 we know the /bin/echo was terminated by a
signal, and 141-128=13, so it was terminated by signal 13, SIGPIPE.
We catch that in the || { ... ;} error-handling portion, print the
exit code, and break out of the loop.  If we didn't break out of it
then the loop would continue for all ten iterations and print the
failure and rc code another four times, for five times total.
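
As a convenience, the bash builtin kill can translate such an exit
status back into the signal name:

  kill -l 141
  PIPE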

  for i in {0..9}; do
      { /bin/echo $i || { echo "echo failed: rc=$?"; } 1>&2 ;}
      sleep 2
  done | dd bs=1 count=10 status=none
  0
  1
  2
  3
  4
  echo failed: rc=141
  echo failed: rc=141
  echo failed: rc=141
  echo failed: rc=141
  echo failed: rc=141

Interestingly if we use the bash builtin echo then no diagnostic is
printed.  The shell knows it has caught a signal however and breaks
out of the loop itself.  The lack of a diagnostic is arguably a bug.
However dash behaves exactly the same way.  Therefore I would not be
in a hurry to change anything as scripts almost certainly depend upon
the current quiet behavior.
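
For example, a minimal check (output as observed on a typical
GNU/Linux system), dropping tee and using the builtin echo, shows the
loop ending quietly after the fifth line with no diagnostic at all:

  for i in {0..9}; do echo $i; sleep 1; done | dd bs=1 count=10 status=none
  0
  1
  2
  3
  4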

[[ksh behaves worse, leaving the for loop running in the background
after having logged the termination of the command line and printed
the prompt!  ksh behaves strangely in this area and feels quite
buggy.]]

> But when I run this in a GNURoot on an android tablet (bash
> 4.3.30(1)), I get
> 
> 0
> 1
> 2
> 3
> 4
> 5
> 6
> 7
> 8
> 9
> 
> in file foo and

By this I presume that 'tee' did not get the SIGPIPE and did not exit.
Again I will use the somewhat involved test example, this time
arranging for SIGPIPE to be ignored.

  man bash

    trap [-lp] [[arg] sigspec ...]

      If arg is the null string the signal specified by each sigspec
      is ignored by the shell and by the commands it invokes.

Therefore 'trap "" PIPE' ignores the signal.  Running this in a bash
-c subprocess avoids munging the currently running command line
shell's trap state.  Otherwise I would need to reset the handler or
exit that shell to get back to a sane state.

  bash -c 'trap "" PIPE
      for i in {0..9}; do
          { /bin/echo $i || { echo "echo failed: rc=$?"; break; } 1>&2 ;}
          sleep 2
      done | dd bs=1 count=10 status=none'
  0
  1
  2
  3
  4
  /bin/echo: write error: Broken pipe
  echo failed: rc=1

Which matches what you are seeing:

> 0
> 1
> 2
> 3
> 4
> tee: standard output: Broken pipe
> tee: write error

This confirms that tee did not get a SIGPIPE.  Instead, at the point
where it normally would have, it called write on the pipe and the
write on the closed pipe returned EPIPE.

  man 2 write

       EPIPE  fd is connected to a pipe or socket whose reading end is closed.
              When this happens the writing process will also receive a
              SIGPIPE signal.  (Thus, the write return value is seen only if
              the program catches, blocks or ignores this signal.)

That additional parenthetical is a recent addition as far as I can
remember, and I think a very good one, because one would only ever see
EPIPE if the environment is trapping and ignoring SIGPIPE, which is
something one should never do.  (Never say never.  There are
undoubtedly special cases where ignoring SIGPIPE is useful or
required.  But as a general statement one should never do it.
Ignoring SIGPIPE causes people to see "Broken pipe" errors.)

Therefore I conclude that your Android environment is mistakenly
installing a SIG_IGN disposition for SIGPIPE.

  man 7 signal

       A child created via fork(2) inherits a copy of its parent's signal
       dispositions.  During an execve(2), the dispositions of handled signals
       are reset to the default; the dispositions of ignored signals are left
       unchanged.
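
A small sketch of that inheritance (output approximate, from a
GNU/Linux system): when a shell that ignores SIGPIPE starts the
pipeline, an exec'd child such as yes inherits the ignored disposition
and reports the EPIPE write error instead of dying from the signal:

  bash -c 'trap "" PIPE; yes | head -n 1 >/dev/null; echo "yes exited: ${PIPESTATUS[0]}"'
  yes: standard output: Broken pipe
  yes exited: 1

Without the 'trap "" PIPE' the same command line reports an exit
status of 141.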

> on the terminal.  Apparently the broken pipe does not cause the whole pipe
> to quit.  Could anyone explain why?  My script relies on the last command
> being able to terminate the whole pipe.  Is there a way to ensure this
> behaviour on GNURoot (android)?

Not having debugged this completely to root cause I can't say for
certain, but I feel confident that the root cause of this problem is
that something in the parent environment is ignoring SIGPIPE.  This is
being inherited by all of the later child processes.  That's bad.
Find that and fix it and it should all work normally.
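
As a quick check, here is a sketch that assumes a Linux /proc layout
(SIGPIPE is signal 13, so its bit in the SigIgn mask is 1<<12) and
asks whether the current shell itself inherited an ignored SIGPIPE:

  sigign=$(awk '/^SigIgn/ {print $2}' /proc/$$/status)
  if (( (16#$sigign >> 12) & 1 )); then
      echo "SIGPIPE is ignored in this shell"
  else
      echo "SIGPIPE has its default (or handled) disposition"
  fi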

For what it is worth I tried this on Android using Termux and
received the same result.  I feel confident that the parent
environment ignoring SIGPIPE is the problem.

> P.S.: I noticed that with GNURoot on android, process substitution does not
> work, probably because /dev can not be written?  Is that related to the
> problem above?

No.  I don't think so.  That would be a different problem.  And
strictly speaking that isn't a portable operation anyway.

Again, a very complicated topic.  I hope I didn't screw up something
in my analysis of it.  If I did, however, please someone call it out
and improve upon the answer.

Bob

P.S. Using {0..9} as in "for i in {0..9}" is really not a good way to
do this.  It expands to the full set.

  for i in 0 1 2 3 4 5 6 7 8 9

For 10 items this is okay.  But people get used to doing that and then
do it for 10 zillion items.  That eats up a *LOT* of memory.

  for i in {0..99999999}

In bash it is better to simply use a real for loop.

  for ((i=1; i<=10000000; i++)); do
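
Completed into a minimal runnable form (with a small bound, just for
illustration):

  for ((i=1; i<=10; i++)); do
      echo "$i"
  done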

And next I am of the opinion that counting numbers should be whole
counting numbers starting at 1 not 0 and proceeding on from there.
But that is a different topic.


