bug-bash

Re: wait -n misses signaled subprocess


From: Chet Ramey
Subject: Re: wait -n misses signaled subprocess
Date: Sun, 28 Jan 2024 18:21:42 -0500
User-agent: Mozilla Thunderbird

On 1/22/24 11:30 AM, Steven Pelley wrote:

I've tried:
killing with SIGTERM and SIGALRM
killing from the test script, a subshell, and another terminal.  I
don't believe this is related to kill being a builtin.
enabling job control (set -m)
bash versions 4.4.12, 5.2.15, 5.2.21.  All linux arm64

You must have left `set -m' enabled in the version whose results you
posted, since you don't get non-interactive status notifications unless
you do.

Let's see if we can go through what happens. Part of it has to do with
notifications and when the shell removes jobs from the jobs table.

When the shell is interactive, and job control is enabled, it checks for
terminated background jobs, notifies the user about their status if
appropriate, and removes them from the jobs list -- bash removes a job
from the list when it's notified the user of its status -- when it goes
to read a new command, before printing the prompt. In a non-interactive
shell, it obviously doesn't print a prompt, but it does the same thing,
even the notification, before reading the next command.

When job control isn't enabled (usually in a non-interactive shell), the
shell doesn't notify users about terminated background jobs, but it still
removes dead jobs from the jobs list before reading the next command. It
cleans the jobs table of notified jobs at other times, too, to move dead
jobs out of the jobs list and keep it a manageable size.

The shell does keep a table of terminated background jobs that have been
removed from the jobs list, because POSIX says you have to keep track of
the last CHILD_MAX pids and make their exit statuses available to `wait'
(but see below).
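
A minimal sketch of that saved-status lookup, assuming a non-interactive
bash; the exit status 7 and the one-second sleep are arbitrary choices for
illustration, not part of the original report:

```shell
#!/usr/bin/env bash
# Start a background job that exits immediately with a known status.
{ exit 7; } &
pid=$!

# Give the job time to terminate and the shell time to reap it.
sleep 1

# Plain `wait PID' can still report the status of the terminated job.
wait "$pid"
rc=$?
echo "wait $pid returned $rc"
```

The job is long gone by the time `wait' runs, yet its status is still
available.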

Test script:
# change to test other signals
sig=TERM

echo "TEST: KILL PRIOR TO wait -n @${SECONDS}"
{ sleep 1; exit 1; } &
pid=$!

This ends up adding this to the jobs table as job 1. $pid is the pgrp
leader.

echo "kill -$sig $pid @${SECONDS}"
kill -$sig $pid

You kill that job, it terminates, the shell gets the SIGCHLD and waits
for it, marks it as dead in the jobs table, and goes to read the next
command. It doesn't matter whether this happens before the sleep or the
wait; as soon as the user is notified, the job gets removed from the jobs
list and moved to the table of saved statuses. (If the shell isn't doing
notifications, the job just gets moved.)


sleep 2
wait -n $pid

When I run this, whether job control is enabled or not, I get an error
message about an unknown job, because `wait -n' doesn't look in the table
of saved statuses -- its job is to wait for `new' jobs to terminate, not
ones that have already been removed from the table. Maybe you're
redirecting stderr.

echo "wait -n $pid return code $? @${SECONDS} (BUG)"

The job isn't in the jobs table because you've already been notified about
it and it's not `new', so you get the unknown job error status.

wait $pid
echo "wait $pid return code $? @${SECONDS}"

This works, because wait without -n looks in the table of saved statuses.
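
As a rough illustration, assuming a non-interactive bash, the status that
`wait' reports for a job killed by a signal is 128 plus the signal number,
so SIGTERM (15) gives 143. The `sleep 5' job and the one-second delay below
are placeholders chosen for the sketch, not taken from the test script:

```shell
#!/usr/bin/env bash
# Background job that would run for a while if left alone.
sleep 5 &
pid=$!

# Kill it, then give the shell time to reap it before waiting.
kill -TERM "$pid"
sleep 1

# Plain `wait' still finds the status of the already-terminated job.
wait "$pid"
rc=$?

# `kill -l' maps an exit status above 128 back to a signal name.
sig=$(kill -l "$rc")
echo "wait $pid returned $rc (SIG$sig)"
```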


echo "TEST: KILL DURING wait -n @${SECONDS}"
{ sleep 2; exit 1; } &
pid=$!
{ sleep 1; echo "kill -$sig $pid @${SECONDS}"; kill -$sig $pid; } &

wait -n $pid

The shell doesn't get the SIGCHLD before running wait, so the job is still
in the jobs list.

echo "wait -n $pid return code $? @${SECONDS}"
wait $pid
echo "wait $pid return code $? @${SECONDS}"

And you get the same status here. Even though the `wait -n' removes the
job from the jobs list, the subsequent `wait' can still find it in the
table of saved exit statuses.



For which I get the following example output:
TEST: KILL PRIOR TO wait -n @0
kill -TERM 1384 @0
./test.sh: line 14:  1384 Terminated              { sleep 1; exit 1; }
wait -n 1384 return code 127 @2 (BUG)
wait 1384 return code 143 @2
TEST: KILL DURING wait -n @2
kill -TERM 1402 @3
./test.sh: line 25:  1402 Terminated              { sleep 2; exit 1; }
wait -n 1402 return code 143 @3
wait 1402 return code 143 @3

I expect the line ending (BUG) to indicate a return code of 143.

It might, if `wait -n' looked for already-notified jobs in the table of
saved exit statuses, but it doesn't. Should it, even if the user has
already been notified of the status of that job?

Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet@case.edu    http://tiswww.cwru.edu/~chet/

