bug-bash

Re: `wait -n` returns 127 when it shouldn't


From: Robert Elz
Subject: Re: `wait -n` returns 127 when it shouldn't
Date: Wed, 17 May 2023 22:52:23 +0700

    Date:        Wed, 17 May 2023 17:23:21 +1000
    From:        Martin D Kealey <martin@kurahaupo.gen.nz>
    Message-ID:  <CAN_U6MWEhJth58AFWbbvFXUD7eSuJjhdBc9Mw4+dGiC0zOm3-g@mail.gmail.com>


  | I suspect putting "local" in a loop is doing something strange.

"local" is an executable statement, not a declaration (shell really
has none of the latter) - every time it is executed it creates a new
local variable (which remains until the function exits, there are no
local scope rules in shell either).
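
To illustrate that point, here is a minimal sketch (the function name
demo and the variable x are made up for this example, they are not from
the script under discussion). It assumes bash, though any shell with
"local" should behave the same way:

```shell
#!/usr/bin/env bash
# Hypothetical example: demo and x are illustrative names only.
demo() {
    local i
    for i in 1 2 3; do
        local x=$i      # executed on every iteration, like any other command
    done
    # x is still set here: it belongs to the function, not to the loop body.
    echo "inside: x=$x"
}

x=global
demo
echo "outside: x=$x"    # the caller's x was never touched
```

The value assigned in the last iteration is still visible after the loop
ends, and the global x is unchanged once the function returns.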

That should make no difference to this code though, and the difference
you report likely hints at the source of the problem.

The code is written weirdly, however. This sequence

        code=0; wait -n || code=$?

could just be

        wait -n; code=$?

(the "local" that might be there makes no difference, or
shouldn't, to the execution semantics).
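
A quick way to compare the two forms is sketched below. The background
job (a subshell that sleeps, then exits with status 3) is an assumption
made for the demonstration, chosen so that the child is certainly still
running when wait -n is called. The two forms would only differ if
"set -e" were in effect, which nothing in this thread suggests:

```shell
#!/usr/bin/env bash
# Form used in the script: initialise to 0, overwrite only on failure.
(sleep 1; exit 3) &
code=0; wait -n || code=$?
first=$code

# Simpler form: always take the exit status of wait.
(sleep 1; exit 3) &
wait -n; code=$?
second=$code

echo "first=$first second=$second"
```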

Getting status==127 out of the waitjobs function should be impossible,
as it starts out being 0, and is only changed to $code if $code!=127,
so if that ever happens there looks to be a bug somewhere.

oguzismailuysal@gmail.com said:
  | There is no guarantee that `wait -n' will report the status of `true',  the
  | shell may acquire the status of `false' first.

That should be irrelevant: waitjobs() has a loop that explicitly keeps
waiting until wait -n returns 127 (which it does not return to the caller,
or should not), which should mean that there are no children remaining.

Further, as long as the wait -n call in waitjobs actually reaps the exit from
false, it should always return with status==1 (the exit status from false).
Since false & true should both always be running in the bg when waitjobs
is called, the exit status from false should always (fairly quickly, since
it doesn't run for very long) be obtained, causing code==1 and hence status==1
(after which status will never be altered again as it isn't touched if
code==0 or code==127 which should be the only other 2 returns from wait -n).
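
The original script is not reproduced in this message, so the following
is only a reconstruction of waitjobs from the description above, not the
poster's actual code. The names waitjobs, status and code come from the
thread; the loop shape is an assumption. The two background jobs stand in
for "false &" and "true &", with sleeps added (also an assumption) so that
both are certainly still running when waitjobs starts, which sidesteps the
suspected race:

```shell
#!/usr/bin/env bash
# Reconstruction from the description in this thread, not the original script.
waitjobs() {
    local status=0 code
    while :; do
        wait -n; code=$?
        # 127 means wait -n found no child to wait for: stop looping.
        [ "$code" -eq 127 ] && break
        # Remember any non-zero exit status; code==0 leaves status untouched.
        [ "$code" -ne 0 ] && status=$code
    done
    return "$status"
}

(sleep 1; exit 1) &     # stands in for "false &"
(sleep 2; exit 0) &     # stands in for "true &"
waitjobs
result=$?
echo "waitjobs returned $result"
```

With this logic status can only ever hold 0 or a child's non-zero exit
status, so a return of 127, or of 0 while the failing child is
outstanding, would point at wait -n rather than at the function.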

I modified the script to get rid of the (()) usage and replace it with
similar [ ] code, which made no difference at all when executed under
bash: it still ends the outer loop, reasonably quickly.

But then I could run the script using the NetBSD shell, where it (seems to)
run forever (ie: it is still running - but forever hasn't been reached yet).
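
The kind of substitution meant is sketched below. The actual expressions
in the script are not shown in this message, so the comparison against
127 is only a plausible stand-in:

```shell
#!/usr/bin/env bash
# Hypothetical comparison, the real expressions in the script are not shown here.
code=1

# bash arithmetic command, not available in a plain POSIX sh
if (( code != 127 )); then arith=yes; else arith=no; fi

# POSIX test command, usable in sh as well
if [ "$code" -ne 127 ]; then posix=yes; else posix=no; fi

echo "arith=$arith posix=$posix"
```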

I think there is a bug, probably some race condition in bash with the jobs
table, causing the "false" job to get missed sometimes when running this code.
That allows status to remain 0, and the outer loop to break, and the script
to terminate.

Most likely the use of "local" in the loop, which caused the difference that
Martin noticed, alters the timing enough to affect the race results.

kre



