bug-bash

Re: wait -n misses signaled subprocess


From: alex xmb sw ratchev
Subject: Re: wait -n misses signaled subprocess
Date: Thu, 1 Feb 2024 09:09:27 +0100

On Wed, Jan 31, 2024, 20:36 Robert Elz <kre@munnari.oz.au> wrote:

>     Date:        Wed, 31 Jan 2024 11:35:57 -0500
>     From:        Chet Ramey <chet.ramey@case.edu>
>     Message-ID:  <1e50aa99-8d53-4cdf-ba5e-6aaf3ccc6767@case.edu>
>
>   | Not quite. `new' in this sense is the opposite of `anything in the
>   | past' as Dale described it -- already notified and removed from the
>   | jobs list.
>
> I guess the part about bash that I am not understanding here is how the
> "already notified" works.   To me there are just two ways for that, either
> the user has done a "wait" which has collected that pid already (either
> without -n, and no pid args, or with pid args and one of those is the pid
> in question) or with -n and the pid in question was the one whose status
> was returned, or the user/script did the jobs command (or jobs -l) and the
> job in question was shown as completed.
>

I'd say an additional data structure for saving purposes.

> Is there some other way?
>
>   | Half the problem here is that bash aggressively marks dead jobs as
>   | being notified in non-interactive shells without job control enabled,
>   | and moves them out of the jobs table.
>
> That might be more than half the problem, it might be the entire problem.
>
>   | If you use wait -n without arguments, you probably don't care,
>
> No you do, that just means any of the children ... the script could make
> a list of all of them and supply that list, but if the list is just going
> to contain all the existing children, why bother?    (With -n - and not
> exactly one pid arg, -p is generally going to be required, but that option
> has no bearing on which process is selected, or might be, which is the
> issue here).
>
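A minimal sketch of the -n plus -p combination mentioned above, assuming bash 5.1 or later (where wait accepts -p). The job names, delays and exit codes are made up for illustration. The children are deliberately staggered so that each is still running when wait -n is issued, which sidesteps the already-terminated case this thread is about.

```bash
#!/usr/bin/env bash
# Sketch only: reap children one at a time with `wait -n -p` (bash 5.1+).
declare -A label=()     # pid -> hypothetical job name, for children not yet reaped
declare -A status=()    # job name -> exit status

start() {               # usage: start NAME DELAY CODE  (hypothetical helper)
    local name=$1 delay=$2 code=$3
    ( sleep "$delay"; exit "$code" ) &
    label[$!]=$name
}

start fast 1 0
start mid  2 4
start slow 3 9

while (( ${#label[@]} > 0 )); do
    rc=0
    # -n: return as soon as one of the listed pids terminates
    # -p: store the pid whose status is being returned in $finished
    wait -n -p finished "${!label[@]}" || rc=$?
    [[ -n ${finished:-} ]] || break   # no pid identified: stop rather than spin
    status[${label[$finished]}]=$rc
    unset "label[$finished]"
done

echo "fast=${status[fast]:-?} mid=${status[mid]:-?} slow=${status[slow]:-?}"
```

Without -p the loop would get a status back but could not tell which child it belongs to, which is the point being made about -p above.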
>   | but if you
>   | do, or if you use wait -n with pid/job arguments (which you've
>   | presumably saved yourself) you're going to need slightly different
>   | semantics than we have now to answer that reliably. And that will
>   | probably need a new option.
>
> That's a pity, particularly since the current semantics don't seem to
> be useful in general.   Since the sole issue provoking that seems to be
> the wait over and over policy, rather than "wait once, and remove
> completely", perhaps rather than a new, but different, -n like option, a
> better idea would be an "only once" option (ie: if the option (-r (remove)
> or -c (cleanup) or -o (once only)) is set, then when the wait with that
> option returns status, or waits until termination without returning status
> (in the not -n case, with no pid args, or many pid args), then the
> processes are completely deleted from everywhere in the shell).   Using
> that option would make a changed -n safe to use in loops.   If you do
> that, also add an option (maybe the upper case version of whatever is
> selected for that one, or just some other letter) to mean "don't wait"
> (kind of like wait(2) WNOWAIT) - which in default bash would just be a
> no-op (except in posix mode, apparently - whereas the -[cor] option
> would be a no-op in posix mode).
>
> If you were to do that, other shells could add the same (except in probably
> all of them, -[cor] would always be the default, and the other one would be
> the one which changes behaviour).
>
>   | And that's why I used `more': there are several differences, so which
>   | of those differences should we attempt to change?
>
> Just the one.
>
>   | > The one change that should be made is
>   | > to allow wait -n to collect processes/jobs that have already
>   | > terminated.
>   |
>   | Yes, that's one of the things we're talking about. I don't have any
>   | problem with it, but should it take a new option to change those
>   | semantics?
>
> Good, though I think some more thought should go into that.   In another
> thread you said (paraphrasing) correctly, that scripts should not be
> relying upon bugs, and the current wait -n behaviour is a bug - that it
> might have been intentionally coded that way doesn't make it any less so.
> It isn't as if it was ever documented to work the way it does, or everyone
> would have known about it already.
>
>   | > Changing it to wait for all the listed pids
>   | It's never done that.
>   | We're not going to change the return value from wait.
>
> Good, I only mentioned those possibilities because your earlier
> message was unclear about what "more like wait without -n" meant.
>
>   | Yeah, but we're talking about bash here. It doesn't really matter what
>   | the Bourne shell did; there are likely plenty of scripts that assume
>   | the historical bash behavior.
>
> Really?   Why?   What's the point of collecting the status twice?
> It can't change in the meantime can it, once a process has done exit(N)
> its exit status should always be N, regardless of how often it is waited
> upon.
>
> [Aside: this should be obvious, but when one is collecting status changes,
> rather than just "terminated" status, then the pid isn't removed if it
> returns a "stopped" or "continued" status.]
>
>   | > I meant the distinction between processes
>   | > that the shell has already collected status for, and those for which
>   | > it

>   | You're not the first to propose something like that, but I'm not going
>   | to be writing that code any time soon.
>
> Nor am I.   If you go back to the message where I first mentioned it,
> which I can't locate at the minute, I am fairly sure I said that while
> it might help in this case, I doubt it is worth the effort.   Or something
> like that.
>
> Actually, found it eventually (this is quoting myself, earlier):
>   >> But as long as it is just a matter of cleaning up, and jobs works for
>   >> that, I don't currently see the need.
>
>   | It is, in fact, true in the current implementation, as long as the pid
>   | is in the jobs list.
>
> That caveat is the problem.
>
>   | It's always been true. If there is a job marked
>   | (internally, if you must) as dead for which the user has not yet
>   | received notification, wait -n returns it and marks it as notified
>   | (and deletes it from the jobs list).
>
> That part is good.
>
>   | Yes, that's one of the things we're talking about: whether wait -n
>   | should consider pids/jobs *not* in the jobs list, the way wait without
>   | -n does.
>   | That's about the only thing we're talking about changing here so far.
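
For contrast, a small sketch of the "wait without -n" path: the script saves each pid itself and waits on them individually after the children have already exited. The exit codes and the one second pause are arbitrary choices for the illustration, not anything taken from bash itself.

```bash
#!/usr/bin/env bash
# Sketch only: collect every child's status by waiting on each saved pid.
# `wait PID` still reports the status of a child that terminated before
# the wait was issued, which is the case a bare `wait -n` can miss.
pids=()
for code in 3 0 7; do
    ( exit "$code" ) &        # hypothetical workload: child exits with $code
    pids+=("$!")
done

sleep 1                       # let every child terminate before any wait

statuses=()
for pid in "${pids[@]}"; do
    rc=0
    wait "$pid" || rc=$?      # status is remembered even though the child is gone
    statuses+=("$rc")
done

echo "${statuses[*]}"         # prints: 3 0 7
```

The cost is that statuses come back in the order the script asks for them, not in the order the children finished, which is the thing -n is meant to provide.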
>
> Maybe a better discussion, and potential change, would be about whatever,
> other than the use of the wait or jobs commands, can result in a job
> moving out of the jobs list.   If there were nothing other than those
> (and jobs list overflow or similar) then we'd be fine, and it seems to
> me now, no change to the -n operation would be needed.
>
>   | That hasn't actually been true with bash running in default mode for a
>   | very long time now. Bash has allowed multiple waits for the same pid
>   | for many years, whether or not you or I think it's a good idea or the
>   | correct semantics. Even if it was an accident of the implementation,
>   | and maybe you could say it was, we are stuck with it.
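
A quick way to see the repeated-wait behaviour for yourself, as a sketch: wait on the same pid twice and compare. Going by the description above, bash in default mode would be expected to report the same status both times, while posix mode (or another shell) may report 127 for the second wait. The exit code 5 is arbitrary.

```bash
#!/usr/bin/env bash
# Sketch only: wait twice on the same pid and print both results.
( exit 5 ) &
pid=$!

first=0
wait "$pid" || first=$?                 # collects the child's real status

second=0
wait "$pid" 2>/dev/null || second=$?    # same pid again; result depends on the mode

echo "first wait:  $first"
echo "second wait: $second"
```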
>
> Which is why I suggested an option (just above) to turn that misfeature
> off.   Even better perhaps might be a bash shopt.
>
>   | It's ok, we got one.
>
> A kind of unlikely one.
>
> kre
>
>
>

