
Re: Examples of concurrent coproc usage?


From: Carl Edquist
Subject: Re: Examples of concurrent coproc usage?
Date: Thu, 14 Mar 2024 04:58:48 -0500 (CDT)

[My apologies up front for the length of this email. The short story is I played around with the multi-coproc support: the fd closing seems to work fine to prevent deadlock, but I found one bug apparently introduced with multi-coproc support, and one other coproc bug that is not new.]

On Mon, 11 Mar 2024, Zachary Santer wrote:

> Was "RFE: enable buffering on null-terminated data"
>
> On Mon, Mar 11, 2024 at 7:54 AM Carl Edquist <edquist@cs.wisc.edu> wrote:
>>
>> (Kind of a side-note ... bash's limited coprocess handling was a long-standing annoyance for me in the past, to the point that I wrote a bash coprocess management library to handle multiple active coprocesses and give convenient methods for interaction. Perhaps the trickiest bit about multiple coprocesses open at once (which I suspect is the reason support was never added to bash) is that you don't want the second and subsequent coprocesses to inherit the pipe fds of prior open coprocesses. This can result in deadlock if, for instance, you close your write end to coproc1, but coproc1 continues to wait for input because coproc2 also has a copy of a write end of the pipe to coproc1's input. So you need to be smart about subsequent coprocesses first closing all fds associated with other coprocesses.)
>
> https://lists.gnu.org/archive/html/help-bash/2021-03/msg00296.html
> https://lists.gnu.org/archive/html/help-bash/2021-04/msg00136.html

Oh hey! Look at that. Thanks for the links to this thread - I gave them a read (along with the old thread from 2011-04). I feel a little bad I missed the 2021 discussion.


> You're on the money, though there is a preprocessor directive you can build bash with that will allow it to handle multiple concurrent coprocesses without complaining: MULTIPLE_COPROCS=1.

Who knew! Thanks for mentioning it. When I saw that "only one active coprocess at a time" was _still_ listed in the bugs section in bash 5, I figured multiple coprocess support had just been abandoned. Chet, that's cool that you implemented it.

I kind of went all-out on my bash coprocess management library though (mostly back in 2014-2016) ... It's pretty feature-rich and pleasant to use -- to the point that I don't think there is any going back to bash's internal coproc for me, even with multiple coproc support. I implemented it with shell functions, so it doesn't rely on compiling anything or on having the latest version of bash present. (I even added bash3 support for older systems.)

> Chet Ramey's sticking point was that he hadn't seen coprocesses used enough in the wild to satisfactorily test that his implementation did in fact keep the coproc file descriptors out of subshells.

To be fair, coproc is kind of a niche feature. But I think more people would play with it if it were less awkward to use and if they felt free to experiment with multiple coprocs.

By the way, I agree with Chet's exact description of the problems here:

    https://lists.gnu.org/archive/html/help-bash/2021-03/msg00282.html

The issue is separate from the stdio buffering discussion; the issue here is with child processes (and I think not foreground subshells, but specifically background processes, including coprocesses) inheriting the shell's fds that are open to pipes connected to an active coprocess.

Not getting a SIGPIPE/write failure results in a coprocess sitting around longer than it ought to, but it's not obvious (to me) how this leads to deadlock: the shell has closed its read end of the pipe to that coprocess, so at least you aren't going to hang trying to read from it.

On the other hand, a coprocess not seeing EOF will cause deadlock pretty readily, especially if it processes all its input before producing output (as with wc, sort, sha1sum). Trying to read from the coprocess will hang indefinitely if the coprocess is still waiting for input, which is the case if there is another copy of the write end of its read pipe open somewhere.


> If you've got examples you can direct him to, I'd really appreciate it.

[My original use cases for multiple coprocesses were (1) for programmatically interacting with multiple command-line database clients together, and (2) for talking to multiple interactive command-line game engines (Othello), to have them play each other.

Perl's IPC::Open2 works, too, but it's easier to experiment on the fly in bash.

And in general having the freedom to play with multiple coprocesses helps mock up more complicated pipelines, or even webs of interconnected processes.]

> But you can create a deadlock without doing anything fancy.


Well, *without multi-coproc support*, here's a simple wc example; first with a single coproc:

        $ coproc WC { wc; }
        $ exec {WC[1]}>&-     # close the write end; wc sees EOF
        $ read -u ${WC[0]} X  # collect wc's output
        $ echo $X
        0 0 0

This works as expected.

But if you try it with a second coproc (again, without multi-coproc support), the second coproc will inherit copies of the shell's read and write pipe fds to the first coproc, and the read will hang (as described above), as the first coproc doesn't see EOF:

        $ coproc WC { wc; }
        $ coproc CAT { cat; }   # CAT's shell inherits the shell's fds for WC
        $ exec {WC[1]}>&-       # wc still has a writer (inside CAT), so no EOF
        $ read -u ${WC[0]} X

        # HANGS


But this can be observed even before attempting the read that hangs.

You can 'ps' to see the user shell (bash), the coprocs' shells (bash), and the coprocs' commands (wc & cat). Then 'ls -l /proc/PID/fd/' to see what they have open:

- The user shell has its copies of the read & write fds open for both coprocs (as it should)

- The coproc commands (wc & cat) each have only a single read & write pipe open, on fd 0 & 1 (as they should)

- The first coproc's shell (WC) has only a single read & write pipe open, on fd 0 & 1 (as it should)

- The second coproc's shell (CAT) has its own read & write pipes open, on fd 0 & 1 (good), but it also has a copy of the user shell's read & write pipe fds to the first coproc (WC) open (on fd 60 & 63 in this case, which it inherited when forking from the user shell)

(And in general, latter coproc shells will have stray copies of the user shell's r/w ends from all previous coprocs.)

So, you can examine the situation after setting up coprocs to see whether all the coproc-related processes have just two pipes open (on fd 0 & 1). If so, that suffices (for me, anyway) to show that no deadlocks related to stray open fds can happen. But if any of them has other pipes open (inherited from the user shell), that indicates the problem.
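For example, with the WC and CAT coprocs above still standing, a quick check might look like this (a sketch: $WC_PID and $CAT_PID are set by the coproc keyword, fd numbers will vary, and /proc is Linux-specific):

        $ for pid in $WC_PID $CAT_PID; do
        >     echo "== coproc shell $pid =="
        >     ls -lgo /proc/$pid/fd/
        > done

Each coproc shell (and likewise its wc or cat child, whose pid you can get from ps) should show only fds 0 & 1 open on pipes; any additional pipe fds are exactly the strays inherited from the user shell.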


I tried compiling the latest bash (version 5.2.21(1)) with MULTIPLE_COPROCS=1 to test out the multi-coproc support.
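For reference, one way to get the define in, assuming the configure-generated Makefile honors CPPFLAGS (the exact knob may differ on your system):

        $ ./configure
        $ make CPPFLAGS='-DMULTIPLE_COPROCS=1'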

I tried standing up the above WC and CAT coprocs, together with some others, to check that the behavior looked OK for pipelines as well (which I think was one of Chet's concerns):

        $ coproc WC { wc; }
        $ coproc CAT { cat; }
        $ coproc CAT3 { cat | cat | cat; }
        $ coproc CAT4 { cat | cat | cat | cat; }
        $ coproc CATX { cat ; }

And as far as the fd situation, everything checks out: the user shell has fds open to all the coprocs, and the coproc shells & coproc commands (including all the cat's in the pipelines) have only a single read & write pipe open on fd 0 & 1. So, the multi-coproc code seems to be closing the shell's copies correctly.

[The examples are boring, but their point is just to investigate the stray-fd question.]


HOWEVER!!!

Unexpectedly, the new multi-coproc code seems to close the user shell's end of a coprocess's pipes, once the coprocess has terminated. When compiled with MULTIPLE_COPROCS=1, this is true even if there is only a single coproc:

        $ coproc WC { wc; }
        $ exec {WC[1]}>&-
        [1]+  Done                    coproc WC { wc; }

        # WC var gets cleared!!
        # shell's ${WC[0]} is also closed!

        # now, can't do:

        $ read -u ${WC[0]} X
        $ echo $X

I'm attaching a "bad-coproc-log.txt" with more detailed ps & ls output examining the open fds at each step, to make it clear what's happening.

This is a bug. The shell should not automatically close its read pipe to a coprocess that has terminated -- it should stay open to read the final output, and the user should be responsible for closing the read end explicitly.

This is more obvious for commands that wait until they see EOF before generating any output (wc, sort, sha1sum). But it's also true for commands that produce output as they go, whether filters (sed) or generators (ls). If the shell's read end is closed automatically, any final output waiting in the pipe is discarded.
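To see the consequence without any reading at all (a sketch, under the same MULTIPLE_COPROCS build):

        $ coproc LS { ls /; }
        # ls writes its listing and exits immediately; once the shell
        # reaps it, the shell closes its read end ${LS[0]}, and the
        # listing still sitting unread in the pipe is silently lost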

It also invites trouble if the shell variable that holds the fds gets removed unexpectedly when the coprocess terminates. (Suddenly the variable expands to an empty string.) It seems to me that the proper time to clear the coproc variable (if at all) is after the user has explicitly closed both of the fds. *Or* else add an option to the coproc keyword to explicitly close the coproc - which will close both fds and clear the variable.
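In the meantime, one user-side workaround (a sketch; it relies only on the fact that closing a descriptor does not affect an earlier duplicate of it) is to duplicate the coproc's fds into your own variables right away, so the auto-close doesn't take your handles with it:

        $ coproc WC { wc; }
        $ exec {wc_r}<&"${WC[0]}" {wc_w}>&"${WC[1]}"  # private duplicates
        $ exec {WC[1]}>&- {wc_w}>&-    # close *both* write ends -> EOF
        $ read -u "$wc_r" X            # fine even after the coproc is reaped
        $ echo $X
        0 0 0
        $ exec {wc_r}<&-               # closed explicitly, when *we* are done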

...

Separately, I consider the following coproc behavior to be weird, fragile, and broken.

If you fg a coproc, then stop and bg it, it dies. Why? Apparently the shell abandons the coproc when it is stopped, closes the pipe fds for it, and clears the fd variable.

        $ coproc CAT { cat; }
        [1] 10391

        $ fg
        coproc CAT { cat; }

        # oops!

        ^Z
        [1]+  Stopped                 coproc CAT { cat; }

        $ echo ${CAT[@]}  # what happened to the fds?

        $ ls -lgo /proc/$$/fd/
        total 0
        lrwx------ 1 64 Mar 14 02:26 0 -> /dev/pts/3
        lrwx------ 1 64 Mar 14 02:26 1 -> /dev/pts/3
        lrwx------ 1 64 Mar 14 02:25 2 -> /dev/pts/3
        lrwx------ 1 64 Mar 14 02:26 255 -> /dev/pts/3

        $ bg
        [1]+ coproc CAT { cat; } &

        $
        [1]+  Done                    coproc CAT { cat; }

        $ # sad user :(


This behavior is not new to the multi-coproc support. But just the same it seems broken for the shell to automatically close the fds to coprocesses. That should be done explicitly by the user.
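To spell out what "explicitly by the user" would look like (this is how things ought to behave; with the auto-close behavior above, the last steps may already have been done for you -- which is exactly the complaint):

        $ coproc CAT { cat; }
        $ echo hello >&${CAT[1]}
        $ read -u ${CAT[0]} X; echo $X
        hello
        $ exec {CAT[1]}>&-    # user decides when the coproc sees EOF
        $ exec {CAT[0]}<&-    # and when to stop reading from it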


>> Word to the wise: you might encounter this issue (coproc2 prevents coproc1 from seeing its end-of-input) even though you are rigging this up yourself with FIFOs rather than bash's coproc builtin.

> In my case, it's mostly a non-issue, because I fork the - now three - background processes before exec'ing automatic fds redirecting to/from their FIFOs in the parent process. All the automatic fds get put in an array, and I do close them all at the beginning of a subsequent process substitution.

That's a nice trick, with the shell backgrounding all the coprocesses before connecting the FIFOs. But yeah, to make subsequent coprocesses you do still have to close the copies of the user shell's fds that the coprocess shells inherit. It sounds like you are doing that (nice!), but in any case it requires some care, and as these stack up it is really handy to have something manage it all for you.
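For anyone following along, that FIFO pattern looks something like this (a minimal sketch with hypothetical paths; real use wants a private temp dir, cleanup, and -- for coprocesses forked later -- closing the inherited copies of $to_wc and $from_wc):

        $ mkfifo /tmp/wc.in /tmp/wc.out
        $ wc </tmp/wc.in >/tmp/wc.out &    # fork the coprocess first
        $ exec {to_wc}>/tmp/wc.in {from_wc}</tmp/wc.out
        $ echo hello >&$to_wc
        $ exec {to_wc}>&-                  # EOF for wc
        $ read -u $from_wc X; echo $X
        1 1 6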

(Perhaps this is where I ask if you are happy with your solution or if you would like to try out something wildly more flexible...)


Happy coprocessing! :)

Carl

Attachment: bad-coproc-log.txt
Description: Text document

