How to use Bash pipefail with GNU Parallel


From: David Ventimiglia
Subject: How to use Bash pipefail with GNU Parallel
Date: Mon, 23 Nov 2015 14:25:03 -0800

Hello,

How do I use Bash pipefail with GNU Parallel?  

Let me be more specific.  Here's what I'm doing, in detail.
  1. Storing commands to be executed in a SQL database table.
  2. Selecting those commands out of the database, using a command-line tool.
  3. Piping those commands into GNU Parallel.
  4. The stored commands themselves comprise several commands joined into a pipeline.
  5. Unsurprisingly, the first command in those pipelines is curl.
  6. The curl command can easily fail (e.g., gateway timeout, whatever).
  7. I want that failure to cause the whole pipeline to fail.
  8. I want any failed pipeline command to cause GNU Parallel to fail (i.e., return a non-zero exit status).

Here's an example of a command stored in the database table.

curl -sf <some URL> | awk <some program>

Note that I'm using the curl option '-f' so that it fails (returns a non-zero exit status) on HTTP error status codes.  Here's how I select that command out of the database and feed it into GNU Parallel.

sql -n <some database URL> 'select command from mycommands' | parallel --halt 2 > output.csv

Note that this uses GNU sql, which on my machine is packaged with GNU Parallel.  Note also that I'm using the GNU Parallel option '--halt 2' so that it fails if any of the commands fed to it fail.  The problem is that the individual commands never fail (i.e., they never return a non-zero exit status), because the exit status of a pipeline is that of its last command, and the last command here (the call to awk) never fails.  In other words, any error reported by the curl command at the start of the pipeline gets lost.

Evidently, Bash has a convenient facility for this.  One can set the 'pipefail' option so that the exit status of a pipeline is the exit status of the right-most command that exits non-zero, or zero if all of its commands succeed.  That's exactly what I want.  The question is this.
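
As a minimal sketch of the difference 'pipefail' makes, the snippet below runs the same failing pipeline with the option off and then on.  It does not involve curl, awk, or GNU Parallel: 'false' is an assumed stand-in for a failing curl, 'cat' is a stand-in for an awk program that always succeeds, and the variable names are made up for illustration.

```shell
#!/usr/bin/env bash

# Default behaviour: the pipeline's exit status is that of the last
# command (cat), so the failure of `false` is lost.
status_default=0
false | cat || status_default=$?

# With pipefail: the pipeline's exit status is that of the right-most
# command that exited non-zero (here, `false`).
set -o pipefail
status_pipefail=0
false | cat || status_pipefail=$?
set +o pipefail

echo "default: $status_default, pipefail: $status_pipefail"
# prints "default: 0, pipefail: 1"
```

The '|| status=$?' form is used only so that the snippet also behaves when run under 'set -e'.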

How do I pass 'set -o pipefail' to the commands that are run by GNU parallel?

I think the answer lies somewhere in the labyrinth of combinations of using some GNU Parallel options (-q --shell-quote -I{}) and some Bash options (-c).

Here's what I've tried so far, which didn't work.

sql -n <some database URL> 'select command from mycommands' | parallel --halt 2 bash -c 'set -o pipefail; {}'

That didn't work: evidently Bash didn't believe that it was getting a command passed to its -c flag.  In fact, it's as if the replacement string {} isn't working at all.

What did work was to bake the Bash command 'set -o pipefail;' right into the command that's stored in the database table.  That works, but I'd rather keep that detail out of the command database and put it into the machinery that fishes the command out of the database and runs it using GNU Parallel.  I'm just at a loss with respect to how GNU Parallel builds its command lines and how to accomplish this.
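
For illustration, here is a rough sketch of the effect of prefixing a stored command string with 'set -o pipefail;' before handing it to 'bash -c'.  It uses plain Bash only and is not a GNU Parallel invocation, so it says nothing about how the quoting should be done there.  The variable 'stored_command' and its contents are hypothetical: 'false' again stands in for a failing curl and 'cat' for awk.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for one command row from the database.
stored_command='false | cat'

# Run as stored: the status of the last command in the pipeline wins.
plain_status=0
bash -c "$stored_command" || plain_status=$?

# Run with the pipefail prefix added outside the stored command.
wrapped_status=0
bash -c "set -o pipefail; $stored_command" || wrapped_status=$?

echo "plain: $plain_status, wrapped: $wrapped_status"
# prints "plain: 0, wrapped: 1"
```

The point of the sketch is only that the stored command itself stays unchanged while the wrapper supplies the option.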

Thanks,
Best,
David
