Re: sqlmaster "nowait" and "append" functionality?


From: Ole Tange
Subject: Re: sqlmaster "nowait" and "append" functionality?
Date: Tue, 6 Dec 2016 23:53:57 +0100

On Tue, Dec 6, 2016 at 6:10 PM, Andy Loftus <aloftus@gmail.com> wrote:

> Currently, sqlmaster appears to populate a table, first dropping any
> existing table, and waits for jobs to complete.  Wondering if there is a way
> to:
>
> 1. populate the database then exit immediately without waiting?
> My particular use case is parallelizing backup tasks that are expected to
> run a long time (several hours on average).

Hmmm... maybe we should change it so that you need to add '--wait' if
you want '--sqlmaster' to wait. That seems like a reasonable change.

> On a related note, why does the sqlworker command require the exact same
> input as the sqlmaster?  Shouldn't it be sufficient that all necessary
> information is stored in the database and then the sqlworker can just pull
> tasks from the database?

--sqlmaster only inserts the values - not the command. The problem
comes with replacement strings like {%}. You can never know in
advance which job slot a job will run in:

    parallel echo {%} ::: {a..j}

So --sqlmaster cannot store the actual command to run.
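
To make the point about {%} concrete, here is a minimal sketch. It assumes GNU Parallel is installed; slot_report is a hypothetical helper name used only for this illustration. With a single job slot every job reports slot 1, but with several slots the number each job sees depends on which slot happens to be free when the job starts, so it cannot be written into the table ahead of time.

```shell
#!/usr/bin/env bash
# Sketch: the job slot {%} is assigned when a job starts, not when it is queued.
# slot_report is a hypothetical helper, not part of GNU Parallel.
slot_report() {
    local jobs="$1"; shift
    # -k keeps the output in input order so that runs are easy to compare
    parallel -k -j"$jobs" echo '{} ran in slot {%}' ::: "$@"
}

# Only run the demo if the 'parallel' on PATH really is GNU Parallel
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    slot_report 1 a b c   # one slot: every job reports slot 1
    slot_report 3 a b c   # three slots: the numbers depend on scheduling
fi
```

Running the second call a few times on a loaded machine may give different slot numbers for the same argument.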

It _could_ be changed so that --sqlmaster stores the template command
in the command column, and --sqlworker fetches it from there, and
replaces the command column with the actual command run when done.

It would, however, be a fairly big change to GNU Parallel: The
assumption has always been that the template command remains the same
for the whole run, and quite a bit of optimization depends on this.

On a similar note: What would you expect the table to look like
when you run:

    # These do not work - but what would you expect them to do to the table?
    parallel --sqlandworker $DBURL -X echo {%}: {} ::: {1..10}
    parallel --sqlandworker $DBURL -N3 echo {%}: {} ::: {1..10}
    parallel --sqlandworker $DBURL echo {%} '{= $_=total_jobs() =}' ::: {1..10}

> 2. append new tasks to an existing database?
> I think this is more likely a feature request since the man page
> specifically says table will be clobbered. As I understand, the reason is
> that the table schema must/should match, especially the V* columns. But,
> really, isn't that ultimately a burden for the user (as opposed to the
> developer)?

I struggled with this decision, too. My reasoning was that if you run:

    parallel --sqlandworker $DBURL echo ::: {1..3}

but really meant:

    parallel --sqlandworker $DBURL echo ::: {1..3} ::: {4..6}

then you would have to find a way to clean the database first.

> Perhaps a specific flag allowing "append" operation so user can
> be duly warned and could still check that the number of V* columns matches.

Like having the DBURL start with '+': +pg://tange:mypass/tange/TBL8007

I believe this part is relatively simple to do - especially if we
allow it to die horribly if the columns do not match.

It should:

* Not drop table
* Find the max seq-number, and continue from there
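
The two points above can be sketched on a stand-in table. This is not how GNU Parallel stores jobs: the "table" here is just a TAB-separated file with a Seq column and a V1 column, and append_jobs is a hypothetical helper. It only illustrates the intended behaviour: leave the existing rows alone and number the new rows after the largest Seq already present (against a real database that would presumably be something like SELECT MAX(Seq)).

```shell
#!/usr/bin/env bash
# Sketch of "append" semantics on a stand-in table (a TAB-separated file
# with columns Seq and V1). append_jobs is a hypothetical helper.
append_jobs() {
    local table="$1"; shift
    touch "$table"            # keep existing rows: never truncate ("drop")
    local max arg
    # Largest Seq currently stored, or 0 if the table is empty
    max="$(awk -F'\t' '$1+0 > m { m = $1+0 } END { print m+0 }' "$table")"
    for arg in "$@"; do
        max=$((max + 1))      # continue numbering after the previous maximum
        printf '%s\t%s\n' "$max" "$arg" >> "$table"
    done
}

queue="$(mktemp)"
append_jobs "$queue" backup-script1 backup-script2   # rows get Seq 1 and 2
append_jobs "$queue" backup-script3                  # appended as Seq 3
cat "$queue"
rm -f "$queue"
```

A real implementation would also have to compare the number of V* columns with the number of input sources and stop if they differ.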

> A sample use case is executing a bunch of generated bash scripts (per above,
> this is how the parallel backups are handled), so the V1 column is the
> absolute path to the script and the command is simply "bash".

I can definitely see it being useful, and I believe the easy changes
would accommodate this:

    # Append backup-script* to the queue in $DBURL. Exit immediately (no --wait).
    parallel --sqlmaster +$DBURL bash ::: backup-script*
    # or even:
    chmod 755 backup-script*
    parallel --sqlmaster +$DBURL ::: backup-script*

    # Do the work by grabbing arguments from $DBURL
    parallel --sqlworker $DBURL bash
    # or even:
    PATH=.:$PATH
    parallel --sqlworker $DBURL


/Ole


