Feature request: halt on threshold




From:	Ben Rusholme
Subject:	Feature request: halt on threshold
Date:	Fri, 18 Jul 2014 14:22:31 -0700

Hi,

There are currently three options to "--halt": ignore (0), stop new jobs (1), 
or kill everything (2).

I propose an additional option: setting the number of job failures to tolerate 
before doing anything. This would allow some tolerance of failure but would 
still catch global problems.

Consider this example: running 1000 jobs of around 1 hour each, where a random 
handful will fail due to unexpected bad data or some other unforeseen bug, but 
the overwhelming majority will complete successfully.

With --halt 0, all jobs will run and I can check for the failures afterwards. 
Great! However, say I forget to create the results directory, so every "good" 
job runs for its full time and then fails right at the end. If I wasn't 
monitoring, I have just wasted 1000 hours of processing time.

With --halt > 0, the run will stop at or just after the first problem. I then 
have to check the logs, figure out and fix the problem if possible, rerun with 
the previous successes excluded, etc.

What I would like is to set, say, the number of tolerable failures to the 
number of workers. Then a serious bug would be caught after the first 
iteration, but otherwise the entire run would complete while tolerating some 
measure of bad input data.
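
The requested behaviour can be sketched in a few lines of Python. This is only 
an illustration of the idea, not GNU Parallel code: the function name 
run_with_failure_threshold and its parameters (jobs, workers, max_failures) are 
hypothetical, and the sketch assumes the "stop new jobs" style of halting, in 
which jobs that are already running are allowed to finish.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_with_failure_threshold(jobs, workers, max_failures):
    """Run callables in parallel; stop starting new ones after max_failures failures.

    Hypothetical helper for illustration only. A job counts as failed if it
    raises an exception. Returns three sorted lists of job indices:
    (succeeded, failed, skipped).
    """
    succeeded, failed, skipped = [], [], []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Queue every job up front; only `workers` of them run at a time.
        futures = {pool.submit(job): index for index, job in enumerate(jobs)}
        for future in as_completed(futures):
            index = futures[future]
            if future.cancelled():
                # Never started because the threshold had been reached.
                skipped.append(index)
            elif future.exception() is None:
                succeeded.append(index)
            else:
                failed.append(index)
                if len(failed) >= max_failures:
                    # Threshold reached: cancel everything still queued.
                    # Jobs that are already running cannot be cancelled
                    # and are left to finish.
                    for pending in futures:
                        pending.cancel()
    return sorted(succeeded), sorted(failed), sorted(skipped)
```

With max_failures set to the number of workers, a run in which every job fails 
stops after roughly the first round of jobs, while a run with only a handful of 
bad inputs completes and reports which jobs failed.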

Does this make sense? Unfortunately it would require changing the current 
flags, either by adding another flag or by changing the current --halt options.

Thanks, Ben

Current Thread

Feature request: halt on threshold, Ben Rusholme <=
- Re: Feature request: halt on threshold, Ole Tange, 2014/07/18
  - Re: Feature request: halt on threshold, Ben Rusholme, 2014/07/18
  - Re: Feature request: halt on threshold, Ole Tange, 2014/07/19