Re: Splitting STDIN to parallel processes (map-reduce on blocks of data)
From: Ole Tange
Subject: Re: Splitting STDIN to parallel processes (map-reduce on blocks of data)
Date: Wed, 19 Jan 2011 01:25:09 +0100
On Tue, Jan 11, 2011 at 4:32 PM, Ole Tange <tange@gnu.org> wrote:
> You are hereby invited to help design a block-wise-map-reduce feature
> of GNU Parallel. These are my current thoughts. Feel free to give your
> input - especially if you need something similar.
An alpha test version is now available:
http://alpha.gnu.org/gnu/parallel/parallel-20110119.tar.bz2
It contains:
* --joblog which is documented.
And these, which are not yet documented:
* --spreadstdin which will spread the stdin to jobslots.
* --recstart which is the regular expression of the start of a record.
* --recend which is the regular expression of the end of a record.
GNU Parallel will read 1 MB from STDIN and find the last match of
recstart or recend in it. It will then remove the trailing partial
record and pass the rest to a jobslot on STDIN. The partial record
will be prepended to the next 1 MB chunk.
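The splitting described above can be sketched roughly as follows. This
is only an illustration in Python, not the actual GNU Parallel code
(which is written in Perl): the function name split_records, its
parameters, and the choice to handle only recend (not recstart) are
assumptions made for the example.

```python
import io
import re


def split_records(stream, recend=rb"\n", blocksize=1024 * 1024):
    """Yield blocks of whole records read from a binary stream.

    Hypothetical helper: recend is a regular expression matching the
    end of a record; each block is cut just after the last match.
    """
    pattern = re.compile(recend)
    leftover = b""  # partial record carried over from the previous read
    while True:
        data = stream.read(blocksize)
        if not data:
            break
        # Prepend the partial record from the previous chunk.
        buf = leftover + data
        # Locate the end of the last complete record in the buffer.
        cut = 0
        for match in pattern.finditer(buf):
            cut = match.end()
        if cut == 0:
            # No record end found yet: keep accumulating.
            leftover = buf
            continue
        yield buf[:cut]       # whole records, ready for one jobslot
        leftover = buf[cut:]  # trailing partial record
    if leftover:
        # Input did not end with a record separator.
        yield leftover


if __name__ == "__main__":
    # Small block size so the demo produces several blocks.
    sample = b"".join(b"%d\n" % i for i in range(1, 101))
    for block in split_records(io.BytesIO(sample), blocksize=64):
        print(len(block), block[:12])
```

Each yielded block ends on a record boundary, so a line-oriented tool
such as grep never sees a record cut in half, and concatenating the
blocks gives back the original input.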
Example:
seq 1 1000000 | parallel --recend "\n" -j10 --spreadstdin grep '31337$'
It has not been tested thoroughly, so there are bound to be bugs in it.
Please create reproducible errors and report them:
https://savannah.gnu.org/bugs/?func=additem&group=parallel
If you have a better name for --spreadstdin, let us hear it.
If you can come up with good usage examples, that would be nice.
/Ole