Re: Splitting STDIN to parallel processes (map-reduce on blocks of data)


From: Ole Tange
Subject: Re: Splitting STDIN to parallel processes (map-reduce on blocks of data)
Date: Wed, 19 Jan 2011 01:25:09 +0100

On Tue, Jan 11, 2011 at 4:32 PM, Ole Tange <tange@gnu.org> wrote:
> You are hereby invited to help design a block-wise-map-reduce feature
> of GNU Parallel. These are my current thoughts. Feel free to give your
> input - especially if you need something similar.

An alpha test version is now available:
http://alpha.gnu.org/gnu/parallel/parallel-20110119.tar.bz2

It contains:

* --joblog which is documented.

And these, which are not yet documented:

* --spreadstdin, which spreads the stdin across jobslots.
* --recstart, which is the regular expression matching the start of a record.
* --recend, which is the regular expression matching the end of a record.

GNU Parallel will read 1 MB from STDIN and find the last recstart or
recend in it. It will then cut off the partial record that follows and
pass the rest to a jobslot on STDIN. The partial record will be
prepended to the next 1 MB chunk.
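
The splitting described above can be sketched in a few lines of Python.
This is only an illustration of the idea, not the code in GNU Parallel
(which is written in Perl). The function name split_records and its
parameters are made up for this example, and it only handles --recend;
--recstart is left out to keep it short.

```python
import io
import re


def split_records(stream, recend=b"\n", block_size=1024 * 1024):
    """Yield blocks of whole records read from a binary stream.

    recend is a bytes regular expression matching the end of a record.
    Hypothetical helper for illustration; not part of GNU Parallel.
    """
    pattern = re.compile(recend)
    leftover = b""
    while True:
        data = stream.read(block_size)
        if not data:
            break
        # Prepend the partial record left over from the previous block.
        buf = leftover + data
        # Find where the last complete record in the buffer ends.
        cut = 0
        for match in pattern.finditer(buf):
            cut = match.end()
        # Pass on the complete records and keep the trailing partial one.
        if cut:
            yield buf[:cut]
        leftover = buf[cut:]
    # At end of input, whatever remains is the final (unterminated) record.
    if leftover:
        yield leftover


if __name__ == "__main__":
    # Same data as `seq 1 1000000`, split into blocks of roughly 1 MB.
    data = b"".join(b"%d\n" % n for n in range(1, 1000001))
    blocks = list(split_records(io.BytesIO(data)))
    # No record is lost or broken across blocks.
    assert b"".join(blocks) == data
    assert all(block.endswith(b"\n") for block in blocks)
    # Each block could now be filtered independently, like grep '31337$'.
    hits = sum(line.endswith(b"31337")
               for block in blocks for line in block.splitlines())
    print(hits)  # prints 10
```

Because every block ends on a record boundary, a line-oriented filter
gives the same result on the blocks as on the whole input, which is what
makes it safe to hand each block to a different jobslot.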

Example:

seq 1 1000000 | parallel --recend "\n" -j10 --spreadstdin grep '31337$'

It has not been tested thoroughly, so there are bound to be bugs in it.
Please create reproducible errors and report them:
https://savannah.gnu.org/bugs/?func=additem&group=parallel

If you have a better name for --spreadstdin, let us hear it.

If you can come up with good examples of use, that would be nice.


/Ole
