Re: [Bug-datamash] Is Datamash parallelizable?

From: Assaf Gordon
Subject: Re: [Bug-datamash] Is Datamash parallelizable?
Date: Fri, 08 Aug 2014 07:40:37 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Icedove/24.7.0

Hello Maximilian,

On 08/07/2014 09:09 PM, Maximilian E. Schüle wrote:

> thanks for maintaining datamash. For my thesis I want to do some speed
> tests with data over different databases. For this purpose I was happy
> to find the very interesting tool "datamash", that makes it easier to
> compare the processing of data in a database to the processing of data
> with normal shell-scripts. For this reason I want to know if Datamash is
> parallelizable or does it work on parallel threads. Is it like this?

I'm glad to hear you find "datamash" useful.

Currently, "datamash" does not use multiple threads.
I'm always interested in improving performance, and if there's a good case
for multi-threading I'll be glad to try it out.

I'm working on an I/O speed-up (roughly up to 2.5x faster), which will
be ready on the GNU website soon.
It's available here (including some new operations), if you feel comfortable
trying a non-stable version:
 http://git.savannah.gnu.org/cgit/datamash.git/?h=devel1  ( 'devel1' branch ).

One thing I'd consider trying, if you can split your input files,
is to run multiple 'datamash' instances in parallel, then combine the results.
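
As a rough illustration of that idea, here is a minimal sketch that sums the
first column of a file. It assumes GNU 'split' is available, and the
'parallel_sum' helper is a hypothetical name, not part of datamash. It only
works as-is for operations whose partial results can be merged by the same
operation (sum, min, max); something like mean or median would need a
different combining step.

```shell
#!/bin/sh
# parallel_sum FILE NCHUNKS
# Hypothetical helper (not part of datamash): sums column 1 of FILE by
# running one datamash process per chunk, then combining the partial sums.
parallel_sum() {
    input=$1
    nchunks=$2
    tmpdir=$(mktemp -d) || return 1

    # Split into at most NCHUNKS pieces without breaking lines (GNU split);
    # -e skips empty pieces when the file has fewer lines than chunks.
    split -e -n "l/$nchunks" "$input" "$tmpdir/chunk."

    # Start one datamash instance per chunk in the background.
    for chunk in "$tmpdir"/chunk.*; do
        datamash sum 1 < "$chunk" > "$chunk.sum" &
    done

    # Wait for all background instances to finish.
    wait

    # Combine: the sum of the partial sums is the total sum.
    result=$(cat "$tmpdir"/chunk.*.sum | datamash sum 1)
    rm -rf "$tmpdir"
    printf '%s\n' "$result"
}
```

For example, 'parallel_sum data.txt 4' (with 'data.txt' standing in for your
own input) would run up to four instances at once. Whether this is actually
faster depends on how much of the run time is spent reading the file.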

I'll be happy to discuss further,
 - Assaf.
