Re: Use R to manage results from GNU Parallel


From: David Rosenberg
Subject: Re: Use R to manage results from GNU Parallel
Date: Sun, 5 Jan 2014 12:02:46 -0500


The load_parallel_results_split_on_newline you sent didn't seem to work for me. In any case, here's my first approach. I'm returning a matrix instead of a data.table, since everything's the same type.


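load_parallel_results_split_on_newline <- function(filenametable) {
  raw <- load_parallel_results_raw(filenametable)
  ## Columns other than stdout/stderr hold the input variables.
  varnames <- setdiff(colnames(raw), c("stdout", "stderr"))
  header_cols <- which(colnames(raw) %in% varnames)
  ## One character vector of output lines per job.
  splits <- strsplit(raw[, "stdout"], "\n")
  lens <- sapply(splits, length)
  ## Repeat each row index once per line of that job's stdout.
  reps <- rep(seq_len(nrow(raw)), lens)
  ## drop = FALSE keeps a matrix even when there is only one variable column.
  m <- cbind(raw[reps, header_cols, drop = FALSE], stdout = unlist(splits))
  return(m)
}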
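
To illustrate what the row expansion does, here is a small self-contained sketch. The toy matrix only stands in for whatever load_parallel_results_raw returns (that function is not shown here), and the column names foo and bar are assumptions made for the example.

```r
# Toy stand-in for the raw results: one row per job,
# with stdout holding that job's multi-line output.
raw <- cbind(foo    = c("1", "2"),
             bar    = c("2", "3"),
             stdout = c("a\nb\n", "c\nd\ne\n"),
             stderr = c("", ""))

splits <- strsplit(raw[, "stdout"], "\n")  # one character vector per job
lens   <- lengths(splits)                  # number of output lines per job
reps   <- rep(seq_len(nrow(raw)), lens)    # repeat each row index once per line

# drop = FALSE keeps the variable columns as a matrix with their names.
m <- cbind(raw[reps, c("foo", "bar"), drop = FALSE], stdout = unlist(splits))
print(m)
```

Each job's row is repeated once per line of its stdout, so the two jobs above become five rows. A job with empty stdout contributes no rows at all, which may or may not be what we want.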


>>   load_parallel_results_split_to_columns(filenametable)
>
> I'm happy to write these, though I'm limited on time.  Could you write
> a generator for test data?

parallel --results my/results/dir --header : echo FOO={foo} BAR={bar}';'seq {bar} \
  :::: <(echo foo; seq 1000) <(echo bar; seq 10)

Can we do this with multiple columns?
 
I do not like the idea of shelling out simply to read a file. If we
are talking tons of small files then spawning a shell will slow it
down tremendously.

Very good point.  I think the fastest approach would be to do all the data processing in a single shell with an (automatically generated) awk script, run from parallel, that outputs the data in such a way that we get the results we want with just a single read.table. But let's see what we can do with R alone.
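
As a rough sketch of the R-only route, each file can be read in a single call without spawning any external process. The helper name read_whole_file and the throwaway directory are only assumptions for illustration; they do not reflect the actual layout that parallel writes.

```r
# Read a file's full contents as one string, with no external process.
# read_whole_file is a hypothetical helper name.
read_whole_file <- function(path) {
  size <- file.info(path)$size
  if (size == 0) return("")   # nothing to read from an empty file
  readChar(path, nchars = size, useBytes = TRUE)
}

# Demo on a throwaway directory (layout is illustrative only).
dir <- tempfile("results")
dir.create(dir)
writeLines(c("1", "2"), file.path(dir, "stdout"))
writeLines(character(0), file.path(dir, "stderr"))

files <- list.files(dir, full.names = TRUE)
contents <- vapply(files, read_whole_file, character(1))
names(contents) <- basename(files)
print(contents)
```

With many small files the cost is then one file.info and one readChar per file, rather than one shell per file.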

I read that anything you can do on a connection (i.e. R's filehandle)
you can also do on a string using textConnection. So I would suggest
we make an efficient raw reader and use that and then use
a=gsub(newlinesep,"\n",a) to replace newline/tab and finally use R's
builtin reader on a textConnection.
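
A minimal sketch of that idea follows. The separator ";" and the sample string are made up for the example, and it uses gsub so that every occurrence of the separator is replaced, not just the first.

```r
# Raw output of one job as a single string; ";" as record separator
# and tab as field separator are assumptions for this example.
newlinesep <- ";"
a <- "1\tx;2\ty;3\tz"

# Normalise the record separator to a newline.
# fixed = TRUE treats newlinesep as a literal string, not a regex.
a <- gsub(newlinesep, "\n", a, fixed = TRUE)

# Let R's builtin reader parse the string through a text connection.
con <- textConnection(a)
df <- read.table(con, sep = "\t", header = FALSE, stringsAsFactors = FALSE)
close(con)   # read.table does not close a connection that was already open

print(df)
```

The same pattern should work for tab replacement: substitute the column separator first, then pass the matching sep to read.table.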

Sounds good.

Does every directory generated by parallel always have both stdout and stderr?


David
