
Re: Use R to manage results from GNU Parallel


From: Ole Tange
Subject: Re: Use R to manage results from GNU Parallel
Date: Sun, 5 Jan 2014 12:27:17 +0100

On Sun, Jan 5, 2014 at 6:55 AM, David Rosenberg <david.davidr@gmail.com> wrote:

>> Maybe we could have an option that would indicate the splitting char.
>> The default would be none = don't split:
>>
>> > load_parallel_results(file,split="\t")
>>     myvar1 myvar2          V1 V2
>>   1      1      A       Hello  1
>>   2      1      A         Bye  2
>>   3      1      A         Wow  3
>>   4      2      A Interesting  9
>>   5      1      B     NewYork  3
>>
>> > load_parallel_results(file)
>>     myvar1 myvar2          stdout stderr
>>   1      1      A       "Hello\t1\nBye\t2\nWow\t3\n" ""
>>   2      2      A "Interesting\t9\n" ""
>>   3      1      B     "NewYork\t3\n" ""
>>
>
> That seems reasonable.

Having given it some more thought, I think we also want a way to split
on newlines but not on tabs:

> load_parallel_results(file,splitnewline=T)
     myvar1 myvar2          stdout
   1      1      A       "Hello\t1"
   2      1      A       "Bye\t2"
   3      1      A       "Wow\t3"
   4      2      A "Interesting\t9"
   5      1      B     "NewYork\t3"
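
A minimal sketch of the newline-only split, assuming the raw contents are
already in a data frame with a stdout column. The helper name
split_stdout_on_newline and the data frame layout are hypothetical, not
part of any existing package:

```r
## Hypothetical helper: expand each job's raw stdout into one row per line.
split_stdout_on_newline <- function(df, linesep = "\n") {
  ## strsplit() drops the empty piece after a trailing separator,
  ## so "a\nb\n" yields c("a", "b").
  pieces <- strsplit(df$stdout, linesep, fixed = TRUE)
  counts <- lengths(pieces)
  ## Repeat each job's argument columns once per output line.
  keep <- setdiff(names(df), "stdout")
  out <- df[rep(seq_len(nrow(df)), counts), keep, drop = FALSE]
  out$stdout <- unlist(pieces)
  rownames(out) <- NULL
  out
}

raw <- data.frame(
  myvar1 = c(1, 2, 1),
  myvar2 = c("A", "A", "B"),
  stdout = c("Hello\t1\nBye\t2\nWow\t3\n", "Interesting\t9\n", "NewYork\t3\n"),
  stringsAsFactors = FALSE
)
res <- split_stdout_on_newline(raw)
```

Note that in this sketch a job with empty stdout contributes no rows at
all, which may or may not be what is wanted.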

>> I believe I would prefer returning a data-structure, that you could
>> select the relevant records from based on the arguments. And when you
>> have the records you want, you can ask to have the stdout/stderr read
>> in and possibly expanded as rows. This would be able to scale to much
>> bigger stdout/stderr and many more jobs.

So something like:

load_parallel_results_filenames <- function(resdir) {
  # return
  ##       myvar1 myvar2 stdout                 stderr
  ##  [1,] "1"    "A"    "my/dir/1/A/stdout"    "my/dir/1/A/stderr"
}

load_parallel_results_raw_content <- function(filenametable) {
  # return
  ##       myvar1 myvar2 stdout                     stderr
  ##  [1,] "1"    "A"    `cat my/dir/1/A/stdout`    `cat my/dir/1/A/stderr`
}

load_parallel_results_split_on_newline <- function(filenametable) {
  # return
  ##       myvar1 myvar2 stdout1
  ##  [1,] "1"    "A"    "stdout-line1"
  ##  [2,] "1"    "A"    "stdout-line2"
}

load_parallel_results_split_to_columns <- function(filenametable) {
  # return
  ##       myvar1 myvar2 stdout1 stdout2
  ##  [1,] "1"    "A"    "col1-line1"    "col2-line1"
  ##  [2,] "1"    "A"    "col1-line2"    "col2-line2"
}

Maybe it makes sense that all these functions can be called from a
single function by setting options:

load_parallel_results <- function(x,output=NULL,linesep="\n",colsep="\t") {
  if(is.character(x) && is.null(dim(x))) {
    ## x is a plain string: treat it as the results directory
    filenametable <- load_parallel_results_filenames(x);
  } else {
    ## x is already a filename table
    filenametable <- x
  }
  if(is.null(output)) {
    ## Default: return only the filenames
    return(filenametable)
  }
  if(output=="raw") {
    return(load_parallel_results_raw_content(filenametable))
  }
  if(output=="newline") {
    return(load_parallel_results_split_on_newline(filenametable,linesep))
  }
  if(output=="columns") {
    return(load_parallel_results_split_to_columns(filenametable,linesep,colsep))
  }
  stop("unknown output: ", output)
}
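
The option handling above could also be expressed with match.arg() and
switch(), which rejects unknown values for free. This is only a sketch of
the dispatch pattern: dispatch_output is a hypothetical name, the default
is spelled "filenames" here instead of NULL, and the returned strings are
placeholders for the real loader calls:

```r
## Hypothetical dispatcher skeleton; the strings stand in for loader calls.
dispatch_output <- function(output = c("filenames", "raw", "newline", "columns")) {
  ## match.arg() picks the first choice when output is not given
  ## and stops with an error for a value that is not in the list.
  output <- match.arg(output)
  switch(output,
         filenames = "would return the filename table",
         raw       = "would read stdout/stderr contents",
         newline   = "would split contents into rows",
         columns   = "would split rows into columns")
}

default_action <- dispatch_output()
raw_action <- dispatch_output("raw")
bad <- try(dispatch_output("bogus"), silent = TRUE)
```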

I have implemented these two (see below):

  load_parallel_results_raw_content(filenametable)
  load_parallel_results_filenames(resdir)

But I would appreciate help with:

  load_parallel_results_split_on_newline(filenametable)
  load_parallel_results_split_to_columns(filenametable)

/Ole


load_parallel_results_filenames <- function(resdir) {
  ## Find files called .../stdout
  stdoutnames <- list.files(path=resdir, pattern="^stdout$", recursive=TRUE);
  ## Find files called .../stderr
  stderrnames <- list.files(path=resdir, pattern="^stderr$", recursive=TRUE);
  if(length(stdoutnames) == 0) {
    ## Return empty data frame if no files found
    return(data.frame());
  }
  ## One row per job; path components alternate name/value and end in "stdout"
  m <- matrix(unlist(strsplit(stdoutnames, "/")),
              nrow=length(stdoutnames), byrow=TRUE);
  ## Even-numbered components are the argument values
  tbl <- as.table(m[,c(FALSE,TRUE),drop=FALSE]);
  ## Append the stdout and stderr filenames
  tbl <- cbind(tbl,unlist(stdoutnames),unlist(stderrnames));
  ## Odd-numbered components are the argument names, followed by "stdout"
  colnames(tbl) <- c(strsplit(stdoutnames[1],"/")[[1]][c(TRUE,FALSE)],"stderr");
  return(tbl);
}
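
To illustrate the path-splitting step in load_parallel_results_filenames
without touching the filesystem, here is a small sketch on made-up paths.
It assumes the layout <name>/<value>/<name>/<value>/stdout and that every
path has the same depth; the paths themselves are invented for the example:

```r
## Example relative paths, as list.files(recursive = TRUE) might return them.
paths <- c("myvar1/1/myvar2/A/stdout",
           "myvar1/1/myvar2/B/stdout",
           "myvar1/2/myvar2/A/stdout")
parts <- strsplit(paths, "/", fixed = TRUE)
## One row per path, one column per path component.
m <- matrix(unlist(parts), nrow = length(paths), byrow = TRUE)
## Even-numbered columns hold the argument values ...
values <- m[, c(FALSE, TRUE), drop = FALSE]
## ... and the odd-numbered components of any one path hold the names.
## The last odd component is the file name "stdout", so drop it here.
varnames <- head(parts[[1]][c(TRUE, FALSE)], -1)
colnames(values) <- varnames
```

If the paths had differing depths, matrix() would still fill the cells but
the columns would no longer line up, so a real implementation may want to
check that first.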

load_parallel_results_raw_content <- function(tbl, dir=resdir) {
  ## The filenames in tbl are relative to the results directory. dir
  ## defaults to a variable called resdir in the global workspace, as
  ## before; pass it explicitly to avoid depending on that global.
  read_all <- function(x) {
    return(paste(readLines(file.path(dir,x)),collapse="\n"))
  }
  ## Read them
  stdoutcontents <- lapply(tbl[,c("stdout")], read_all);
  stderrcontents <- lapply(tbl[,c("stderr")], read_all);
  # Replace filenames with file contents
  tbl[,c("stdout","stderr")] <-
    c(as.character(stdoutcontents),as.character(stderrcontents));
  return(tbl);
}
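
For the column split, one possible starting point is sketched below. It
works on a plain character vector of already-split lines, not on the
filename table, and the name split_lines_to_columns is hypothetical. Rows
with fewer fields than the widest row are padded with NA:

```r
## Hypothetical helper: split each line on colsep into a character matrix.
split_lines_to_columns <- function(lines, colsep = "\t") {
  fields <- strsplit(lines, colsep, fixed = TRUE)
  width <- max(lengths(fields))
  ## Pad short rows with NA so every row has the same number of fields.
  padded <- lapply(fields, function(f) {
    c(f, rep(NA_character_, width - length(f)))
  })
  m <- do.call(rbind, padded)
  colnames(m) <- paste0("V", seq_len(width))
  m
}

cols <- split_lines_to_columns(c("Hello\t1", "Bye\t2", "Interesting\t9"))
ragged <- split_lines_to_columns(c("a\tb", "c"))
```

Because strsplit() drops an empty field after a trailing separator, a line
such as "a\t" ends up as "a" followed by NA in this sketch.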


