Re: Use R to manage results from GNU Parallel


From: David Rosenberg
Subject: Re: Use R to manage results from GNU Parallel
Date: Sun, 5 Jan 2014 08:13:08 -0500


But I would appreciate help with:

  load_parallel_results_split_on_newline(filenametable)
  load_parallel_results_split_to_columns(filenametable)


I'm happy to write these, though I'm limited on time.  Could you write a generator for test data?  In particular, it'd be good to be able to adjust the size of the files, in case you're interested in testing how this scales to large files and/or lots of files.
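
To make the request concrete, here is a rough sketch of the kind of generator I have in mind. The function name and its parameters are made up for illustration, and it assumes the directory layout that load_parallel_results_filenames expects (argname/value/stdout and argname/value/stderr under the results directory), not necessarily everything a real --results run writes.

```r
## Hypothetical helper (not part of GNU Parallel): build a directory tree
## shaped like <resdir>/<argname>/<value>/stdout and .../stderr.
## nfiles controls how many jobs are simulated, nlines the size of each stdout.
make_parallel_test_data <- function(resdir, nfiles = 10, nlines = 100,
                                    argname = "1") {
  for (i in seq_len(nfiles)) {
    d <- file.path(resdir, argname, sprintf("job%05d", i))
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
    ## Two tab-separated columns, so a split-to-columns loader has
    ## something to parse
    lines <- paste(seq_len(nlines),
                   sprintf("value_%d_%d", i, seq_len(nlines)),
                   sep = "\t")
    writeLines(lines, file.path(d, "stdout"))
    ## Empty stderr file
    writeLines(character(0), file.path(d, "stderr"))
  }
  invisible(resdir)
}
```

Something like make_parallel_test_data(tempfile("res"), nfiles = 1000, nlines = 10000) would then give a tree to time the loaders against.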

R has limited options for reading data with non-newline record separator characters. My first approach here would be to pipe the data through tr or sed to swap the desired record separator character with "\n", so that we can read things into R with the usual commands.  I'm assuming we're on a POSIX system, or something where we can do that.  Otherwise, I think we'd have to read each file as a giant string (as you're doing for 'raw'), and then parse things ourselves, which I suspect would be much slower.
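
A minimal sketch of the tr idea, assuming a POSIX shell with tr on the PATH and a single-character record separator. The helper name is hypothetical. Note that any newlines already inside a record would be indistinguishable from separators after the translation.

```r
## Hypothetical helper: read records separated by `sep` by letting tr
## translate the separator to newline, then reading lines as usual.
## Assumes a POSIX system; `sep` is one character or a tr escape.
read_records_via_tr <- function(path, sep) {
  ## shQuote("\\n") passes the two characters \n to tr, which
  ## interprets them as a newline
  cmd <- paste("tr", shQuote(sep), shQuote("\\n"), "<", shQuote(path))
  con <- pipe(cmd, open = "r")
  on.exit(close(con))
  readLines(con)
}
```

For NUL-separated records the separator can be given as the tr escape, e.g. read_records_via_tr(path, "\\0"), since R strings cannot hold a NUL themselves.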

BTW, for 'raw', it might be worth comparing the performance of using readLines, followed by collapsing the newlines, to the following approach:

readChar(fileName, file.info(fileName)$size)
(which I got from http://stackoverflow.com/questions/9068397/import-text-file-as-single-character-string )
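
Here is a rough way one might set up that comparison. The two wrapper names and the test file are made up for illustration, and the timings will of course depend on the machine and file size. One difference to keep in mind: readChar keeps the file's final newline, while readLines followed by paste drops it.

```r
## Hypothetical wrappers around the two ways to slurp a file into one string
slurp_readlines <- function(path) paste(readLines(path), collapse = "\n")
slurp_readchar  <- function(path) readChar(path, file.info(path)$size)

## Throwaway test file; adjust n to probe scaling
path <- tempfile()
n <- 100000
writeLines(sprintf("line %d", seq_len(n)), path)

print(system.time(a <- slurp_readlines(path)))
print(system.time(b <- slurp_readchar(path)))

## Same content, apart from the trailing newline that readChar keeps
same_content <- identical(paste0(a, "\n"), b)
```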


David
 


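load_parallel_results_filenames <- function(resdir) {
  ## Find files called .../stdout (paths are relative to resdir)
  stdoutnames <- list.files(path=resdir, pattern="^stdout$", recursive=TRUE)
  ## Find files called .../stderr
  stderrnames <- list.files(path=resdir, pattern="^stderr$", recursive=TRUE)
  if(length(stdoutnames) == 0) {
    ## Return empty data frame if no files found
    return(data.frame())
  }
  ## One row per file, one column per path component:
  ## argname1/value1/argname2/value2/.../stdout
  m <- matrix(unlist(strsplit(stdoutnames, "/")),
              nrow=length(stdoutnames), byrow=TRUE)
  ## Keep the even-numbered columns (the argument values);
  ## drop=FALSE keeps a matrix even when there is only one argument
  tbl <- as.table(m[, c(FALSE,TRUE), drop=FALSE])
  ## Append the stdout and stderr filenames
  tbl <- cbind(tbl, unlist(stdoutnames), unlist(stderrnames))
  ## Odd-numbered components are the argument names, ending in "stdout"
  colnames(tbl) <- c(strsplit(stdoutnames[1],"/")[[1]][c(TRUE,FALSE)], "stderr")
  return(tbl)
}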

load_parallel_results_raw_content <- function(tbl, resdir) {
  ## resdir is passed in explicitly (the original relied on a global);
  ## the filenames in tbl are relative to it
  read_whole <- function(x) {
    paste(readLines(file.path(resdir, x)), collapse="\n")
  }
  ## Read them
  stdoutcontents <- lapply(tbl[,c("stdout")], read_whole)
  stderrcontents <- lapply(tbl[,c("stderr")], read_whole)
  ## Replace filenames with file contents
  tbl[,c("stdout","stderr")] <-
    c(as.character(stdoutcontents), as.character(stderrcontents))
  return(tbl)
}

 
