Re: Use R to manage results from GNU Parallel
From: Ole Tange
Subject: Re: Use R to manage results from GNU Parallel
Date: Sun, 5 Jan 2014 12:27:17 +0100
On Sun, Jan 5, 2014 at 6:55 AM, David Rosenberg <david.davidr@gmail.com> wrote:
>> Maybe we could have an option that would indicate the splitting char.
>> The default would be none = don't split:
>>
>> > load_parallel_results(file,split="\t")
>>   myvar1 myvar2 V1          V2
>> 1 1      A      Hello       1
>> 2 1      A      Bye         2
>> 3 1      A      Wow         3
>> 4 2      A      Interesting 9
>> 5 1      B      NewYork     3
>>
>> > load_parallel_results(file)
>>   myvar1 myvar2 stdout                       stderr
>> 1 1      A      "Hello\t1\nBye\t2\nWow\t3\n" ""
>> 2 2      A      "Interesting\t9\n"           ""
>> 3 1      B      "NewYork\t3\n"               ""
>>
>
> That seems reasonable.
Giving it some more thought, I think we also want a way to split on
newlines but not on tabs:
> load_parallel_results(file,splitnewline=T)
  myvar1 myvar2 stdout
1 1      A      "Hello\t1"
2 1      A      "Bye\t2"
3 1      A      "Wow\t3"
4 2      A      "Interesting\t9"
5 1      B      "NewYork\t3"
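A minimal sketch of the newline split shown above, assuming the raw results are already in a data frame with a stdout column. The helper name expand_stdout_lines is made up for illustration and is not part of any existing code:

```r
## Hypothetical helper: one output row per line of stdout.
## Note: a job whose stdout is empty contributes no rows at all.
expand_stdout_lines <- function(df, linesep = "\n") {
  ## Split each stdout string into its lines
  lines <- strsplit(df$stdout, linesep, fixed = TRUE)
  ## Repeat each argument row once per line it produced
  idx <- rep(seq_len(nrow(df)), lengths(lines))
  out <- df[idx, setdiff(names(df), "stdout"), drop = FALSE]
  out$stdout <- as.character(unlist(lines))
  rownames(out) <- NULL
  out
}

## The raw table from the earlier example (stderr left out for brevity)
raw <- data.frame(myvar1 = c(1, 2, 1),
                  myvar2 = c("A", "A", "B"),
                  stdout = c("Hello\t1\nBye\t2\nWow\t3\n",
                             "Interesting\t9\n",
                             "NewYork\t3\n"),
                  stringsAsFactors = FALSE)
expanded <- expand_stdout_lines(raw)
```

strsplit drops the empty string after a trailing newline, so the final "\n" of each job does not produce an extra empty row.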
>> I believe I would prefer returning a data-structure, that you could
>> select the relevant records from based on the arguments. And when you
>> have the records you want, you can ask to have the stdout/stderr read
>> in and possibly expanded as rows. This would be able to scale to much
>> bigger stdout/stderr and many more jobs.
So something like:
load_parallel_results_filenames <- function(resdir) {
  # return
  ##      myvar1 myvar2 stdout              stderr
  ## [1,] "1"    "A"    "my/dir/1/A/stdout" "my/dir/1/A/stderr"
}
load_parallel_results_raw_content <- function(filenametable) {
  # return
  ##      myvar1 myvar2 stdout                  stderr
  ## [1,] "1"    "A"    `cat my/dir/1/A/stdout` `cat my/dir/1/A/stderr`
}
load_parallel_results_split_on_newline <- function(filenametable) {
  # return
  ##      myvar1 myvar2 stdout1
  ## [1,] "1"    "A"    "stdout-line1"
  ## [2,] "1"    "A"    "stdout-line2"
}
load_parallel_results_split_to_columns <- function(filenametable) {
  # return
  ##      myvar1 myvar2 stdout1      stdout2
  ## [1,] "1"    "A"    "col1-line1" "col2-line1"
  ## [2,] "1"    "A"    "col1-line2" "col2-line2"
}
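To illustrate the scaling argument quoted above: with a table of filenames, the selection happens before any file is read. A rough sketch, assuming the table is a character matrix like the one in the comments above and that the paths are relative to a results directory. The directory layout and the read_one helper are invented for the example:

```r
## Build a tiny results tree in a temporary directory (made-up layout)
resdir <- file.path(tempdir(), "parallel_results_demo")
for (d in c("myvar1/1/myvar2/A", "myvar1/2/myvar2/A")) {
  dir.create(file.path(resdir, d), recursive = TRUE, showWarnings = FALSE)
  writeLines(paste("output of", d), file.path(resdir, d, "stdout"))
}

## Filename table in the shape sketched above (stderr left out)
filenametable <- cbind(myvar1 = c("1", "2"),
                       myvar2 = c("A", "A"),
                       stdout = c("myvar1/1/myvar2/A/stdout",
                                  "myvar1/2/myvar2/A/stdout"))

## Select the records of interest first; nothing has been read yet
wanted <- filenametable[filenametable[, "myvar1"] == "2", , drop = FALSE]

## Hypothetical helper: read one file into a single string
read_one <- function(f) {
  paste(readLines(file.path(resdir, f)), collapse = "\n")
}
contents <- vapply(wanted[, "stdout"], read_one, character(1),
                   USE.NAMES = FALSE)
```

Only the one selected stdout file is opened, so the cost of reading grows with the size of the selection and not with the total number of jobs.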
Maybe it makes sense that all these functions can be called from a
single function by setting options:
load_parallel_results <- function(x,output=NULL,linesep="\n",colsep="\t") {
  if(is.character(x) && is.null(dim(x))) {
    ## x is a string: treat it as the results dir
    resdir <- x
    filenametable <- load_parallel_results_filenames(resdir);
  } else {
    ## x is already a table of filenames
    filenametable <- x
  }
  if(identical(output,"raw")) {
    return(load_parallel_results_raw_content(filenametable))
  }
  if(identical(output,"newline")) {
    return(load_parallel_results_split_on_newline(filenametable,linesep))
  }
  if(identical(output,"columns")) {
    return(load_parallel_results_split_to_columns(filenametable,linesep,colsep));
  }
  ## Default (output=NULL): return the table of filenames
  return(filenametable)
}
I have made (see below):
load_parallel_results_raw_content(filenametable)
load_parallel_results_filenames(resdir)
But I would appreciate help with:
load_parallel_results_split_on_newline(filenametable)
load_parallel_results_split_to_columns(filenametable)
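A possible starting point for these two, as a sketch only. It handles stdout and ignores stderr, and it assumes the table is a character matrix whose stdout column holds paths relative to a results dir. The resdir argument is an assumption of mine and not part of the signatures above:

```r
## Sketch: one row per line of stdout. Jobs with empty stdout give no rows.
load_parallel_results_split_on_newline <-
  function(filenametable, linesep = "\n", resdir = ".") {
    argcols <- setdiff(colnames(filenametable), c("stdout", "stderr"))
    rows <- lapply(seq_len(nrow(filenametable)), function(i) {
      f <- file.path(resdir, filenametable[i, "stdout"])
      ## Read the whole file as one string and split it on linesep
      content <- paste(readLines(f), collapse = "\n")
      lines <- strsplit(content, linesep, fixed = TRUE)[[1]]
      if (length(lines) == 0) return(NULL)
      ## Repeat the argument values once per output line
      args <- filenametable[rep(i, length(lines)), argcols, drop = FALSE]
      cbind(args, stdout = lines)
    })
    do.call(rbind, rows)
  }

## Sketch: additionally split every line into columns stdout1, stdout2, ...
load_parallel_results_split_to_columns <-
  function(filenametable, linesep = "\n", colsep = "\t", resdir = ".") {
    tbl <- load_parallel_results_split_on_newline(filenametable, linesep, resdir)
    if (is.null(tbl)) return(NULL)
    fields <- strsplit(tbl[, "stdout"], colsep, fixed = TRUE)
    ## Pad short lines with NA so every row has the same number of fields
    width <- max(lengths(fields), 1)
    padded <- lapply(fields, function(p) p[seq_len(width)])
    cols <- matrix(unlist(padded), ncol = width, byrow = TRUE)
    colnames(cols) <- paste0("stdout", seq_len(width))
    cbind(tbl[, colnames(tbl) != "stdout", drop = FALSE], cols)
  }
```

Everything stays a character matrix here; converting numeric columns (for example with type.convert) would be a separate step.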
/Ole
load_parallel_results_filenames <- function(resdir) {
  ## Find files called .../stdout
  stdoutnames <- list.files(path=resdir, pattern="^stdout$", recursive=TRUE);
  ## Find files called .../stderr
  stderrnames <- list.files(path=resdir, pattern="^stderr$", recursive=TRUE);
  if(length(stdoutnames) == 0) {
    ## Return empty data frame if no files found
    return(data.frame());
  }
  ## Paths look like var1/value1/var2/value2/stdout
  m <- matrix(unlist(strsplit(stdoutnames, "/")),
              nrow=length(stdoutnames), byrow=TRUE);
  ## Keep every second path component (the values);
  ## drop=FALSE keeps a matrix even if there is only one job
  tbl <- m[, c(FALSE,TRUE), drop=FALSE];
  ## Append the stdout and stderr filenames
  tbl <- cbind(tbl, stdoutnames, stderrnames);
  colnames(tbl) <- c(strsplit(stdoutnames[1],"/")[[1]][c(TRUE,FALSE)],"stderr");
  return(tbl);
}
load_parallel_results_raw_content <- function(tbl, resdir=".") {
  ## Filenames in tbl are relative to resdir; the default assumes
  ## that the working directory is the results dir
  read_file <- function(x) {
    return(paste(readLines(file.path(resdir,x)),collapse="\n"))
  }
  ## Read them
  stdoutcontents <- vapply(tbl[,"stdout"], read_file, character(1));
  stderrcontents <- vapply(tbl[,"stderr"], read_file, character(1));
  ## Replace filenames with file contents
  tbl[,c("stdout","stderr")] <- c(stdoutcontents,stderrcontents);
  return(tbl);
}