parallel

Re: Use R to manage results from GNU Parallel


From: David Rosenberg
Subject: Re: Use R to manage results from GNU Parallel
Date: Sun, 5 Jan 2014 00:55:47 -0500


 
Your idea requires the user to make sure the output is \t separated.


Yes... I've been doing that for years and life has been better ever since.  But sure, the separator should be a parameter.
  
Maybe we could have an option that would indicate the splitting char.
The default would be none = don't split:

> load_parallel_results(file,split="\t")
    myvar1 myvar2          V1 V2
  1      1      A       Hello  1
  2      1      A         Bye  2
  3      1      A         Wow  3
  4      2      A Interesting  9
  5      1      B     NewYork  3

> load_parallel_results(file)
    myvar1 myvar2                       stdout stderr
  1      1      A "Hello\t1\nBye\t2\nWow\t3\n"     ""
  2      2      A           "Interesting\t9\n"     ""
  3      1      B               "NewYork\t3\n"     ""


That seems reasonable.
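
A rough sketch of how the split option could behave, assuming the unsplit table is already in memory as a data frame with stdout and stderr columns. The name expand_stdout is hypothetical and is not part of any existing package. It also assumes every output line has the same number of fields, and the resulting V1, V2, ... columns stay as character strings.

```r
# Hypothetical helper: turn each job's raw stdout into one row per output line.
expand_stdout <- function(results, split = NULL) {
  if (is.null(split)) return(results)  # default: don't split
  arg_cols <- setdiff(names(results), c("stdout", "stderr"))
  pieces <- lapply(seq_len(nrow(results)), function(i) {
    lines <- strsplit(results$stdout[i], "\n", fixed = TRUE)[[1]]
    lines <- lines[nzchar(lines)]
    if (length(lines) == 0) return(NULL)  # job produced no output
    # Assumes all lines have the same number of fields.
    fields <- do.call(rbind, strsplit(lines, split, fixed = TRUE))
    fields <- as.data.frame(fields, stringsAsFactors = FALSE)
    # Repeat the job's arguments once per output line.
    args <- results[rep(i, nrow(fields)), arg_cols, drop = FALSE]
    rownames(args) <- NULL
    cbind(args, fields)
  })
  out <- do.call(rbind, Filter(Negate(is.null), pieces))
  rownames(out) <- NULL
  out
}

# The unsplit table from the example above, built by hand.
results <- data.frame(
  myvar1 = c(1, 2, 1),
  myvar2 = c("A", "A", "B"),
  stdout = c("Hello\t1\nBye\t2\nWow\t3\n", "Interesting\t9\n", "NewYork\t3\n"),
  stderr = c("", "", ""),
  stringsAsFactors = FALSE
)
expanded <- expand_stdout(results, split = "\t")
```

With split = NULL the table comes back untouched, which matches the proposed default of not splitting.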
 
 I am also somewhat concerned that the current function loads all
stdout/stderr files - even if they are never used. It would be better
if that could be done lazily - see
http://stackoverflow.com/questions/20923089/r-store-functions-in-a-data-frame

I'm not sure there's a 'right' answer here.  I think it depends on how you'll use the results. 

I believe I would prefer returning a data-structure, that you could
select the relevant records from based on the arguments. And when you
have the records you want, you can ask to have the stdout/stderr read
in and possibly expanded as rows. This would be able to scale to much
bigger stdout/stderr and many more jobs.

Seems reasonable.  

Maybe the trivial solution is to simply return a table of the args+the
filenames of stdout/stderr, and then have a function that turns that
table into the read in files, which you can run either immediately or
after you have selected the relevant rows.

Yes -- I often do this: first go to the file system to collect all the file paths I might be interested in and the relevant metadata (for me, it's typically creation date).  Then I figure out which paths I want to load, and then load them all in.
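
The two-step idea discussed here (index first, read later) might look something like the sketch below. The function names index_parallel_results and read_outputs are hypothetical. The sketch assumes the results directory is laid out as name/value/name/value/.../stdout with a stderr file next to each stdout, which is my understanding of what --results produces with --header :, so check it against your own output directory. Argument values come back as character strings.

```r
# Hypothetical helper: build a table of args + file paths without reading any file.
index_parallel_results <- function(resdir) {
  stdout_files <- list.files(resdir, pattern = "^stdout$", recursive = TRUE)
  stopifnot(length(stdout_files) > 0)
  rows <- lapply(stdout_files, function(rel) {
    # Assumed layout: <name1>/<value1>/<name2>/<value2>/.../stdout
    parts <- strsplit(dirname(rel), "/", fixed = TRUE)[[1]]
    values <- parts[c(FALSE, TRUE)]         # even positions: argument values
    names(values) <- parts[c(TRUE, FALSE)]  # odd positions: argument names
    as.data.frame(as.list(values), stringsAsFactors = FALSE)
  })
  index <- do.call(rbind, rows)
  index$stdout <- file.path(resdir, stdout_files)
  index$stderr <- file.path(resdir, dirname(stdout_files), "stderr")
  index
}

# Hypothetical helper: replace the paths with file contents for the given rows only.
read_outputs <- function(index) {
  slurp <- function(path) {
    if (!file.exists(path)) return(NA_character_)
    paste(readLines(path, warn = FALSE), collapse = "\n")
  }
  index$stdout <- vapply(index$stdout, slurp, character(1), USE.NAMES = FALSE)
  index$stderr <- vapply(index$stderr, slurp, character(1), USE.NAMES = FALSE)
  index
}
```

The intended use would be to call index_parallel_results on the results directory, subset the returned table with ordinary data frame indexing (for example, keeping only rows where myvar2 is "B"), and then pass just that subset to read_outputs, so files for unselected jobs are never opened.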

David

/Ole

