emacs-orgmode

Re: [O] Babel: communicating irregular data to R source-code block


From: Thomas S. Dye
Subject: Re: [O] Babel: communicating irregular data to R source-code block
Date: Mon, 23 Apr 2012 06:46:50 -1000

Hi Eric,

Eric Schulte <address@hidden> writes:

> address@hidden (Thomas S. Dye) writes:
>
>> Aloha Michael,
>>
>> Michael Hannon <address@hidden> writes:
>>
>>> Greetings.  I'm sitting in on a weekly, informal, "brown-bag" seminar on
>>> data technologies in statistics.  There are more people attending the
>>> seminar than there are weeks in which to give talks, so I may get by with
>>> being my usual, passive-slug self.
>>>
>>> But I thought it might be useful to have a contingency plan and decided that
>>> giving a brief talk about Babel might be useful/instructive.  I thought (and
>>> think) that mushing together (with attribution) some of the content of the
>>> paper [1] by The Gang of Four and the content of Eric's talk [2] might be a
>>> good approach.  (BTW, if this isn't legal, desirable, permissible, etc.,
>>> this would be a good time to tell me.)
>>>
>
> I would be happy for you to re-use these materials.
>
>>>
>>> I liked the Pascal's Triangle example (which morphed from elisp to Python,
>>> or vice versa, in the two references), but I was afraid that the elisp
>>> routine "pst-check", used as a check on the correctness of the
>>> previously-generated Pascal's triangle, might be too esoteric for this
>>> audience, not to mention me.
>>> (The recursive Fibonacci function is virtually identical in all languages,
>>> but the second part is more obscure.)
>>>
>
> I was giving a presentation to a local lisp/scheme user group, so I
> figured I'd spare them the pain of trying to read python code :).
>
>>>
>>> I thought it should be possible to use R to do the same sanity check, as R
>>> would be much more-familiar to this audience (and its use would still
>>> demonstrate the meta-language feature of Babel).
>>>
>>> Unfortunately, I haven't been able to find a way to communicate the output
>>> of the Pascal's Triangle example to an R source-code block.  The gist of
>>> the problem seems to be that regardless of how I try to grab the data
>>> (scan, readLines, etc.) Babel always ends up trying to read a data frame
>>> (table) and I get an error similar to:
>>>
>
> I present some options below specific to Tom's discussion, but another
> option may be to use the ":results output" option on a python code block
> which prints the table to STDOUT, and then use something like readLines
> to read from the resulting string into R.
>

I didn't have any luck with :results output, but didn't spend much time
trying to figure it out.
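
For anyone following along, here is a minimal Python sketch of the line-oriented idea Eric describes: print each row on its own line, then split the resulting text line by line, so rows of different lengths cause no trouble.  It is plain Python rather than a Babel block, the names format_rows and parse_rows are made up for illustration, and it says nothing about how :results output actually hands the string to R.

```python
def format_rows(rows):
    """Render each row as one line of space-separated integers."""
    return "\n".join(" ".join(str(value) for value in row) for row in rows)


def parse_rows(text):
    """Parse text line by line, so rows may differ in length."""
    return [
        [int(token) for token in line.split()]
        for line in text.splitlines()
        if line.strip()  # skip blank lines
    ]


triangle = [[1], [1, 1], [1, 2, 1], [1, 3, 3, 1], [1, 4, 6, 4, 1]]
text = format_rows(triangle)
print(text)

# Reading the text back row by row recovers the ragged structure.
recovered = parse_rows(text)
```

Because each line is split on its own, no step ever assumes that all rows have the same number of fields, which is the assumption that trips up read.table.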

>>>
>>> <<<<<<
>>>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,
>>>> : line 1 did not have 5 elements
>>>
>>> Enter a frame number, or 0 to exit   
>>>
>>> 1: read.table("/tmp/babel-3780tje/R-import-3780Akj", header = FALSE,
>>> row.names = NULL, sep = "
>>>>>>>>>
>>>
>>> If I construct a table "by hand" with all of the cells occupied, everything
>>> goes OK.  For instance:
>>>
>>> <<<<<<
>>> #+TBLNAME: some-junk
>>> | 1 | 0 | 0 | 0 |
>>> | 1 | 1 | 0 | 0 |
>>> | 1 | 2 | 1 | 0 |
>>> | 1 | 3 | 3 | 1 | 
>>>
>>> #+NAME: read-some-junk(sj_input=some-junk)
>>> #+BEGIN_SRC R
>>>
>>> rowSums(sj_input)
>>>
>>> #+END_SRC  
>>>
>>> #+RESULTS: read-some-junk
>>> | 1 |
>>> | 2 |
>>> | 4 |
>>> | 8 |
>>>>>>>>>
>>>
>>> But the following gives the kind of error I described above:
>>>
>>> <<<<<<
>>> #+name: pascals_triangle
>>> #+begin_src python :var n=5 :exports none :return pascals_triangle(5)
>>> def pascals_triangle(n):
>>>     if n == 0:
>>>         return [[1]]
>>>     prev_triangle = pascals_triangle(n-1)
>>>     prev_row = prev_triangle[n-1]
>>>     this_row = map(sum, zip([0] + prev_row, prev_row + [0]))
>>>     return prev_triangle + [this_row]
>>>
>>> pascals_triangle(n)
>>> #+end_src
>>
>> A few things are wrong at this point.  It seems the JSS article has
>> an error in the header of the pascals_triangle source block.  AFAIK
>> there is no header argument :return.  I don't know how :return
>> pascals_triangle(5) got there, but am fairly certain it shouldn't be.
>>
>
> The :return header argument *is* a supported header argument of python
> code blocks and is not an error.  The python code block should run w/o
> error and without the extra "return pascals_triangle(n)" at the bottom.
> The following works for me.
>
> #+name: pascals_triangle
> #+begin_src python :var n=5 :exports none :return pascals_triangle(5)
> def pascals_triangle(n):
>     if n == 0:
>         return [[1]]
>     prev_triangle = pascals_triangle(n-1)
>     prev_row = prev_triangle[n-1]
>     this_row = map(sum, zip([0] + prev_row, prev_row + [0]))
>     return prev_triangle + [this_row]
>
> #+end_src
>
> #+RESULTS: pascals_triangle
> | 1 |   |    |    |   |   |
> | 1 | 1 |    |    |   |   |
> | 1 | 2 |  1 |    |   |   |
> | 1 | 3 |  3 |  1 |   |   |
> | 1 | 4 |  6 |  4 | 1 |   |
> | 1 | 5 | 10 | 10 | 5 | 1 |
>
> [...]

I'm beginning to see why you have strong feelings about python.  In the
code above, the blank line before #+end_src is necessary and must not
contain any spaces, and :var n can be set to anything, since it is
declared for initialization only.

The code in the JSS article doesn't run for me with a recent Org-mode
unless I add a blank line before #+end_src, or remove the :return header
argument.  If I remove the :return header argument, then the need for
the blank line goes away.  The following code block seems to work:

#+name: pascals-triangle
#+begin_src python :var n=2 :exports none
def pascals_triangle(n):
    if n == 0:
        return [[1]]
    prev_triangle = pascals_triangle(n-1)
    prev_row = prev_triangle[n-1]
    this_row = map(sum, zip([0] + prev_row, prev_row + [0]))
    return prev_triangle + [this_row]
return pascals_triangle(n)
#+end_src

#+RESULTS: pascals-triangle

| 1 |   |   |
| 1 | 1 |   |
| 1 | 2 | 1 |

I'm guessing that the need for a blank line when using :return has
arisen since the JSS article was published, because the article was
generated from source code and didn't show any errors.

If I have this right (a big if), then might it be possible to
re-establish the old behavior so the JSS code works?  
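
As an aside for readers on Python 3 (an assumption; the code in this thread dates from Python 2): there map returns an iterator rather than a list, so the row concatenation fails at the next level of recursion.  A minimal sketch of the same function with a list comprehension, written as plain Python outside of any Babel block:

```python
def pascals_triangle(n):
    """Return rows 0..n of Pascal's triangle as a list of lists."""
    if n == 0:
        return [[1]]
    prev_triangle = pascals_triangle(n - 1)
    prev_row = prev_triangle[n - 1]
    # A list comprehension keeps every row a real list; in Python 3,
    # map() returns an iterator, which cannot be concatenated with [0].
    this_row = [a + b for a, b in zip([0] + prev_row, prev_row + [0])]
    return prev_triangle + [this_row]


triangle = pascals_triangle(5)
for row in triangle:
    print(row)
```

Row k of the result has k + 1 entries, so the output is ragged in the same way as the tables shown in this thread.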

>>
>> I vaguely remember that it once was possible to pass variables in
>> through the name line, but I couldn't find this syntax in some fairly
>> recent documentation.
>
> This style of passing arguments is still supported, but not necessarily
> encouraged by the documentation.
>

>> It does appear to work still using a recent Org-mode.  If I rename the
>> results and then pass that to the source code block, all is well.
>>
>> #+RESULTS: pascals-tri
>> | 1 |   |    |    |   |   |
>> | 1 | 1 |    |    |   |   |
>> | 1 | 2 |  1 |    |   |   |
>> | 1 | 3 |  3 |  1 |   |   |
>> | 1 | 4 |  6 |  4 | 1 |   |
>> | 1 | 5 | 10 | 10 | 5 | 1 |
>>
>>   
>> #+name: pst-checkR(p=pascals-tri)
>> #+BEGIN_SRC R
>> p
>> #+END_SRC
>>
>> #+RESULTS: pst-checkR
>>
>> | 1 | nil | nil | nil | nil | nil |
>> | 1 |   1 | nil | nil | nil | nil |
>> | 1 |   2 |   1 | nil | nil | nil |
>> | 1 |   3 |   3 |   1 | nil | nil |
>> | 1 |   4 |   6 |   4 | 1   | nil |
>> | 1 |   5 |  10 |  10 | 5   | 1   |
>>
>> This looks like a bug to me, but Eric S. will know better what might be
>> going on.
>
> The above is due to the inability of R (or at least of the read.table
> function) to read in tables with different row length.  The process of
> writing to an Org-mode table and *then* referencing that table as Tom
> suggests above has the side effect of filling in blank spots in the
> final exported table, turning what would otherwise be something like
>
> 1
> 1  1
> 1  2  1
>
> into something like
>
> 1  ""  ""
> 1   1  ""
> 1   2  1
>

Thanks for this explanation.  It makes sense that mapping a python data
structure to an R data structure would involve an intermediate
representation. 
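
A minimal Python sketch of that intermediate step, assuming ragged rows held as a list of lists: pad every row to the length of the longest one, which is the kind of filling-in Eric describes.  The name pad_rows and the empty-string fill are illustrative choices, not anything Babel provides.

```python
def pad_rows(rows, fill=""):
    """Pad each row with `fill` so that all rows have equal length."""
    width = max((len(row) for row in rows), default=0)
    return [list(row) + [fill] * (width - len(row)) for row in rows]


ragged = [[1], [1, 1], [1, 2, 1]]
padded = pad_rows(ragged)
for row in padded:
    print(row)
```

Once every row has the same number of cells, a table reader that insists on a fixed column count has nothing left to complain about.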

All the best,
Tom

> You could also use a function like the following to explicitly fill in
> these missing lines.
>
> #+name: padded_pascals_triangle
> #+begin_src emacs-lisp :var data=pascals_triangle
>   (let ((max-length (apply #'max (mapcar #'length data))))
>     (mapcar (lambda (row)
>               (append row (make-vector (- max-length (length row)) "") nil))
>             data))
> #+end_src
>
>> I can't do much more than this, but I'm optimistic things will be
>> sorted out before your turn to speak at the seminar rolls around.
>>
>> Thanks for bringing the error in the JSS article to light.
>>
>> All the best,
>> Tom
>>
>
> I often have to explicitly convert data read into R code blocks as a
> table into some other data structure like a vector or a matrix.  I run
> into this myself when trying to use the statistical functions of R.  It
> generally takes a while to look up the function to do the conversion,
> but I imagine that there is a reason why people who know more R than I
> do chose to make tables the default data type for data read into R
> blocks.
>
> Best,
>
> Combining the examples above yields the following,
>
>
> #+name: pascals_triangle
> #+begin_src python :var n=5 :exports none :return pascals_triangle(5) :results vector
> def pascals_triangle(n):
>     if n == 0:
>         return [[1]]
>     prev_triangle = pascals_triangle(n-1)
>     prev_row = prev_triangle[n-1]
>     this_row = map(sum, zip([0] + prev_row, prev_row + [0]))
>     return prev_triangle + [this_row]
>
> #+end_src
>
> #+name: padded_pascals_triangle
> #+begin_src emacs-lisp :var data=pascals_triangle
>   (let ((max-length (apply #'max (mapcar #'length data))))
>     (mapcar (lambda (row)
>               (append row (make-vector (- max-length (length row)) "") nil))
>             data))
> #+end_src
>
> #+begin_src R :var data=padded_pascals_triangle
> data
> #+end_src
>
> #+RESULTS:
> | 1 | nil | nil | nil | nil | nil |
> | 1 |   1 | nil | nil | nil | nil |
> | 1 |   2 |   1 | nil | nil | nil |
> | 1 |   3 |   3 |   1 | nil | nil |
> | 1 |   4 |   6 |   4 | 1   | nil |
> | 1 |   5 |  10 |  10 | 5   | 1   |
>
>
>>
>>>>>>>>>
>>>
>>> Note that I don't really want to do rowSums in this case.  I'm just trying
>>> to demonstrate the error.
>>>
>>> Of course, it's clear that the first line does NOT contain five elements,
>>> nor does the second, etc., as all of the above-diagonal elements are blanks.
>>>
>>> But I've been unable to find an R input function that doesn't end up
>>> treating the source data as a table, i.e., in the context of Babel source
>>> blocks -- R is "happy" to read a lower-diagonal structure.  See the
>>> appendix for an example.
>>>
>>> Any suggestions?  Note that I'm happy to acknowledge that my own ignorance
>>> of R and/or Babel might be the source of the problem.  If so, please
>>> enlighten me.
>>>
>>> Thanks.
>>>
>>> -- Mike
>>>
>>> [1] http://www.jstatsoft.org/v46/i03
>>> [2] https://github.com/eschulte/babel-presentation
>>>
>>> <<<<<<
>>> Appendix
>>> --------
>>>
>>>
>>> $ cat pascal.dat
>>> 1
>>> 1 1
>>> 1 2 1
>>> 1 3 3 1
>>> 1 4 6 4 1
>>>
>>> $ R --vanilla < pascal.R
>>>
>>> R version 2.15.0 (2012-03-30)
>>> Copyright (C) 2012 The R Foundation for Statistical Computing
>>> ISBN 3-900051-07-0
>>> Platform: x86_64-redhat-linux-gnu (64-bit)
>>> .
>>> .
>>> .
>>>
>>>> x <- readLines("pascal.dat")
>>>> x
>>> [1] "1"         "1 1"       "1 2 1"     "1 3 3 1"   "1 4 6 4 1"
>>>> str(x)
>>>  chr [1:5] "1" "1 1" "1 2 1" "1 3 3 1" "1 4 6 4 1"
>>>> 
>>>> y <- scan("pascal.dat")
>>> Read 15 items
>>>> y
>>>  [1] 1 1 1 1 2 1 1 3 3 1 1 4 6 4 1
>>>> str(y)
>>>  num [1:15] 1 1 1 1 2 1 1 3 3 1 ...
>>>> 
>>>> z <- read.table("pascal.dat", header=FALSE)
>>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
>>>   line 1 did not have 5 elements
>>> Calls: read.table -> scan
>>> Execution halted
>>>
>>>

-- 
Thomas S. Dye
http://www.tsdye.com


