octave-maintainers

Re: bin2dec behavior different from Matlab?


From: Daniel J Sebald
Subject: Re: bin2dec behavior different from Matlab?
Date: Sat, 17 Mar 2012 13:37:10 -0500
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.24) Gecko/20111108 Fedora/3.1.16-1.fc14 Thunderbird/3.1.16

On 03/17/2012 10:14 AM, Jordi Gutiérrez Hermoso wrote:
> On 17 March 2012 06:29, Daniel J Sebald<address@hidden>  wrote:
>
>> Well, that leads to the question, Why is cellfun() so noticeably slow? Is it
>> just in 3.2.4 that cellfun() is slow?  John, any reason that something like
>
> Probably. Since 3.2.4 we've had
>
>      http://hg.savannah.gnu.org/hgweb/octave/rev/d1db86336a49
>      http://hg.savannah.gnu.org/hgweb/octave/rev/cf8cd43cdeb3
>
> so that
>
> http://undocumentedmatlab.com/blog/cellfun-undocumented-performance-boost/#comment-63587
>
> HTH,

Well, could you please explain what the issue there was and whether the current setup is a good optimization? (In other words, give some insight on my questions below.) It looks to me as though the change keeps track of the function call info, thereby saving a little bit of time per call. That adds up, of course, but still leaves room for improvement.

There is a bit too much C++ code for me to digest at the moment, so let me ask some general questions.

1) Is a routine like cellfun() something that can be made efficient, or is it inherently a slow routine, not because of internal programming, but because there are things inside a function that are outside of programming control? For example, I wrote the little routine:

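function retval = test_bin2dec (d)
  % Apply the per-string conversion to every element of the cell array d.
  retval = cellfun('binchar2dec', d);
endfunction

function retval = binchar2dec(s)
  % Drop whitespace, then weight each binary digit by its power of two.
  s = s(!isspace(s));
  retval = (s-'0')*(2.^[length(s)-1:-1:0])';
endfunction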

but the "length(s)" has to be evaluated for every string, which is simply slow by nature. It would be nice if 2.^[length(s)-1:-1:0] could be precomputed, but it can't, because the strings may differ in length. I assume Rik's routine is speedy because of the padding with zeros: then 2.^[length(s)-1:-1:0] (or something similar) needs computing only once and can be used in a single matrix multiply.
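To make the padding idea concrete, here is a rough sketch of how I picture it. This is my guess at the approach, not Rik's actual code; the name bin2dec_padded is made up, and it assumes the strings contain only '0', '1', and leading or trailing blanks (no blanks in the middle).

```octave
1;  % mark this file as a script so the function below can be defined inline

% Hypothetical helper: convert a cell array of binary strings in one multiply.
function retval = bin2dec_padded (c)
  m = strjust (char (c), "right");   % one row per string, right-justified
  m(m == " ") = "0";                 % left padding becomes zero digits
  w = columns (m);
  weights = (2 .^ (w-1:-1:0))';      % powers of two, computed only once
  retval = (m - "0") * weights;      % all rows converted by one matrix multiply
endfunction

disp (bin2dec_padded ({"101", "1", "0110"}));   % column vector 5, 1, 6
```

The cost is the conversion of the cell array to a padded character matrix, which is exactly the overhead I am wondering about below.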

2) Internally, when a cell array is created, is there some flag or descriptor indicating that the array has a consistent class across the whole array? That is, "all_same_class" would indicate the cell array is all "char" or all "double", etc. As an example, the cellstr() routine could, rather than set each cell's class, set the "all_same_class" flag and copy the contents of the string matrix. Not much extra work, almost the same as a copy.
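At the script level, the property I have in mind can be checked after the fact, as in the sketch below. The variable name all_same_class is just my label for the idea; this says nothing about whether such a flag exists inside the interpreter, which is the actual question.

```octave
c = {"101", "11", "0"};

% Collect the class name of every element, then compare against the first one.
classes = cellfun (@class, c, "UniformOutput", false);
all_same_class = isempty (c) || all (strcmp (classes, classes{1}));

disp (all_same_class);   % true here, since every element is a char array
disp (iscellstr (c));    % the built-in check for the all-char case
```

Of course this scan touches every element, which is precisely the work a flag set at creation time would avoid.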

From what I'm seeing of Rik's code (which appears to be the best solution given the situation), it seems to me that Octave users are, in a way, slightly discouraged from using cell arrays of strings because of the performance hit of converting to and from a character matrix. As it stands, there is this flexible cell type now, but it is only good for smaller sets of data. On bigger sets of data, say databases, someone might end up programming around cells with the older, clumsier methods for the sake of speed. I would think efficiency of cells is an important detail in promoting clean script writing.

3) There are a number of internal functions that are optimized. Do they work with "scalars" only? Or are there methods that are slightly different for when the input argument is a cell? Going back to question 2, it would be nice if the internal C++ functions could work with cells--especially if they are of "all_same_class"--and chug right through computations without all kinds of redundant class/type checking. Again, I don't know this code real well, just trying to get a feel for things.

4) Now specifically with regard to base2dec, does it seem like this group of functions should have an internal optimized function that all the others call? For example, is base2dec something that could be written in C++? Rik has optimized things by using matrix multiplication. But written internally, the character strings could be of variable length and in some cases save time when there is no padding. (Also, internally if the base is 2 there is probably even further optimization.) And going back to a previous question, if there were a method for working with cell arrays internally that are "all_same_class", there wouldn't be a need to convert cell arrays to a fixed width character matrix.
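To illustrate what I mean by handling variable-length strings without padding, here is a sketch of the per-string loop an internal routine could run, written in script form only to show the algorithm (Horner's rule). The name horner_base2dec is made up, it assumes digits '0' through '9' only (so base <= 10), and it is not how base2dec is currently implemented.

```octave
1;  % mark this file as a script so the function below can be defined inline

% Hypothetical helper: convert one digit string using Horner's rule.
function retval = horner_base2dec (s, base)
  digits = s(! isspace (s)) - "0";   % digit values; assumes base <= 10
  retval = 0;
  for d = digits                     % one multiply-add per digit, no padding
    retval = retval * base + d;
  endfor
endfunction

disp (horner_base2dec ("1011", 2));   % 11
disp (horner_base2dec ("777", 8));    % 511
```

As an interpreted loop this would be slow, which is the point of the question: in compiled code the same loop needs no power table and no fixed-width matrix, and for base 2 the multiply could become a shift.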

Dan

