help-octave
Re: package nan warnings


From: Alois Schloegl
Subject: Re: package nan warnings
Date: Sat, 04 Aug 2012 02:10:01 +0200
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.16) Gecko/20120613 Icedove/3.0.11

On 2012-08-03 19:21, Max Brister wrote:
On Thu, Aug 2, 2012 at 4:27 PM, Alois Schloegl<address@hidden>  wrote:
On 2012-08-02 22:43, Jordi Gutiérrez Hermoso wrote:

On 2 August 2012 16:40, Alois Schloegl<address@hidden>   wrote:

3) after installing the NaN-toolbox,  sum([1 NaN 2]) will still result in
NaN. But with the NaN-toolbox you have an additional function
sumskipnan([1,NaN,2]) which gives 3.
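
The difference can be reproduced in core Octave without the NaN-toolbox. The following is only a minimal sketch: it emulates a NaN-skipping sum by masking with isnan(), the variable names are made up for illustration, and it makes no claim about how sumskipnan() is actually implemented.

```octave
x = [1 NaN 2];

# Core behaviour: a single NaN propagates through the sum.
s_propagate = sum (x);            # NaN

# Emulated NaN-skipping sum (assumption: drop the NaN entries, then sum).
s_skip = sum (x(!isnan (x)));     # 3
```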


Why don't you name all of your functions this way and not shadow core
functions, then? For example, why do you overwrite sumsq?

- Jordi G. H.



Ok, sumsq() is a borderline case because you might argue that is not
necessarily a statistical function.

But for the other functions, why should one need to think about whether to
use var() or nanvar(), mean() or nanmean(), std() or nanstd()? There is no
need for the NaN-propagating version; you should always use the NaN-skipping
version.

This is not always true. For example, let's say I want to write a
quick, simple test to see if rand is working. I might write something
like

assert (mean (rand (10000)(:)), .5, .1); # the mean value of rand should be around .5

I expect this case to fail if rand produces a NaN.
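
To make the two behaviours concrete, here is a rough sketch using a made-up sample that contains one NaN. It assumes the core (propagating) mean() is the one on the path, and it uses a plain comparison instead of assert so that nothing aborts. The skipping variant is emulated with isnan() and is not the NaN-toolbox code.

```octave
x = [0.4 0.6 NaN 0.5];               # hypothetical sample with one NaN

m = mean (x);                        # NaN with the propagating mean
ok = abs (m - 0.5) < 0.1;            # false: comparisons with NaN are false

m_skip = mean (x(!isnan (x)));       # 0.5 once the NaN is skipped
ok_skip = abs (m_skip - 0.5) < 0.1;  # true: the NaN goes unnoticed
```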


Hi Max,


thanks for your interest and your attempt to find a solution.

rand() never produces NaN, so it's not a good example. But let's assume there is some myrand() function that can produce NaN. I'd expect that NaN is an encoding for missing values. In that case, mean() should ignore the NaNs.

If you need to test for NaNs, do it explicitly using any(isnan(x(:))). That's much cleaner, and others will know that your code is testing for NaNs. The problem with implicit NaN-propagation is that it is very difficult to know whether the NaN-handling was a conscious decision or is just an arbitrary side effect.
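
A short sketch of such an explicit check, using only core functions. The data vector and the message text are made up for illustration.

```octave
x = [0.4 0.6 NaN 0.5];   # hypothetical data; NaN marks a missing value

# Explicit check: the intent is visible to anyone reading the code.
has_nan = any (isnan (x(:)));

if has_nan
  printf ("input contains %d NaN value(s)\n", nnz (isnan (x)));
endif
```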



When one tries to solve a challenging problem, why should one need to think
about whether to use var(), nanvar(), or some_other_varfunction()? There is
just no need for such a proliferation of function names, all doing basically
the same thing.

As far as the user is concerned, I agree with you. If a user installs
the NaN package, then when they call 'var' they want the NaN-skipping
version. I do not think we should be spitting out a bunch of warnings, as
what the user wants is unambiguous.

On the other hand, this creates an issue for scripts in core. Your
functions are doing basically, but not quite, the same thing. When
writing scripts in core I expect NaNs to be propagated. It leads to a
maintenance nightmare if you cannot be sure of exactly how a function
behaves (see gnulib/autotools).


The functions in core and the NaN-tb do the same thing, except for the NaN-propagation. Even the core functions do not mention in their documentation that NaNs are propagated (see help mean, help var). So the NaN-handling is really not strictly defined. Applications that rely on NaN-propagation depend on undocumented behaviour. If you need to test for NaNs, you should do it explicitly, e.g. using any(isnan(x(:))). That avoids any ambiguity about NaN handling in your code.



Concerning your suggestion "to partition the namespaces (classes)": to me
this sounds like 2nd class citizens. But perhaps it's just me, not being
familiar with this technique. In that case, it would be best if someone else
transformed the NaN-tb into a more compatible mode. I'm open to
suggestions.

A more practical solution would be to use a package [1]. The main
problem here is that Octave does not support packages (yet). What do
you think about having NaN inside of a package?

[1] http://www.mathworks.com/help/techdoc/matlab_oop/brfynt_-1.html


I do not know. The concept of "package" must be quite new, and I've never used it. It seems to me that it is another way to move the issue to some other namespace/class/package.

These "solutions" have one thing in common: they are just a bad compromise that sidesteps the real issue, namely what kind of NaN-handling should be the default for statistical functions.

However, if you believe that there is some need for a compromise solution, a solution based on packages might be a good idea. In that case, just do it.



Alois

Max Brister


