help-octave

Re: Determining if samples are normal


From: Mike Miller
Subject: Re: Determining if samples are normal
Date: Mon, 26 Sep 2005 15:54:13 -0500 (CDT)

On Mon, 26 Sep 2005, Przemek Klosowski wrote:

I said that I tried it for 10000 as well---I didn't give it as an example because it almost blew my computer out of the water---both octave and the 'less' pager used almost a gigabyte of memory because the function as written printed out intermediate results.


Przemek is allowing me to send his note (above) to the list. He is quite right and I apologize for responding only to his code for 100. If the same result is obtained with 10000 as with 100, the test seems rather lame.

Now that I've had a chance to look at it again, the problem is that the original presentation of the test is not correct. What is needed are critical values for the correlation coefficient based on truly normal data at various sample sizes. These have been published. For example, see Johnson and Wichern (1992) Applied Multivariate Statistical Analysis, p. 158, Table 4.2.

This is for samples of 100 from the triangular distribution:

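% expected standard normal quantiles for the 100 order statistics
normals=normal_inv(([1:100]'-.5)/100,0,1);
r=zeros(10000,1);
% the sum of two uniforms is triangular; correlate each sorted sample with the quantiles
for i=1:10000, r(i)=corrcoef([normals,sort(rand(100,2)*[1;1])])(2,1); end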
mean(r < .9895)
ans = 0.13340
mean(r < .9873)
ans = 0.049400
mean(r < .9822)
ans = 0.0033000

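The values .9895, .9873 and .9822 are the critical points for the .10, .05 and .01 significance levels; normality is rejected when r falls below the critical point. The test has essentially no power here: the rejection rates (13%, 4.9% and 0.3%) are about what truly normal data would give, and fewer than 5% of the samples were rejected at the 5% level. But with samples of 300, we start to see differences: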


normals=normal_inv(([1:300]'-.5)/300,0,1);
for i=1:10000, r(i)=corrcoef([normals,sort(rand(300,2)*[1;1])])(2,1); end
mean(r < .9960)
ans = 0.68390
mean(r < .9953)
ans = 0.48890
mean(r < .9935)
ans = 0.12140

So 49% of the samples are statistically significant at the .05 level.
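For a single data set, the same decision rule (reject normality when r falls below the tabulated critical point) might be applied roughly as follows. This is only a sketch: the helper names qnorm, pearson and ppcc are made up for illustration and are not Octave functions, the normal quantiles are computed with erfinv instead of normal_inv so that nothing outside core Octave is assumed, and the critical point is the n = 100, .05 value quoted above.

```octave
% Hypothetical helpers (illustrative names, not part of Octave).
qnorm   = @(p) sqrt(2) * erfinv(2*p - 1);   % standard normal quantile function
pearson = @(a, b) sum((a - mean(a)) .* (b - mean(b))) / ...
          sqrt(sum((a - mean(a)).^2) * sum((b - mean(b)).^2));
% correlation between the sorted sample and the expected normal quantiles
ppcc    = @(x) pearson(qnorm(((1:numel(x))' - 0.5) / numel(x)), sort(x(:)));

x = rand(100, 2) * [1; 1];   % one triangular sample of size 100
r = ppcc(x);                 % correlation for this sample
crit05 = 0.9873;             % .05 critical point for n = 100 (from the table cited above)
reject = r < crit05;         % true means normality is rejected at the .05 level
printf("r = %.4f, reject at .05: %d\n", r, reject);
```

Replacing x with your own data vector (and crit05 with the critical point for your sample size) gives the test for one sample.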

Obviously, with 10,000 observations we would have much greater power than 49% (and 49% isn't all that impressive, but a normal curve and a triangle aren't all that different!). Unfortunately, I don't know what the proper critical points are for the distribution of the correlation with N of 10,000, so I'm not going to do the test.
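If published critical points are not at hand for a given N, one possibility is to estimate them by simulating truly normal samples and taking the lower-tail quantiles of the resulting correlations. The following is a rough sketch of that idea, not a replacement for the published tables: the variable names and the choice nsim = 1000 are assumptions for illustration, the estimates are noisy (especially at the .01 level) unless nsim is made much larger, and I have not checked the output against any table.

```octave
n     = 10000;              % sample size of interest
nsim  = 1000;               % number of simulated normal samples (assumed; raise for stability)
alpha = [0.10 0.05 0.01];   % significance levels

qnorm = @(p) sqrt(2) * erfinv(2*p - 1);   % standard normal quantile, core Octave only
q   = qnorm(((1:n)' - 0.5) / n);          % expected normal quantiles
qc  = q - mean(q);                        % centered once, reused in the loop
qss = sum(qc.^2);

r = zeros(nsim, 1);
for i = 1:nsim
  x  = sort(randn(n, 1));                 % sorted sample from a true normal
  xc = x - mean(x);
  r(i) = sum(qc .* xc) / sqrt(qss * sum(xc.^2));   % Pearson correlation
end

rs   = sort(r);
idx  = max(1, floor(alpha * nsim));       % positions of the lower-tail quantiles
crit = reshape(rs(idx), 1, []);           % estimated critical points, one per alpha
printf("alpha = %.2f  critical r = %.5f\n", [alpha; crit]);
```

Running the same thing with n = 100 or n = 300 and comparing against the tabulated values used above would be a sensible check before trusting the result for larger N.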

I don't understand the Doug Stewart approach to this. I'm sure my way (implied in the above code) is much quicker and easier.

Mike

--
Michael B. Miller, Ph.D.
Assistant Professor
Division of Epidemiology and Community Health
and Institute of Human Genetics
University of Minnesota
http://taxa.epi.umn.edu/~mbmiller/





