Re: Determining if samples are normal

help-octave


From:	Mike Miller
Subject:	Re: Determining if samples are normal
Date:	Mon, 26 Sep 2005 15:54:13 -0500 (CDT)

On Mon, 26 Sep 2005, Przemek Klosowski wrote:

I said that I tried it for 10000 as well---I didn't give it as an example because it almost blew my computer out of the water---both octave and the 'less' pager used almost a gigabyte of memory because the function as written printed out intermediate results.

Przemek is allowing me to send his note (above) to the list. He is quite right and I apologize for responding only to his code for 100. If the same result is obtained with 10000 as with 100, the test seems rather lame.

Now that I've had a chance to look at it again, the problem is that the original presentation of the test is not correct. What is needed are critical values for the correlation coefficient based on truly normal data at various sample sizes. These have been published. For example, see Johnson and Wichern (1992) Applied Multivariate Statistics, p. 158, Table 4.2.
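
One way to sanity-check published critical values of this kind is to simulate the null case directly: draw many truly normal samples, compute the correlation with the normal quantiles for each, and read off the lower-tail quantiles. The following is only a rough sketch under stated assumptions. The names n, nrep, alpha and crit are illustrative and not from the original message, and erfinv is used in place of normal_inv so that only core Octave functions are needed.

```octave
n = 100;                      % sample size
nrep = 5000;                  % number of simulated normal samples (arbitrary choice)
alpha = [0.01 0.05 0.10];     % lower-tail significance levels

% Theoretical normal quantiles at plotting positions (i - 0.5)/n
p = ((1:n)' - 0.5) / n;
z = sqrt(2) * erfinv(2 * p - 1);
zc = z - mean(z);

% Each column of X is one sorted sample of truly normal data
X = sort(randn(n, nrep));
Xc = X - mean(X);             % center each column (broadcasting)

% Correlation of every sorted sample with the normal quantiles
r = (zc' * Xc) ./ (norm(zc) * sqrt(sum(Xc .^ 2)));

% Empirical lower-tail quantiles of r serve as simulated critical values
rs = sort(r);
crit = rs(round(alpha * nrep))
```

If the simulation setup matches the one behind the published table, the three values in crit should come out close to the tabled critical points for N of 100, up to Monte Carlo noise.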


This is for samples of 100 from the triangular distribution:

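% theoretical normal quantiles at plotting positions (i-.5)/100
normals=normal_inv(([1:100]'-.5)/100,0,1);
r=zeros(10000,1);
% each sample is a sum of two uniforms (triangular), sorted and correlated with the normal quantiles
for i=1:10000, r(i)=corrcoef([normals,sort(rand(100,2)*[1;1])])(2,1); end
% proportion of samples falling below the .10 critical point
mean(r < .9895)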

ans = 0.13340

mean(r < .9873)

ans = 0.049400

mean(r < .9822)

ans = 0.0033000

The values .9895, .9873 and .9822 are the critical points for the .10, .05 and .01 significance levels. The test has essentially no power here: fewer than 5% of the samples fell below the .05 critical point, which is about what truly normal data would give. But with samples of 300, we start to see differences:

normals=normal_inv(([1:300]'-.5)/300,0,1);
for i=1:10000, r(i)=corrcoef([normals,sort(rand(300,2)*[1;1])])(2,1); end
mean(r < .9960)

ans = 0.68390

mean(r < .9953)

ans = 0.48890

mean(r < .9935)

ans = 0.12140

So 49% are statistically significant at the .05 level.

Obviously, with 10,000 observations we would have much greater power than 49% (and 49% isn't all that impressive, but a normal curve and a triangle aren't all that different!). Unfortunately, I don't know what the proper critical points are for the distribution of the correlation with N of 10,000, so I'm not going to do the test.
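
Lacking a table entry for N of 10,000, one could approximate the critical point by simulating under normality and then estimate the power against the triangular alternative. This is a rough sketch under stated assumptions, not a substitute for published values. The names nrep, qq_r, crit05 and power05 are illustrative, erfinv stands in for normal_inv, and with only a few hundred replications the simulated critical point is crude.

```octave
n = 10000;                    % sample size of interest
nrep = 400;                   % replications per scenario (kept small for memory)

p = ((1:n)' - 0.5) / n;
z = sqrt(2) * erfinv(2 * p - 1);   % normal quantiles without normal_inv
zc = z - mean(z);

% Correlation of each sorted column of X with the normal quantiles
qq_r = @(X) (zc' * (X - mean(X))) ./ (norm(zc) * sqrt(sum((X - mean(X)) .^ 2)));

% Null case: truly normal samples give a simulated .05 critical point
r_null = sort(qq_r(sort(randn(n, nrep))));
crit05 = r_null(round(0.05 * nrep))

% Alternative: triangular samples, each the sum of two uniforms
r_tri = qq_r(sort(rand(n, nrep) + rand(n, nrep)));
power05 = mean(r_tri < crit05)
```

Raising nrep gives a steadier critical point at the cost of memory, since each scenario holds an n-by-nrep matrix.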

I don't understand the Doug Stewart approach to this. I'm sure my way (implied in the above code) is much quicker and easier.
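
Spelled out for a single sample, the approach implied by the code above might look like the following sketch. The helper names normal_q and qq_corr are illustrative and not from the original message, erfinv replaces normal_inv to avoid package dependencies, and the .05 critical points are the ones quoted above for N of 100 and 300. Other sample sizes would need their own table entry.

```octave
% Normal quantiles at plotting positions (i - 0.5)/n
normal_q = @(n) sqrt(2) * erfinv(2 * (((1:n)' - 0.5) / n) - 1);

% Correlation between the sorted data and the normal quantiles
qq_corr = @(x) corrcoef([normal_q(numel(x)), sort(x(:))])(2,1);

% .05 critical points quoted in this message (n = 100 and n = 300 only)
crit05_100 = 0.9873;
crit05_300 = 0.9953;

x = rand(100, 2) * [1; 1];          % one triangular sample of size 100
r = qq_corr(x)
reject_normality = (r < crit05_100) % true means "significant at .05"
```

The test rejects normality when r falls below the critical point, so small correlations count as evidence against normality.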


Mike

--
Michael B. Miller, Ph.D.
Assistant Professor
Division of Epidemiology and Community Health
and Institute of Human Genetics
University of Minnesota
http://taxa.epi.umn.edu/~mbmiller/


