pspp-dev

## GLM/ANOVA

 From: Ed Subject: GLM/ANOVA Date: Thu, 5 Jun 2008 16:58:06 +0100

```
Hi all

I wanted to give a brief summary of my thoughts so far on how to do
GLM/ANOVA in pspp.

The linear regression code (math/linreg/* ) already handles multiple
regression, i.e. it solves

y = Xb + e

for the vector b of regression coefficients, given y a vector of
observations, X a matrix of independent variables and e an error term.
In particular it handles the case where X is not of full rank, which
is one of the distinguishing characteristics of GLM.

The GLM problem is just YM = XB + E (all matrices), where the new term
M is a linear transform of the dependent variables. This could be
handled by the existing linear regression code if vectors are changed
to matrices (we could select between vector and matrix where
applicable if there's a big efficiency gain).

The solution is just the multivariate multiple regression solution
postmultiplied by the given M, which could happen elsewhere (and
probably should to allow efficient testing of different contrasts).

Now, this part is pretty trivial, especially with almost all the code
in place already (method of solution is identical).

The bit that's quite a lot harder than I'd realised is extracting all
the [M]AN[C]OVA bits from this; I've never really seen this in full
generality. The algorithm is just to take your fitted regression means
B and compare sums of squared error from various parts of the model.
There are at least 6 common ways of doing this (Type I through Type VI
sums of squares). Random factors in this method are handled by linear
combinations of dependent variables Y in the M matrix (although it's
not obvious to me that this yields the same random-factor calculations
as other methods).

Anyway my feeling is that the first step is to build design matrices
based on a given ANOVA/model spec. The second step is to implement the
various kinds of sums of squares. The final step is to provide the
UNIANOVA command/other glm interfaces for users in PSPP (and
presumably the GLM/GLM repeated measures/etc dialogs too).

If this seems reasonable, it would be really useful if anyone has
good references (pointers will do, I have access to a decent
library) on:

* the exact method of calculation of the various kinds of sums of
squares (especially Type IV: it seems to be an unsafe method that
enjoyed brief popularity, and is now warned about but not described)

* a decent discussion of fitting random factors in a GLM; is the
[...] summing to 0, and having magnitude 1) the correct one to use? And how
do we estimate degrees of freedom here (I read about the details of
this once, but it was only of academic interest at the time and I've
quickly forgotten!)?

* a bit prosaic, but the command format isn't in our library (we have
no non-GUI SPSS references). I pulled the SPSS 15 Base User's Guide
from somewhere on the internet, but the description of the command
format is terse to say the least, and it doesn't cover all the
commands (for example I read somewhere that there are legacy SPSS
MANOVA etc commands still available, but no longer documented).

Ed

```
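
The first two steps proposed in the message (build a design matrix from a model spec, then compute sums of squares) can be illustrated with a small sketch. This is not PSPP code and not necessarily the method the linear regression code uses: it is plain Python, the function names `design_matrix` and `sequential_ss` are hypothetical, and it covers only Type I (sequential) sums of squares for a single factor. It assumes an overparameterized coding (intercept plus one indicator column per level), so X is deliberately not of full rank, and it swaps in modified Gram-Schmidt orthogonalization as a simple way to skip the redundant columns.

```python
# Illustrative sketch only; names and structure are assumptions, not PSPP's API.

def design_matrix(levels):
    """Intercept plus one indicator column per factor level.

    The indicator columns sum to the intercept column, so the matrix is
    rank deficient by construction.
    """
    names = sorted(set(levels))
    X = [[1.0] + [1.0 if lv == name else 0.0 for name in names]
         for lv in levels]
    terms = ["intercept"] + ["factor"] * len(names)
    return X, terms


def sequential_ss(X, y, terms, tol=1e-10):
    """Type I sums of squares and degrees of freedom for each term.

    Columns are orthogonalized in order (modified Gram-Schmidt). A column
    that is a linear combination of earlier columns leaves a near-zero
    residual and is skipped, so it adds neither SS nor df.
    """
    n = len(y)
    basis = []                       # orthonormal columns accepted so far
    ss, df = {}, {}
    for j, term in enumerate(terms):
        col = [X[i][j] for i in range(n)]
        scale = sum(v * v for v in col) ** 0.5
        for q in basis:
            proj = sum(a * b for a, b in zip(q, col))
            col = [a - proj * b for a, b in zip(col, q)]
        norm = sum(v * v for v in col) ** 0.5
        ss.setdefault(term, 0.0)
        df.setdefault(term, 0)
        if scale == 0.0 or norm <= tol * scale:
            continue                 # redundant column
        q = [v / norm for v in col]
        basis.append(q)
        ss[term] += sum(a * b for a, b in zip(q, y)) ** 2
        df[term] += 1
    total = sum(v * v for v in y)
    ss["residual"] = total - sum(ss.values())
    df["residual"] = n - len(basis)
    return ss, df


# Usage: one factor with three levels, three observations each.
levels = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
y = [1, 2, 3, 4, 5, 6, 7, 8, 9]
X, terms = design_matrix(levels)
ss, df = sequential_ss(X, y, terms)
f_stat = (ss["factor"] / df["factor"]) / (ss["residual"] / df["residual"])
```

For this data the factor term comes out at approximately 54 on 2 degrees of freedom and the residual at approximately 6 on 6 degrees of freedom, even though the factor contributes three columns to X. The other types of sums of squares differ in which terms each effect is adjusted for, which this sketch does not attempt.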