Re: interactions
From: Jason Stover
Subject: Re: interactions
Date: Mon, 16 Apr 2007 13:53:54 -0400
User-agent: Mutt/1.5.10i
On Mon, Apr 16, 2007 at 11:10:43AM +0800, John Darrington wrote:
> If we were to follow approach 2, am I right in thinking that the
> 'interaction' data structure could be as large as the number of
> cases in the casefile?
No. It would have either a hash of possible values (all unique), or a
small function to get back and forth between a union value and a
binary vector.
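
To illustrate the second option, here is a minimal sketch of a "small function" pair that goes back and forth between a combination of values and a binary vector. It assumes each variable's value has already been reduced to a category index (the kind of thing category.[ch] is described as doing), and it uses one vector entry per combination. The struct and function names are hypothetical and do not exist in pspp. The point it shows is that the structure's size depends only on the number of distinct categories, not on the number of cases.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type, not part of pspp: an interaction between two
   categorical variables whose values are already coded as indexes
   0 .. n_levels - 1. */
struct interaction
  {
    size_t n_levels_a;          /* Distinct values of variable 1. */
    size_t n_levels_b;          /* Distinct values of variable 2. */
  };

/* Number of distinct interaction values (length of the binary vector). */
static size_t
interaction_n_levels (const struct interaction *ia)
{
  return ia->n_levels_a * ia->n_levels_b;
}

/* Position in the binary vector for the combination (A, B). */
static size_t
interaction_index (const struct interaction *ia, size_t a, size_t b)
{
  return a * ia->n_levels_b + b;
}

/* Fills VEC, which must have interaction_n_levels (IA) entries, with
   zeros except for a single 1 marking the combination (A, B). */
static void
interaction_to_vector (const struct interaction *ia, size_t a, size_t b,
                       double *vec)
{
  memset (vec, 0, interaction_n_levels (ia) * sizeof *vec);
  vec[interaction_index (ia, a, b)] = 1.0;
}

/* Inverse mapping: recovers (A, B) from VEC.  Returns 1 on success,
   0 if VEC has no nonzero entry. */
static int
interaction_from_vector (const struct interaction *ia, const double *vec,
                         size_t *a, size_t *b)
{
  size_t n = interaction_n_levels (ia);
  for (size_t i = 0; i < n; i++)
    if (vec[i] != 0.0)
      {
        *a = i / ia->n_levels_b;
        *b = i % ia->n_levels_b;
        return 1;
      }
  return 0;
}
```

With two levels per variable, as in the A/E and B/C example quoted below, the vector has four entries no matter how many cases the casefile holds.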
> On the other hand, approach 1 sounds attractive, but there are things
> that need to be considered:
>
> a) They'd have to be a special class of variable, which would not
> normally be displayed, written to system files etc. So a new
> enum dict_class entry in variable.h would be required.
>
> b) I'm not sure how existing code would deal with these
> 'invisible' variables. For example many procedures might iterate
> through all the variables. So dict_get_var_cnt might have to
> take a parameter so that we'd know if we were interested in
> 'interaction' variables or not.
These points make me think approach 2 is the way to go, especially your
point b) above.
> c) Presumably it's not just the dictionary that needs modifying.
> When you add new interaction, you also need to add values for the
> variables into the casefile? That involves running a procedure.
> What I did for RANK was to create a temporary variable, which was an
> illegal name in pspp syntax, and delete it afterwards.
No, no extra data need to be written to the casefile. But given a) and b)
above, I think approach number 2 would be the least painful.
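
As a rough sketch of why nothing extra has to be stored: the interaction value for a case can be derived from the two existing values at the moment the case is read. The example below assumes a toy case type holding two category indexes and simply tallies how many cases fall into each combination. The names `toy_case` and `tally_interaction` are made up for illustration and have nothing to do with pspp's real case or casefile code.

```c
#include <stddef.h>

/* Hypothetical stand-in for a case: the category indexes of the two
   variables taking part in the interaction. */
struct toy_case
  {
    size_t a;                   /* Index of variable 1's value. */
    size_t b;                   /* Index of variable 2's value. */
  };

/* Counts the cases in each interaction level without storing an
   interaction column.  COUNTS must have N_LEVELS_A * N_LEVELS_B
   entries; cases whose indexes are out of range are skipped. */
static void
tally_interaction (const struct toy_case *cases, size_t n_cases,
                   size_t n_levels_a, size_t n_levels_b, size_t *counts)
{
  for (size_t i = 0; i < n_levels_a * n_levels_b; i++)
    counts[i] = 0;

  for (size_t i = 0; i < n_cases; i++)
    {
      if (cases[i].a >= n_levels_a || cases[i].b >= n_levels_b)
        continue;
      /* The interaction value is computed here, on the fly. */
      counts[cases[i].a * n_levels_b + cases[i].b]++;
    }
}
```

A procedure working this way would read the existing data once and never write a new value back.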
-Jason
>
> On Sun, Apr 15, 2007 at 03:06:17PM -0400, Jason Stover wrote:
> To have a glm procedure, pspp needs a data structure to handle
> interactions. An interaction can be thought of as another variable
> which is a function of two or more variables, usually categorical,
> like this:
>
> Variable 1   Variable 2   Interaction
> A            B            AB
> E            B            EB
> A            C            AC
> E            C            EC
>
>
> ...etc. The interaction term could be created in one of two ways:
> Either 1) create a new variable in the dictionary that corresponds to
> the interaction, or 2) create a new 'interaction' data structure
> that contains all necessary mappings between existing variables and
> the value of the interaction.
>
> Approach 1 would add a variable to the dictionary, but would not
> create any more observations in the data set. It would make coding any
> procedures that use interactions easier than approach 2, because doing
> so would mean the procedure doesn't need to know about much special
> code to handle interactions. It would also prevent the need for having
> any more obscure string-values-to-binary-vector code like that in
> category.[ch]. Approach 1 would still require the creation of some
> code to create the interaction, though it may not require the creation
> of a specialized "interaction" data structure to be available for use
> by all procedures.
>
> Approach 2 doesn't require adding anything to the dictionary, but it
> does mean that any procedures that need to use interactions would have
> to create those interactions themselves. These interactions would
> therefore be lost after the procedure exits, meaning that any other
> procedure that needs interactions would have to recreate
> them. Approach 2 also means writing more code that partly duplicates
> the code already in category.[ch].
>
> I favor approach number 1, but before I fiddle with the
> dictionary, I thought I should ask.
>
> -Jason