pspp-dev
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: linked lists and trees


From: Jason Stover
Subject: Re: linked lists and trees
Date: Fri, 26 Sep 2008 10:51:31 -0400
User-agent: Mutt/1.5.18 (2008-05-17)

On Wed, Sep 24, 2008 at 12:37:31PM -0700, Ben Pfaff wrote:
> Jason Stover <address@hidden> writes:
> 
> >> > One node in the tree would look like this:
> >> >
> >> > struct node
> >> > {
> >> >   struct variable *v1;
> >> >   struct variable *v2;
> >> >   double covariance_element;
> >> >   double center;
> >> >   union value *val1;
> >> >   union value *val2;
> >> >   size_t count;
> >> >   struct node *right;
> >> >   struct node *left;
> >> > };
> >> 
> >> Which members are the key that you want to look up to find a
> >> node?
> >
> > The key is a quadruple: (v1, v2, val1, val2).
> >
> > I quickly wrote something to do what I want, but I would prefer to use
> > something we already have.
> >
> > So does the hashing in src/libpspp sound like what I need?
> 
> I think that the hash table there will work fine.

So now I'm thinking hashing won't work. The problem is that I do
not know in advance the range of the hash function, nor how to
avoid collisions. 

If the keys were just, say, of the form (variable1, variable2),
I could make a simple hash function like

  key = variable1->idx + variable2->idx * n_vars

This would be fine for numeric variables.

The problem is with the categorical variables. I need distinct
entries in the hash table for each of the values of the categorical
variables, and I haven't passed the data yet, so I don't know how many
values there are, nor what they may be. Which means, I think, that I
can't write a hash function in advance that will be guaranteed to be
one-to-one.

So now I'm thinking about other data structures. In a previous message,
you said the key question is what operations I need. Here they are:

1. Searching
2. Inserting a new node (could be anywhere: at the "end", in the "middle")

I do NOT need sorting or deleting.
  
After this structure has been built, I will pass through it once and put
its values in a covariance matrix, then free the structure. That's all
I need to do.

So I was thinking the dumbest data structure, with a quick search, would 
work best. What's the most suitable that libpspp has already?




reply via email to

[Prev in Thread] Current Thread [Next in Thread]