pspp-dev
Re: RFC: output as tables


From: Jason Stover
Subject: Re: RFC: output as tables
Date: Tue, 3 Jun 2008 15:43:26 -0400
User-agent: Mutt/1.5.10i

On Mon, Jun 02, 2008 at 10:43:09PM -0700, Ben Pfaff wrote:
> I expect there will be plenty of comments, objections, etc. at
> this point.  That's fine.  Let's try to figure out whether this
> is a good idea that can be made into a good design.  If it isn't,
> then it's best to head it off at this point.

I mostly think this is a good idea, but just so I don't say "OK" and
regret it later:

I have questions about the conceptual connection between rows and
columns. Then I have some random questions. 

> 
> Example 1: DESCRIPTIVES
> -----------------------
> 
> Consider the DESCRIPTIVES procedure.  Its primary output could be a
> table with the following columns:
> 
>       - One column per SPLIT FILE variable (only if SPLIT FILE is in
>           effect).  These columns would be annotated as SPLIT FILE
>           variables.  (Oddly, SPLIT FILE doesn't give us a way to know
>           the sort order, unless we use a technique like that at
>           http://groups.google.com/group/comp.programming/msg/c7ebefe24af2f930
>           to figure it out on our own.)
> 
>       - VARIABLE, the name of a variable whose descriptive
>           statistics were calculated by the procedure.  This column
>           would be annotated with the variable's variable label.
> 
>       - N_VALID, the number of valid observations for this variable.
> 
>       - N_MISSING, the number of missing observations for this
>           variable.
> 
>       - N_MISSING_LISTWISE, the number of cases that had at least
>           one missing value in any of the analyzed variables.
> 
>       - One column per descriptive statistic calculated.
> 
> Each row in the table gives the descriptive statistics for one file
> split.

In this case, the relationship between rows and columns is easy to see.
Certainly there will be other cases where it is not so easy to see.
Can you think of any in which there would be so many row/column
relationships as to make this approach too bloated or complex?

Also, this is the only example in which you defined the row/column
relationship: A row corresponding to a file split, a column to a
statistic. What other kinds of tables can you (or anyone) think of, where
the rows denote something besides a split file?

There is the obvious one in which someone uses a SAVE-like command to
save, say, residuals, where one row corresponds to one case and a column
to the new variable RESID.

And which part of the program would have to know what the rows "mean"?
How would it know? Do we have to explicitly code rules like "rows from
this table from this procedure mean A; from that table, B; from that
procedure the rows mean C; ..."?
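
To make the question concrete, here is a minimal sketch of one
alternative to hard-coded rules: each table carries its own note saying
what a row stands for. It is written in Python purely for illustration.
The names OutputTable, row_kind and describe are hypothetical and are
not part of PSPP or of the proposal.

```python
from dataclasses import dataclass, field


@dataclass
class OutputTable:
    """Hypothetical table that records what one row represents."""
    procedure: str          # name of the procedure that wrote the table
    row_kind: str           # what one row stands for, e.g. "file split"
    columns: list           # column names, in display order
    rows: list = field(default_factory=list)

    def add_row(self, *values):
        # Reject rows that do not match the declared columns.
        if len(values) != len(self.columns):
            raise ValueError("row length does not match column count")
        self.rows.append(dict(zip(self.columns, values)))


def describe(table):
    # A consumer reads the row meaning from the table itself instead of
    # from a per-procedure rule.
    return f"{table.procedure}: one row per {table.row_kind}"


descriptives = OutputTable("DESCRIPTIVES", "file split",
                           ["VARIABLE", "N_VALID", "N_MISSING"])
residuals = OutputTable("REGRESSION", "case", ["RESID"])
```

With this shape, describe(descriptives) and describe(residuals) report
different row meanings without any code that knows about either
procedure in advance.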

How will the table look if the statistic consists of more than one
value? For example, a confidence interval can be thought of as a single
statistic, but it is two values. Do you plan to keep the two values
in one "column" or two? (I guess John's suggestion about a "multicolumn"
feature would fix this.)
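
As a sketch of what a "multicolumn" could look like, assume a column may
either be a plain name or a named group of sub-columns. The ColumnGroup
type, the flatten_header function and the CI95 label below are
hypothetical, chosen only to illustrate the two-values-one-statistic
case.

```python
from typing import NamedTuple


class ColumnGroup(NamedTuple):
    """Hypothetical multicolumn: one statistic made of several values."""
    name: str
    parts: tuple


def flatten_header(columns):
    # Expand each group into one physical column per part, keeping the
    # group name as a prefix so the parts stay recognizably related.
    flat = []
    for col in columns:
        if isinstance(col, ColumnGroup):
            flat.extend(f"{col.name}_{part}" for part in col.parts)
        else:
            flat.append(col)
    return flat


columns = ["VARIABLE", "MEAN", ColumnGroup("CI95", ("LOWER", "UPPER"))]
header = flatten_header(columns)
```

Here the interval stays one logical column for anything that reasons
about statistics, while a flat consumer still sees two ordinary columns.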

Can multiple procedures read from/write to the same table? 

Let's say someone fits many models and wants to compare the
performance of each. Could the output be made to have each procedure
write to a table, with one row corresponding to that procedure, and
one column to a statistic?  I think the answer to this is "yes"
because you mentioned it being done with something similar to
casereaders. I ask because, if this were all possible, then taking
that table and displaying it as something human-readable would be easy
and useful. 
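
A rough sketch of that scenario, assuming a shared table is just a list
of rows that each procedure appends to. The function names and the
numbers are made-up placeholders, not output from any real model or any
PSPP interface.

```python
def append_model_row(table, model_name, stats):
    # Each procedure contributes exactly one row: its name plus its
    # statistics.
    table.append({"MODEL": model_name, **stats})


def render(table, columns):
    # Turn the table into right-aligned plain text, one line per row.
    lines = ["  ".join(f"{c:>10}" for c in columns)]
    for row in table:
        lines.append("  ".join(f"{str(row.get(c, '')):>10}" for c in columns))
    return "\n".join(lines)


comparison = []
# Placeholder statistics, for illustration only.
append_model_row(comparison, "linear", {"R2": 0.81, "AIC": 120.4})
append_model_row(comparison, "quadratic", {"R2": 0.86, "AIC": 115.2})
text = render(comparison, ["MODEL", "R2", "AIC"])
```

Once the rows are in one place, producing something human-readable is a
small formatting step that is independent of the procedures that wrote
them.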

Are these tables how procedures read each others' output?

> Not all output is reasonably treated as part of a table, so procedures
> may also write out additional metadata output not associated with a
> table.  Some kinds of metadata might be included in some kind of
> standardized form along with every procedure (or every command, even
> those that are not procedures), such as the following:
...

I know that if anything cannot easily be described by a table, we can
put it into metadata. But I'm afraid that if there are many such
exceptions to the table format, we would often wind up saying "oh,
that can't be in a table, so we'll just put it in the metadata," and
thereby make the metadata a large dumping ground without any
coherence. Do you think this is a serious risk?

-Jason



